Re: Request for approval of CIF version 1.1 specification
- To: Multiple recipients of list <comcifs-l@iucr.org>
- Subject: Re: Request for approval of CIF version 1.1 specification
- From: "Ralf W. Grosse-Kunstleve" <rwgk@yahoo.com>
- Date: Wed, 16 Oct 2002 02:27:09 +0100 (BST)
Attached are some comments regarding the 1.1 CIF specification as posted
by Brian. I am wondering if it wouldn't make sense to officially announce
these documents as "Working Specification" with the express intent to
adopt them as "final" after being publicly available and actively used by
implementors for a while (e.g. a year). Hopefully the community will see
this as an invitation to participate in ironing out any kinks.

Ralf

----------------------------------------------------------------------------

Version 1.1 Specification

I have to admit that I am still not entirely clear what the authoritative
Version 1.0 Specification is (Hall, Allen & Brown, 1991?). It would be
useful to clearly explain this in the introduction ("Revision history").

It would also be useful to outline the boundaries between this
specification and the DDL specifications ("Scope").

----------------------------------------------------------------------------

Syntax:

Definition of terms: consolidate in one place (link).

Regarding quoting rules: I am asking myself how to deal with a string like

;
contains both isolated ' and " and ends with a '
;

If I understand correctly, anything can be handled in a multi-line text
field. However, take the viewpoint of someone implementing a CIF writer.
If the goal is to make the output human-readable, one would probably
prefer quoted strings over multi-line text, in particular inside a loop_
construct. But then it seems necessary to pre-scan the text fields to be
output to determine what kind of quoting is applicable. I am under the
impression that it could be quite hard to devise an algorithm that
generates both correct and "nice" output. A human working on a CIF will
face similar difficulties. Isn't this an issue in practice? Could it be
useful to include some practical quoting guidelines?

I believe many people will expect constructs like
'an embedded \' quote' or "an embedded \" double quote"
to work as they do in many programming languages.
To avoid this common misunderstanding it will be useful to provide a link
to the "Accented letters" table in the semantics document.

17. ... trailing white space on a line may however be elided.
                                       ^^^
In my opinion the specification should be unambiguous: White space should
not be elided by the parser. The data value should be left untouched.
Eliding is in the regime of semantics, not syntax.

20. ... By contrast the value of the text field

; foo
  bar
;

is `foo\n bar' ...

Should this be `foo\n  bar' (two spaces before the bar)? Also, in the
semantics document the notation <eol> is used instead of \n. I suggest
using <eol> everywhere.

22.: The ASCII characters at decimal positions 11 (VT or vertical tab)
     and 12 (FF or form feed), often included in library implementations
     as white space characters, are explicitly excluded from the CIF
     character set at this revision.

Points:

1. I don't see the benefit of explicitly excluding these characters. In
   practice it means that parsing of old files might fail only because
   these characters are embedded. I know there was some discussion
   already, but I cannot remember the details. Is there something wrong
   with the following, more forgiving approach: Unquoted VT and FF are
   treated as white space, quoted VT and FF are "passed through" like any
   other character:

   <WhiteSpace> := { <SP> | <HT> | <VT> | <FF> | <eol> |
                     <TokenizedComments>}+

   <AnyPrintChar> := <OrdinaryChar> | <double_quote> | '#' | '$' |
                     <single_quote> | '_' | <SP> | <HT> | <VT> | <FF> |
                     ';' | '[' | ']'

2. If it is decided to explicitly exclude VT and FF this deviation from
   STAR should (also) be listed under "Implementation restrictions."

27. How does the "Maximum line length" apply to <eol>\; quoted strings as
explained in the semantics document? For example, is the following legal?

;\
2000 characters ...\
2000 characters ...
;

Finally, in the post-Fortran and post-C era line length restrictions seem
very arbitrary and are ultimately a nuisance.
I'd rather see this restriction removed from the specification. Programs
written in languages without automatic dynamic memory management could
simply allocate a large buffer (e.g. 128k is perfectly reasonable these
days) and report a "Technical limitation" in the highly unlikely event
that the buffer is insufficient.

----------------------------------------------------------------------------

Semantic:

This sentence in the introduction leaves me puzzled:

  As computer techniques evolve, it becomes more appropriate to discuss
  the machine-accessible semantic content, or "meaning", of the data in
  such a file.

Again: Definition of terms: consolidate in one place (link).

10. The character string [local] is reserved for local use.
                         ^     ^
Is this [notation] used somewhere else? Are there alternatives?

Handling of long lines

- I am a bit surprised that this is presented in the semantic features
  document rather than the syntax document.
- Why do we need this for # comments?

Typographic style codes

I don't see how these comments could make a significant difference in
practice, but they significantly contribute to conveying the impression
that the semantics features are a bit of a hodgepodge. I suggest deleting
the entire "Typographic style codes" section.
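
----------------------------------------------------------------------------

Returning to the quoting question under "Syntax": the pre-scan a CIF
writer would need can be sketched roughly as follows. This is only an
illustration, not part of the specification. It assumes the draft's rule
that a quote delimiter closes a quoted string only when it is followed by
white space, and it assumes a particular list of characters and reserved
words that an unquoted value may not start with. The names
format_cif_value, _can_be_unquoted and _can_be_quoted are made up for this
example.

```python
import re

# Assumption: an unquoted value may not begin with any of these characters.
_BAD_FIRST_CHARS = "_#$'\"[];"
# Assumption: reserved prefixes and words, compared case-insensitively.
_RESERVED_PREFIXES = ("data_", "save_")
_RESERVED_WORDS = ("loop_", "stop_", "global_")


def _can_be_unquoted(value):
    """True if the value can be written without any delimiters."""
    if not value or re.search(r"\s", value):
        return False
    if value in ("?", "."):  # bare ? and . carry special meanings
        return False
    if value[0] in _BAD_FIRST_CHARS:
        return False
    lower = value.lower()
    return not (lower.startswith(_RESERVED_PREFIXES) or lower in _RESERVED_WORDS)


def _can_be_quoted(value, quote):
    """True if no embedded delimiter is directly followed by white space."""
    return re.search(re.escape(quote) + r"\s", value) is None


def format_cif_value(value):
    """Pick the 'nicest' legal representation: bare, quoted, or text field."""
    if "\n" not in value:
        if _can_be_unquoted(value):
            return value
        for quote in ("'", '"'):
            if _can_be_quoted(value, quote):
                return quote + value + quote
    # Fall back to a semicolon-delimited text field.
    if any(line.startswith(";") for line in value.split("\n")[1:]):
        raise ValueError("a line starting with ';' would end the text field early")
    return "\n;" + value + "\n;"
```

With these assumptions, format_cif_value("a dog's life") gives
'a dog's life' in single quotes, while the example string above (isolated
' and " and a trailing ') is rejected by both quoting styles and ends up
in a text field. The sketch does not address line length limits or
non-"\n" line terminators.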