RE: A formal specification for CIF version 1.1 (Draft)
- Subject: RE: A formal specification for CIF version 1.1 (Draft)
- From: "Herbert J. Bernstein" <yaya@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 10 Jul 2002 22:10:01 +0100 (BST)
It is very helpful to receive such detailed comments. I will provide my
take on what has been said, but please remember that I am speaking only
for myself. Others involved in working on the draft may have very
different takes on these items.

  -- H. J. Bernstein

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 020
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Wed, 10 Jul 2002, Bollinger, John Clayton wrote:

>
> Brian McMahon [mailto:bm@iucr.org] wrote:
>
> [...]
>
> > One point should be made carefully: this specification is for an
> > extended version of CIF, not yet formally adopted by COMCIFS. The
> > only significant extensions to the existing standard are:
> > restriction of the line-length constraint from 80 to 2048
> > characters, and the introduction of matching square brackets as
> > additional delimiters for string values containing white space.
>
> I think there are quite a few other differences, and no small number
> of them incompatibilities. Many of the incompatibilities are corner
> cases, but there are some more important ones.
>
> Here are the differences I detected on my read through the syntax
> description:
>
> How about that the formally reserved but unused stop_ and save_
> keywords are now used in CIF 1.1, albeit the latter only in
> dictionaries. And speaking of dictionaries, that they are now
> written in CIF rather than in their own STAR dialect. (Well,
> really they're still a slightly different dialect in that only
> they can have save frames, but the draft spec says they are CIFs.)

I believe that this use of stop_ and save_ does not invalidate any
previously valid CIFs, and is a realistic approach to dealing with
these reserved words. Any validating CIF parser needs to have a module
to read dictionaries, where it will encounter save frames.
Any properly written CIF parser has to recognize stop_ to distinguish it
from a data value. By making these changes in the specification, we are
specifying a common practice (save frames), and saying that a use of a
reserved word (stop_) in a context in which it clearly is not an error
should not be treated as an error.

>
> And what about data values beginning with a substring matching a
> reserved word? (Paragraph 10) In CIF 1.0 it was reasonably clear
> that something like this applied to data_ because such a construct
> had its own semantics defined, but it was not clear that this was
> a general restriction applied to all the reserved words. Did I
> just miss it somewhere, or is this one of those points of 1.0 that
> is being clarified via the 1.1 spec? If the latter, then let me
> throw in that I don't like it. I think that's because it is a
> departure from the normal sense of the term "reserved word." In any
> case, it makes a parser that incremental bit trickier to write.

CIF has always been presented as an application of STAR, so the
reserved words have, in fact, always been reserved, and it has always
been the case that having a data value beginning with data_ or save_
was incorrect. By applying exactly the same logic to the full set of
reserved words, I believe we should make the design of most parsers
cleaner and simpler.

>
> In paragraph 17: "The end-of-line associated with the closing semicolon
> does not form part of the data value." Is this another
> change/clarification, or another published detail that had previously
> escaped me? I had thought that that last eol was part of the value.
>

If you exclude the terminal <eol> from the text field, you then allow
the semicolon to quote arbitrary text fields, including those that do
not end with an <eol>. If you do not exclude the terminal <eol> from
the text field, then the only text that can be quoted with semicolons
is text that ends with an <eol>.
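The effect of the paragraph 17 rule can be illustrated with a minimal
sketch in Python. It is not taken from the draft or from any existing
parser: the function name read_text_field and the representation of
the input as a list of lines already stripped of their terminators are
assumptions made for illustration only.

```python
def read_text_field(lines, start):
    """Return (value, next_index) for a semicolon-delimited text field.

    Hypothetical helper: `lines` holds the file's lines without their
    terminators, and lines[start] must begin with ';'.
    """
    if not lines[start].startswith(";"):
        raise ValueError("text field must open with ';' in column 1")
    body = [lines[start][1:]]           # text after the opening semicolon
    i = start + 1
    while i < len(lines):
        if lines[i].startswith(";"):    # closing semicolon in column 1
            # Joining with "\n" puts a terminator only *between* lines,
            # so the <eol> just before the closing semicolon is excluded.
            return "\n".join(body), i + 1
        body.append(lines[i])
        i += 1
    raise ValueError("unterminated text field")
```

Under this reading, a value with no trailing line terminator is
written directly, and a value that does end with one is written by
adding an empty line before the closing semicolon, so both kinds of
text can be quoted.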
> In paragraphs 22 and 41: Exclusion of ASCII characters 11 and 12
> decimal is a departure from and incompatibility with CIF 1.0. Not
> that I particularly object -- handling these appropriately is a pain.
>

The second sentence of the abstract of the Hall, Allen, Brown paper
says: "The CIF is a general, flexible and easily extensible free-format
archive file; it is human and machine readable and can be edited by a
simple text editor." It is not always possible to edit texts containing
ASCII control characters other than HT with a "simple text editor". VT
and FF serve no useful purpose in a CIF, and, as you note, they can be
a pain to handle.

> In paragraph 29: the data name length restriction to 75 characters is
> another incompatibility with CIF 1.0 (as revised) where the data name
> length was restricted only indirectly by the line length restriction.
> Thus in CIF 1.0 data names could be 80 characters long.
>

Actually, to allow a data name to be defined in a dictionary you have
to allow it to appear with a prepended "data_" or "save_". In DDL1
dictionaries, the leading underscore of the data name is dropped, which
has created a limit of 76 characters. In DDL2 the underscore is
retained, which has created a limit of 75 characters. Thus the 75
character limit is simply a recognition of the implicit line length
restrictions that had been in effect in the past, and helps to ensure
that old systems will be able to work with these new names.

> Paragraph 42 makes it optional to support line termination semantics
> different from the host OS'. That would be another departure
> from CIF 1.0, I think, and, in my opinion, an all-around bad idea if
> CIFs are supposed to be portable. As far as I can tell, the pseudo-
> production presented for <eol> is in fact the required implementation
> for a fully-conformant CIF 1.0 parser.
>

If you are on a unix system, the pseudo-production is almost right for
a "liberal-reader" CIF parser.
It misses the case of a final line in a file which has not been
terminated by "\n". If you are on a VMS system, or an IBM mainframe,
the pseudo-production may be completely wrong for a CIF created locally
as a text file. If CIFs are truly to be portable, it must be possible
for someone on a non-Unix system (and non-Windows, non-Mac system) to
work with them.

> Paragraph 43: In combination with the formal grammar presented earlier,
> the definitions of the <eol> and <noteol> non-terminals in fact seems
> to _preclude_ CIF parsers from handling non-native line termination
> semantics. Even if that's not a departure from CIF 1.0, it's still
> a bad idea.
>

We are not trying to preclude people from writing parsers which are
liberal and able to read a wider range of CIF formats than those
produced by the text editors of their own machines, but it would be
unreasonable and impractical to insist that every parser be able to
read every line format that ever has or will be invented. It is not
even reasonable to insist that every parser be able to read some short
list of non-native line formats. That would, for example, make
Fortran-implemented parsers non-conformant on certain systems.

> According to paragraph 60, a file containing only whitespace and
> comments but no data block is not a valid 1.1 CIF. That is another
> departure from CIF 1.0 if it is really the intent. One of the ciftest
> trip files actually tests this case, in fact.

This sounds like a good topic for further discussion. I for one would
favor allowing such a file to be a CIF, but I am not certain what I
would do with it.

>
> Paragraph 61: this is another departure from CIF 1.0, which did allow
> data blocks without data items. Another of the ciftest trip files
> tests this case. (vcif evidently produces a warning, which seems
> reasonable, but this is not an error.)

Yet another good topic for discussion.

>
> Regards,
>
> John Bollinger
> jobollin@indiana.edu
>
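As an illustration of the "liberal-reader" behaviour discussed under
paragraphs 42 and 43 above, here is a minimal sketch in Python of a
line splitter for stream-oriented systems. It is an assumption-laden
example, not the draft's <eol> production: the name split_lines and
the choice of accepted terminators (LF, CR LF, CR) are hypothetical,
and it does nothing for record-oriented text files such as those on
VMS or IBM mainframes.

```python
import re

# CR LF is listed first so that it is consumed as a single terminator
# rather than as a CR followed by an LF.
_EOL = re.compile(r"\r\n|\r|\n")

def split_lines(text):
    """Split text on LF, CR LF, or CR; keep an unterminated final line."""
    lines = _EOL.split(text)
    # A trailing terminator leaves one empty string at the end; drop it
    # so that "a\n" and "a" both yield ["a"].
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

A splitter of this kind treats a final line without a terminator the
same as a terminated one, which is the case the pseudo-production is
said to miss.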