RE: A formal specification for CIF version 1.1 (Draft)
- Subject: RE: A formal specification for CIF version 1.1 (Draft)
- From: "Bollinger, John Clayton" <jobollin@xxxxxxxxxxx>
- Date: Thu, 11 Jul 2002 00:08:47 +0100 (BST)
As a follow-up to my remarks about my perceived differences between versions 1.0 and 1.1 of CIF, here is an additional specific response to Brian's message and some further commentary on the specification.

Brian McMahon [mailto:bm@iucr.org] wrote:
[...]
> CIF is intended as an archival and portable format. For this reason, the
> description of certain syntactic features has been constructed with care to
> try to avoid machine or operating-system dependencies. This is particularly
> the case with the discussion regarding end-of-line delimiters. Here an
> attempt has been made to reconcile the practical handling of files which are
> transported or shared across common operating systems such as Unix, MacOS
> and MSWindows with the more general formulation that is required to support
> files on mainframe or elderly record-oriented OS architectures.

Regardless of whether the end-of-line handling is different in 1.1 than it was in 1.0, I think that those comments mischaracterize the details of the draft 1.1 spec. As far as I can tell, what the spec now says is that CIF line termination is in fact machine dependent, and that an external utility must (must!) be used to convert a CIF from any foreign line-termination convention to the local machine's convention (if they differ) before a conforming CIF parser can successfully parse the file. I think this is exactly the wrong direction.

I think it unfortunate that the specification lumps together CIF dictionaries and CIF data files as "CIF", considering that they are in fact slightly different STAR dialects. It furthermore seems that the spec has been tailored to allow this combination (by the addition of save frames, at least), which I find a questionable strategy, especially given that it did not really accomplish the apparent goal anyway (that goal being to produce a single STAR dialect in which both the dictionaries and the data files could be expressed).
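To illustrate the alternative to mandatory external conversion argued for above, here is a minimal Python sketch of a reader that accepts the three common line-termination conventions directly. It is only an illustration under stated assumptions, not anything the draft specifies: it assumes the conventions in play are LF (Unix), CRLF (MS Windows) and bare CR (classic MacOS), assumes ASCII content, and does not attempt to cover record-oriented systems. The function name `cif_lines` is hypothetical.

```python
import re

# Assumption: the only terminators to recognize are CRLF, bare CR and bare LF.
# CRLF is listed first so that it is consumed as a single terminator.
_EOL = re.compile(r"\r\n|\r|\n")


def cif_lines(data: bytes, encoding: str = "ascii") -> list[str]:
    """Split raw file bytes into lines, whatever terminator convention was used.

    Hypothetical helper for illustration; mixed conventions in one file are
    tolerated as well.
    """
    text = data.decode(encoding)
    lines = _EOL.split(text)
    # A terminator at the very end of the file yields one empty trailing
    # element (and empty input yields ['']); drop it so it is not a phantom line.
    if lines[-1] == "":
        lines.pop()
    return lines


# The same logical content parses identically from Unix, Windows or Mac files.
print(cif_lines(b"data_x\r\n_a 1\r\n"))
```

The design choice here is simply that the parser, rather than a separate conversion utility, absorbs the platform difference, so a file copied verbatim between machines remains parseable.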
I do not see any point whatsoever to adding the stop_ keyword to the accepted CIF syntax. It is not necessary as long as CIF does not permit nested loops, so it only makes parsers more difficult to write. The question should be "why add it?" rather than "why not?"

What exactly is the point of introducing the square bracket delimiters for text values?

This is a bit picky, but I don't see the point of introducing distinct productions for <Comments> and <TokenizedComments> in the formal grammar. Why not just forget about what is currently called <Comments>, use <TokenizedComments> in its place, and adjust the description to match? That would also relieve the spec of having to note the exception that a '#' embedded in non-whitespace does not initiate a comment. More fundamentally, though, why express a production for comments (plural) without expressing one for a single comment? That's a bit quirky, I think, although it appears to work.

The <NameChar> non-terminal is not used anywhere. And if it were used as the accompanying text describes (section 51), it would be another break from CIF 1.0, because it excludes quotation characters from data block names and data tags. Those characters are not excluded in STAR (at least not in the 1994 paper), and as far as I know that exclusion was never before expressed as a CIF restriction of STAR.

Also in the formal grammar, the productions <SingleQuotedString><WhiteSpace> and <DoubleQuotedString><WhiteSpace> are ambiguous. What is intended, I think, is that the shortest string that matches the production be used. Here is an example that could be misinterpreted:

    _d1 'a character value' _d2 'another one'

Is that one data item or two? Either interpretation seems to match the production. I think Nick had some more precise productions in the BNFs he floated.

In section 59, no production is given for <UnsignedInteger>. A production can be found in the Appendix A summary, but it should be duplicated here, as the other related productions are.
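To make the quoted-string ambiguity and the '#' comment rule concrete, here is a minimal Python tokenizer sketch. It is my own illustration, not taken from the draft or from Nick's BNFs, and it assumes CIF 1.0 practice: a quoted value opens with ' or " at the start of a token and closes at the first matching quote that is followed by whitespace or end of input (the shortest match), a quoted value may not span lines, and '#' starts a comment only at the start of a token. Semicolon-delimited text fields, loops and reserved words are deliberately not handled, and the name `tokenize` is hypothetical.

```python
WHITESPACE = " \t\r\n"


def tokenize(text: str) -> list[str]:
    """Split CIF-like text into value tokens (illustrative sketch only)."""
    tokens: list[str] = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in WHITESPACE:
            i += 1
        elif ch == "#":
            # '#' at the start of a token: a comment running to end of line.
            while i < n and text[i] not in "\r\n":
                i += 1
        elif ch in "'\"":
            # Shortest match: stop at the first matching quote that is
            # followed by whitespace or end of input.
            j = i + 1
            while True:
                if j >= n or text[j] in "\r\n":
                    raise ValueError(f"unterminated quoted string at offset {i}")
                if text[j] == ch and (j + 1 == n or text[j + 1] in WHITESPACE):
                    break
                j += 1
            tokens.append(text[i + 1 : j])
            i = j + 1
        else:
            # Bare token: runs to the next whitespace, so an embedded '#'
            # is ordinary data, not a comment.
            j = i
            while j < n and text[j] not in WHITESPACE:
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


print(tokenize("_d1 'a character value' _d2 'another one'"))
print(tokenize("_x 'it's ok' # note\n_y a#b"))
```

Under the shortest-match rule the example line yields two data items (four tokens), and an apostrophe not followed by whitespace, as in 'it's ok', stays inside the value.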
The same is true of the <Exponent> non-terminal: no production is given for it in section 59 either.

In section 59, the production for <Float> matches "+1.0" but not "1.0".

Moreover, is there any value in including the <Numeric> non-terminal and its children in the grammar at all? Anything that matches <Numeric> will also match <CharString>, so <Numeric> is not necessary to describe the language.

John Bollinger
jobollin@indiana.edu