Re: Problems with CIF BNF
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Problems with CIF BNF
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Mon, 12 Mar 2007 15:00:47 -0400
- In-Reply-To: <45F5918A.4030602@niehs.nih.gov>
- References: <45F5918A.4030602@niehs.nih.gov>
Without a defined lexer, you cannot do CIF as a BNF; it is context
sensitive in its use of whitespace. The question you are raising
about EOF should be handled by the lexer, which should deal sensibly
with the usual unix problem of disambiguating the case of a final line
that ends with eof rather than eol-eof. There is a rather complete
bison grammar in CBFlib working on the level of tokens after lexing
the input.

-- HJB

At 1:44 PM -0400 3/12/07, Joe Krahn wrote:
>Some parts of CIF are vague. I hoped that the BNF syntax would be a
>precise syntax specification, but it has problems. It is central to
>properly defining the CIF format, and should therefore be very accurate.
>
>First, there are some plain syntax errors, like unbalanced braces in the
>production of <Float>, and an empty token in the TokenizedComments
>production.
>
>There are also a few hacks like <noteol>, and the lack of rules for the
>content of quoted strings. I think it is also a hack for a production
>unit to be defined for two elements, like "<eol><UnquotedString>".
>
>Does EOF count as whitespace? Normally, a text file ends with an <eol>
>on the last line, so it is not a problem. With Fortran, you may not be
>able to distinguish between them, so it seems that EOF probably should
>count as a whitespace token.
>
>There are also places where the grammar could be simplified, such as:
>
> { {'e' | 'E' } | {'e' | 'E' } { '+' | '- ' } } <UnsignedInteger>
>
>written as:
> {'e' | 'E' } { '+' | '-' }? <UnsignedInteger>
>
>Also note the error in the first form copied from the web page: the
>minus sign has a space included.
>
>Should the logical-OR symbol always be contained within braces? This
>appears to be inconsistent, but maybe the rule is to require braces when
>the members include a quoted character element.
>
>I will try to edit my own version of the BNF to produce what I think it
>is supposed to mean. Answers to some of the above questions will be
>helpful in getting it right.
>
>Thanks,
>Joe Krahn
>_______________________________________________
>comcifs mailing list
>comcifs@iucr.org
>http://scripts.iucr.org/mailman/listinfo/comcifs
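
To illustrate the point about leaving EOF to the lexer, here is a minimal Python sketch. It is not the CBFlib lexer and not the official CIF grammar: the function name `tokenize`, the token kinds `word`, `eol` and `eof`, and the decision to treat a final line ending in eof as if it ended in eol-eof are all assumptions made for illustration. Quoted strings and semicolon text fields are deliberately left out.

```python
def tokenize(text):
    """Yield (kind, value) tokens from CIF-like text.

    Illustrative only: handles bare words and '#' comments, not quoted
    strings or text fields. A final line that ends in eof is treated as
    if it ended in eol followed by eof (an assumption of this sketch).
    """
    # Normalise the common line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Supply the missing final eol so the grammar never sees a bare eof.
    if text and not text.endswith("\n"):
        text += "\n"
    # After normalisation the last split element is always empty; drop it.
    for line in text.split("\n")[:-1]:
        for word in line.split():
            if word.startswith("#"):
                break  # the rest of the line is a comment
            yield ("word", word)
        yield ("eol", "")
    yield ("eof", "")


if __name__ == "__main__":
    with_eol = list(tokenize("data_x\n_a 1\n"))
    without_eol = list(tokenize("data_x\n_a 1"))
    # Both inputs produce the same token stream.
    print(with_eol == without_eol)
```

With this kind of normalisation in the lexer, a grammar working on tokens does not need a separate rule for EOF as whitespace, because every line it sees is terminated by an `eol` token.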
- Follow-Ups:
- Re: Problems with CIF BNF (Joe Krahn)
- References:
- Problems with CIF BNF (Joe Krahn)