Re: Problems with CIF BNF
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Problems with CIF BNF
- From: Joe Krahn <krahn@niehs.nih.gov>
- Date: Mon, 12 Mar 2007 16:30:37 -0400
- In-Reply-To: <a06230902c21b5216f07d@[192.168.10.211]>
- References: <45F5918A.4030602@niehs.nih.gov><a06230902c21b5216f07d@[192.168.10.211]>
I realize that there are a few hacks in the BNF to deal with context-dependence, like productions defined as multiple symbols, which make it impossible to use as a working BNF. But there are other problems with the grammar. With the end-of-line example, the lexer can do something 'sensible', but it is still important to have a specific definition of whether a missing terminal <eol> makes the CIF invalid. I can look at CBFlib to see one interpretation of the CIF grammar, but someone else's parser may have a different interpretation.

In fact, it would be good to have a collection of unusual CIF files for parser testing, with a consensus as to which ones are valid and which are invalid.

Joe

Herbert J. Bernstein wrote:
> Without a defined lexer, you cannot do CIF as a BNF; it is context
> sensitive in its use of whitespace. The question you are raising
> about EOF should be handled by the lexer, which should deal sensibly
> with the usual Unix problem of disambiguating the case of a final
> line that ends with eof rather than eol-eof. There is a rather
> complete bison grammar in CBFlib working on the level of tokens
> after lexing the input. -- HJB
>
> At 1:44 PM -0400 3/12/07, Joe Krahn wrote:
>> Some parts of CIF are vague. I hoped that the BNF syntax would be a
>> precise syntax specification, but it has problems. It is central to
>> properly defining the CIF format, and should therefore be very accurate.
>>
>> First, there are some plain syntax errors, like unbalanced braces in the
>> production of <Float>, and an empty token in the TokenizedComments
>> production.
>>
>> There are also a few hacks like <noteol>, and the lack of rules for the
>> content of quoted strings. I think it is also a hack for a production
>> unit to be defined for two elements, like "<eol><UnquotedString>".
>>
>> Does EOF count as whitespace? Normally, a text file ends with an <eol>
>> on the last line, so it is not a problem. With Fortran, you may not be
>> able to distinguish between them, so it seems that EOF probably should
>> count as a whitespace token.
>>
>> There are also places where the grammar could be simplified, such as:
>>
>> { {'e' | 'E' } | {'e' | 'E' } { '+' | '- ' } } <UnsignedInteger>
>>
>> written as:
>> {'e' | 'E' } { '+' | '-' }? <UnsignedInteger>
>>
>> Also note the error in the first form copied from the web page: the
>> minus sign has a space included.
>>
>> Should the logical-OR symbol always be contained within braces? This
>> appears to be inconsistent, but maybe the rule is to require braces when
>> the members include a quoted character element.
>>
>> I will try to edit my own version of the BNF to produce what I think it
>> is supposed to mean. Answers to some of the above questions will be
>> helpful in getting it right.
>>
>> Thanks,
>> Joe Krahn
>> _______________________________________________
>> comcifs mailing list
>> comcifs@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/comcifs
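The proposed simplification of the exponent production can be checked mechanically. A minimal sketch in Python, assuming <UnsignedInteger> is one or more decimal digits: each form is transcribed into a regular expression (the first form with the stray space removed, plus a literal transcription of the form as published on the web page) and compared on a few sample strings.

```python
import re

# First form, with the stray space in '- ' removed:
#   { {'e' | 'E'} | {'e' | 'E'} {'+' | '-'} } <UnsignedInteger>
FIRST_FORM = re.compile(r"(?:[eE]|[eE][+-])[0-9]+")

# Proposed simplification:
#   {'e' | 'E'} {'+' | '-'}? <UnsignedInteger>
SIMPLIFIED = re.compile(r"[eE][+-]?[0-9]+")

# First form exactly as published, where the minus terminal is the
# two characters '-' and ' '.
AS_PUBLISHED = re.compile(r"(?:[eE]|[eE](?:\+|- ))[0-9]+")

SAMPLES = ["e1", "E+12", "e-3", "e- 3", "e", "e+", "E--1", "x1"]


def accepts(pattern: re.Pattern, text: str) -> bool:
    """True if the pattern matches the whole string."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for s in SAMPLES:
        print(f"{s!r:8} first={accepts(FIRST_FORM, s)!s:5} "
              f"simplified={accepts(SIMPLIFIED, s)!s:5} "
              f"published={accepts(AS_PUBLISHED, s)}")
```

The alternation `[eE] | [eE][+-]` is the same language as `[eE][+-]?`, so the first two patterns agree on every input, not just these samples. The literal transcription of the published form rejects `e-3` and accepts `e- 3`, which is the effect of the space in the minus terminal.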
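The point that whitespace context and end of input belong in the lexer can be illustrated with a toy tokenizer. This sketch is hypothetical and is not the CBFlib lexer: it covers only a subset of CIF 1.1 (comments, quoted strings, semicolon-delimited text fields and unquoted strings, with no distinction between tags, keywords and values), and it assumes that end of input counts as whitespace, which is one possible answer to the EOF question. It shows that the <eol>';' context and the CIF 1.1 convention that a quote closes a quoted string only when followed by whitespace are simple to express as lexer state, even though they are awkward as BNF productions.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

WHITESPACE = " \t\n"


def tokenize(src: str) -> List[Token]:
    """Split a CIF-like string into (kind, text) tokens.

    Hypothetical toy lexer for a subset of CIF 1.1. Assumption: end of
    input counts as whitespace, so a last line with no trailing newline
    is treated like any other line.
    """
    # Normalise line endings so that only "\n" marks an <eol>.
    src = src.replace("\r\n", "\n").replace("\r", "\n")
    tokens: List[Token] = []
    i, n = 0, len(src)
    at_line_start = True  # the context the BNF spells as <eol> ...

    while i < n:
        c = src[i]
        if c == "\n":
            at_line_start = True
            i += 1
        elif c in " \t":
            at_line_start = False
            i += 1
        elif c == "#":
            # Comment: runs to the end of the line or the end of input.
            j = src.find("\n", i)
            j = n if j == -1 else j
            tokens.append(("comment", src[i:j]))
            i = j
        elif c == ";" and at_line_start:
            # Text field: opened by ';' at line start, closed by <eol>';'.
            end = src.find("\n;", i)
            if end == -1:
                raise ValueError("unterminated text field")
            tokens.append(("text", src[i + 1:end]))
            i = end + 2
            at_line_start = False
        elif c in "'\"":
            # A matching quote closes the string only when it is followed
            # by whitespace or (by the assumption above) end of input.
            j = i + 1
            while j < n and src[j] != "\n":
                if src[j] == c and (j + 1 == n or src[j + 1] in WHITESPACE):
                    break
                j += 1
            else:
                raise ValueError("unterminated quoted string")
            tokens.append(("quoted", src[i + 1:j]))
            i = j + 1
            at_line_start = False
        else:
            # Anything else, including ';' not at line start, is an
            # unquoted string running to the next whitespace or end of input.
            j = i
            while j < n and src[j] not in WHITESPACE:
                j += 1
            tokens.append(("unquoted", src[i:j]))
            i = j
            at_line_start = False
    return tokens


if __name__ == "__main__":
    sample = "_a 'a dog's life'\n_b\n;line one\nline two\n;\n_c 1.5e-3"
    for kind, text in tokenize(sample):
        print(kind, repr(text))
```

In the sample, the final value `1.5e-3` is still emitted as a token although the input has no trailing newline. Under the opposite answer (a final <eol> is mandatory), the lexer would instead have to reject any input that does not end in "\n"; either choice is easy to implement, which is why it needs to be stated in the specification rather than left to each parser.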