Re: Backus-Naur Form for CIF
- To: Multiple recipients of list <comcifs-l@iucr.org>
- Subject: Re: Backus-Naur Form for CIF
- From: Nick Spadaccini <nick@cs.uwa.edu.au>
- Date: Wed, 4 Oct 2000 03:08:18 +0100 (BST)
On Tue, 3 Oct 2000, Herbert J. Bernstein wrote:

> The usual approach used in Fortran of redefining a-z with productions of
> the form a ::= "A"|"a" won't work here, since we need to preserve case
> sensitivity for text. In practice this would be fudged in the lexical
> scanner, but, for clarity, I would suggest adding an explicit comment
> explaining the case-insensitivity of data names and some productions of
> the form:
>
>   <DATA_> ::= {"D"|"d"} {"A"|"a"} {"T"|"t"} {"A"|"a"} "_"
>   <LOOP_> ::= {"L"|"l"} {"O"|"o"} {"O"|"o"} {"P"|"p"} "_"
>
> to use in place of the "data_" and "loop_" strings

Absolutely. This is very much what is done in the yacc implementation for
starbase. Namely, we redefine the characters a, b, d etc. to be of either
case and then define the tokens using these, as in

  a        [aA]
  b        [bB]
  d        [dD]
  e        [eE]
  g        [gG]
  l        [lL]
  o        [oO]
  p        [pP]
  s        [sS]
  t        [tT]
  v        [vV]

  Data_    {d}{a}{t}{a}_
  Loop_    {l}{o}{o}{p}_
  Global_  {g}{l}{o}{b}{a}{l}_
  Stop_    {s}{t}{o}{p}_
  Save_    {s}{a}{v}{e}_

In the javacc implementation there is this wonderful global setting, namely

  options { IGNORE_CASE=true; }

which simplifies things even more. I will adjust the BNF accordingly.
Thanks for picking that up, Herb.

> 2. The production for <data_block> does not require any leading or
>    trailing whitespace, so that a <CIF_file> could consist of a
>    <data_heading> and a <data> item immediately followed by another
>    <data_heading>, etc. I cannot seem to find where the productions
>    explicitly require whitespace between the data item and the second
>    data heading. A similar problem seems to exist in the production for
>    loop values. This would certainly be solved by implicit precedence
>    among the productions or by operation of the lexical scanner, but it
>    would be best to have the BNF be unambiguous in the handling of
>    whitespace.

I have said it before and I will say it again: "Now you know why I have
been reluctant to include productions specific to whitespace in the BNF".
Whitespace is a purely lexical issue, and language BNFs all exclude it,
with the proviso that "whitespace can be used anywhere to delimit tokens
etc etc", without any explicit rules.

I can see a fix, but it would need an exception. Namely, change

  <data_block> ::= <wspace>* <data_heading> <data>+ <wspace>*

to

  <data_block> ::= <wspace>+ <data_heading> <data>+ <wspace>*

The exception being that the leading <wspace> need not be there IF IT IS
THE BEGINNING OF THE FILE. You could equally have

  <data_block> ::= <wspace>* <data_heading> <data>+ <wspace>+

with the exception applying to the end of the file. This exception would
have to be "written as a comment" and not formally part of the BNF syntax
(unless someone can see how to do it elegantly). What's the consensus?

> 3. The paper speaks of blanks, but not of tabs and vertical tabs and
>    formfeeds. Most systems will handle tabs reasonably. Not all
>    systems can handle vertical tab or form feed. Are we requiring all
>    CIF parsers to be able to handle more than blank and tab?

The vt and ff were an attempt to catch other non-printing characters that
could reasonably be interpreted as the equivalent of spaces or tabs (the
vt) or of a newline (the ff). If it clarifies things, and restrictions
always do, I can delete the references to vt and ff. Opinions?

> 4. The paper speaks of recognising a number, and gives a syntax for a
>    number (with and without an ESD). Shouldn't this be in the BNF?

I guess I really view the BNF as going down only to the level of what a
data value is in terms of the allowed character sequence. Whether it is a
number or not is a higher level of abstraction. I can include the
production for a number (with or without parentheses) but it would be a
lexical definition. That is, it would not appear in any of the grammar
productions, because the complexity would grow enormously. Imagine having
to now define when a <number> can appear within an <SC_bounded_string>!
A *number* can be included for the sake of lexical definition. Opinions?

> 5. The paper includes an example with use of "\" (e.g. 'Cu K\a') escapes
>    in text and character fields. Shouldn't this escape mechanism be
>    mentioned in the BNF, at least in the comments?

As far as the BNF is concerned, the use of \ is not excluded as a
legitimate character.

> 6. The BNF does not seem to break out the "." and "?" metacharacter data
>    values. In real parsers, these are very important cases to distinguish.

Again, as far as the BNF is concerned, the use of . and ? is not excluded;
they are legitimate characters.

In 5. and 6. you seem to be speaking of *semantic* meaning. Such
definitions are not part of the BNF. The paper you speak of details these
characters and how to interpret them. One cannot appreciate what CIF is
from a BNF alone; one also needs to read the other specifications, which
are not reproducible in a BNF and are only explained in textual form (as
in the paper).

cheers

Nick

I will make the changes after some review of this correspondence by others.

--------------------------------
Dr Nick Spadaccini
Department of Computer Science              voice: +(61 8) 9380 3452
University of Western Australia               fax: +(61 8) 9380 1089
Nedlands, Perth, WA 6907                    email: nick@cs.uwa.edu.au
AUSTRALIA                                     web: http://www.cs.uwa.edu.au/~nick
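
The lex-style definitions quoted in the message (each letter redefined as a two-case character class, then reserved words built from those) can be illustrated with a short Python sketch. This is not the starbase or javacc code; the names `case_insensitive`, `RESERVED` and `classify` are hypothetical, and the sketch only checks the reserved prefix at the start of a token, ignoring the rest of the CIF grammar. The point it shows is that case is folded for the reserved words only, while all other text keeps its case.

```python
import re


def case_insensitive(word):
    """Build a lex-style pattern: 'loop_' -> '[lL][oO][oO][pP]_'."""
    return "".join(
        f"[{c.lower()}{c.upper()}]" if c.isalpha() else re.escape(c)
        for c in word
    )


# Reserved prefixes named in the message (assumed list for this sketch).
RESERVED = ["data_", "loop_", "global_", "stop_", "save_"]

# One named group per reserved prefix, e.g. (?P<DATA>[dD][aA][tT][aA]_).
KEYWORD_RE = re.compile(
    "|".join(
        f"(?P<{w.rstrip('_').upper()}>{case_insensitive(w)})" for w in RESERVED
    )
)


def classify(token):
    """Return (kind, rest): kind is the reserved-word name or 'value'.

    Text after a reserved prefix, and any ordinary value, is returned
    unchanged, so case is preserved everywhere except in the keyword test.
    """
    m = KEYWORD_RE.match(token)
    if m is None:
        return ("value", token)
    return (m.lastgroup, token[m.end():])


if __name__ == "__main__":
    for tok in ["DATA_MyBlock", "Loop_", "'Cu K\\a'", "database"]:
        print(tok, "->", classify(tok))
```

A global switch such as `re.IGNORECASE` would play the role of javacc's `IGNORE_CASE=true`; the per-letter classes are used here only because they mirror the lex definitions more directly.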