Re: Request for approval of CIF version 1.1 specification
- To: Multiple recipients of list <comcifs-l@iucr.org>
- Subject: Re: Request for approval of CIF version 1.1 specification
- From: John Faber <faber@icdd.com>
- Date: Wed, 16 Oct 2002 17:04:32 +0100 (BST)
I agree with this suggestion.

John

At 02:27 AM 10/16/2002 +0100, you wrote:
>Attached are some comments regarding the 1.1 CIF specification
>as posted by Brian.
>
>I am wondering if it wouldn't make sense to officially announce these
>documents as "Working Specification" with the express intent to adopt
>them as "final" after being publicly available and actively used by
>implementors for a while (e.g. a year). Hopefully the community will
>see this as an invitation to participate in ironing out any kinks.
>
>Ralf
>
>
>----------------------------------------------------------------------------
>
>Version 1.1 Specification
>
>I have to admit that I am still not entirely clear what the authoritative
>Version 1.0 Specification is (Hall, Allen & Brown, 1991?). It would be
>useful to clearly explain this in the introduction ("Revision
>history"). It would also be useful to outline the boundaries between
>this specification and the DDL specifications ("Scope").
>
>----------------------------------------------------------------------------
>
>Syntax:
>
>Definition of terms: consolidate in one place (link).
>
>Regarding quoting rules:
>
>I am asking myself how to deal with a string like
>
>;
>contains both isolated ' and " and ends with a '
>;
>
>If I understand correctly, anything can be handled in a multi-line text
>field. However, take the viewpoint of someone implementing a CIF
>writer. If the goal is to make the output human-readable, one would
>probably prefer quoted strings over multi-line text, in particular
>inside a loop_ construct. But then it seems necessary to pre-scan the
>text fields to be output to determine what kind of quoting is
>applicable. I am under the impression that it could be quite hard to
>devise an algorithm that generates both correct and "nice" output. A
>human working on a CIF will face similar difficulties. Isn't this an
>issue in practice? Could it be useful to include some practical
>quoting guidelines?
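
The pre-scan described above can be sketched roughly as follows. This is only an illustration, not a guideline from the specification. `format_cif_value` is a hypothetical helper. It assumes that a quote character closes a quoted string only when it is followed by white space, and it conservatively also rejects a quote style when the value ends with that quote character. It prefers a bare value, then single quotes, then double quotes, and falls back to a multi-line text field.

```python
import re

# Unquoted values must not look like CIF keywords (assumed list:
# data_..., save_..., loop_, global_, stop_).
_RESERVED = re.compile(r"(data_|save_).*|loop_|global_|stop_", re.IGNORECASE)


def format_cif_value(value: str) -> str:
    """Return `value` with the least intrusive quoting (illustrative only)."""

    def delimiter_is_safe(quote: str) -> bool:
        # A quote followed by white space would close the string early.
        # Conservative assumption: also avoid a value ending in the quote.
        closes_early = re.search(re.escape(quote) + r"\s", value) is not None
        return not closes_early and not value.endswith(quote)

    if "\n" not in value:
        bare_ok = (
            value != ""
            and re.search(r"\s", value) is None
            and value[0] not in "_#$'\"[];"
            and _RESERVED.fullmatch(value) is None
            and value not in ("?", ".")  # bare ? and . carry special meaning
        )
        if bare_ok:
            return value
        for quote in ("'", '"'):
            if delimiter_is_safe(quote):
                return quote + value + quote

    # Fall back to a text field; a line starting with ';' inside the
    # value would end the field early, so this sketch refuses it.
    if "\n;" in value:
        raise ValueError("value has a line starting with ';'")
    return "\n;" + value + "\n;"
```

With this sketch, `it's fine` comes out single-quoted, `rock 'n' roll` comes out double-quoted, and the example string above, which has both kinds of isolated quote, falls through to a text field.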
>
>I believe many people will expect constructs like
>  'an embedded \' quote'
>or
>  "an embedded \" double quote"
>to work as they do in many programming languages. To avoid this common
>misunderstanding it will be useful to provide a link to the "Accented
>letters" table in the semantics document.
>
>17. ... trailing white space on a line may however be elided.
>                                       ^^^
>
>In my opinion the specification should be unambiguous:
>White space should not be elided by the parser. The data value should
>be left untouched. Eliding is in the regime of semantics, not syntax.
>
>20. ...
>By contrast the value of the text field
>
>; foo
>  bar
>;
>
>is `foo\n bar' ...
>
>Should this be `foo\n  bar' (two spaces before the bar)?
>
>Also, in the semantics document the notation <eol> is used instead
>of \n. I suggest using <eol> everywhere.
>
>22.:
>
>The ASCII characters at decimal positions 11 (VT or vertical tab) and
>12 (FF or form feed), often included in library implementations as
>white space characters, are explicitly excluded from the CIF character
>set at this revision.
>
>Points:
>
>  1. I don't see the benefit of explicitly excluding these
>     characters. In practice it means that parsing of old
>     files might fail only because these characters are
>     embedded. I know there was some discussion already,
>     but I cannot remember the details. Is there something
>     wrong with the following, more forgiving approach:
>     Unquoted VT and FF are treated as white space,
>     quoted VT and FF are "passed through" like any other
>     character:
>
>       <WhiteSpace> := { <SP> | <HT> | <VT> | <FF> | <eol>
>                       | <TokenizedComments>}+
>       <AnyPrintChar> := <OrdinaryChar> | <double_quote> | '#' | '$'
>                       | <single_quote> | '_' | <SP> | <HT> | <VT> | <FF>
>                       | ';' | '[' | ']'
>
>  2. If it is decided to explicitly exclude VT and FF this deviation
>     from STAR should (also) be listed under "Implementation restrictions."
>
>
>27.
>
>How does the "Maximum line length" apply to <eol>\; quoted strings
>as explained in the semantics document? For example, is the following
>legal?
>
>;\
>2000 characters ...\
>2000 characters ...
>;
>
>Finally, in the post-Fortran and post-C era line length restrictions
>seem very arbitrary and are ultimately a nuisance. I'd rather see this
>restriction removed from the specification. Programs written in
>languages without automatic dynamic memory management could simply
>allocate a large buffer (e.g. 128k are perfectly reasonable these days)
>and report a "Technical limitation" in the highly unlikely event
>that the buffer is insufficient.
>
>----------------------------------------------------------------------------
>
>Semantic:
>
>This sentence in the introduction leaves me puzzled:
>
>  As computer techniques evolve, it becomes more appropriate to discuss
>  the machine-accessible semantic content, or "meaning", of the data in
>  such a file.
>
>Again: Definition of terms: consolidate in one place (link).
>
>10. The character string [local] is reserved for local use.
>                         ^     ^
>Is this [notation] used somewhere else? Are there alternatives?
>
>Handling of long lines
>
>  - I am a bit surprised that this is presented in the semantic
>    features document rather than the syntax document.
>
>  - Why do we need this for # comments?
>
>Typographic style codes
>
>  I don't see how these comments could make a significant difference in
>  practice, but they significantly contribute to conveying the
>  impression that the semantics features are a bit of a hodgepodge.
>  I suggest deleting the entire "Typographic style codes" section.
>

John Faber, Ph.D.
Principal Scientist
International Centre for Diffraction Data
12 Campus Boulevard
Newtown Square, PA 19073-3273, USA
+1-610-325-9814 (phone)
+1-610-325-9823 (fax)
faber@icdd.com (e-mail)
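
On the point 27 example quoted above, here is a minimal sketch of undoing the backslash folding. It assumes the convention that a text field whose first line is a lone backslash is folded, and that a backslash at the end of a line (ignoring trailing blanks) joins that line with the next. `unfold_text_field` is a hypothetical name. The sketch takes the field content between the opening and closing semicolons, and it does not settle whether the line length limit is meant to apply before or after unfolding.

```python
def unfold_text_field(content: str) -> str:
    """Join backslash-folded lines of a text field (illustrative only)."""
    lines = content.split("\n")
    # Assumed marker: the first line holds only a backslash.
    if lines[0].rstrip(" \t") != "\\":
        return content  # not folded, leave untouched

    logical_lines = []
    pending = ""  # text carried over from folded physical lines
    for line in lines[1:]:
        stripped = line.rstrip(" \t")
        if stripped.endswith("\\"):
            # Drop the trailing backslash and continue on the next line.
            pending += stripped[:-1]
        else:
            logical_lines.append(pending + line)
            pending = ""
    if pending:
        logical_lines.append(pending)
    return "\n".join(logical_lines)


# Two physical lines of about 2000 characters become one logical line
# of 4000 characters, which is the case the question is about.
folded = "\\\n" + "x" * 2000 + "\\\n" + "y" * 2000
unfolded = unfold_text_field(folded)
longest_physical = max(len(line) for line in folded.split("\n"))
longest_logical = max(len(line) for line in unfolded.split("\n"))
```

Here `longest_physical` is 2001 (the backslash counts) while `longest_logical` is 4000, so the answer depends on which of the two the "Maximum line length" is meant to constrain.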