A formal specification for CIF version 1.1 (Draft)
- Subject: A formal specification for CIF version 1.1 (Draft)
- From: Brian McMahon <bm@xxxxxxxx>
- Date: Wed, 10 Jul 2002 15:40:33 +0100 (BST)
A draft specification for CIF is now available for community review at the URL http://www.iucr.org/iucr-top/cif/developers/spec/cifspec.html
This page links to two documents, describing respectively the syntactic and semantic components of the "Crystallographic Information File". To focus subsequent discussion on this list, I would invite you first to review and comment on the syntax specification at http://www.iucr.org/iucr-top/cif/developers/spec/cifsyntax.html since this describes the syntax rules that all parsers must follow; at least some of the semantic content can be tailored to the needs of individual applications. However, the semantics document is also posted for information.
One point should be made clear: this specification is for an extended version of CIF, not yet formally adopted by COMCIFS. The only significant extensions to the existing standard are the relaxation of the line-length limit from 80 to 2048 characters, and the introduction of matching square brackets as additional delimiters for string values containing white space. It should be easy enough to reverse-engineer this specification to generate a complete specification for the existing standard (version 1.0).
The reason that we have done things in this somewhat inverse way is that no two existing CIF parsers behave identically in their handling of the more subtle allowed syntactic features. Hence every existing CIF parser will need to be examined, and in principle modified, if it is to be fully compliant with version 1.0; it is therefore an opportune time to signal the additional changes that would be necessary for version 1.1 compliance.
Note further that the extensions matter for the reading of CIFs; applications that write CIFs will not need to be changed at all (provided that they currently write valid CIFs), since a CIF that is valid against version 1.0 will necessarily be valid against version 1.1.
Some general comments: CIF is intended as an archival and portable format. For this reason, the description of certain syntactic features has been constructed with care to try to avoid machine or operating-system dependencies. This is particularly the case with the discussion regarding end-of-line delimiters. Here an attempt has been made to reconcile the practical handling of files that are transported or shared across common operating systems such as Unix, MacOS and MSWindows with the more general formulation that is required to support files on mainframe or elderly record-oriented OS architectures.
There are similar concerns with the specification of character sets. Despite the growing utility of Unicode, maximum portability across platforms is achieved by specifying a very precise (if restrictive) set of characters. In this document, they are expressed by reference to the ASCII character set, but the wording is such as to permit use of the same characters under an EBCDIC or other encoding scheme.
There has also been extended internal discussion on the line-length limit. The view of COMCIFS is that it would be desirable in principle to drop any limit on the length of a text line, but that practical implementation limits in certain systems still argue in favour of a finite length. The limit of 2048 is arbitrary, but is intended to address the most common reason for violation of the existing 80-character limit, which is manual editing in a GUI window (especially using a proportional font) that typically overruns by only a few characters. Applications will still be encouraged to write lines of 80 characters or close to it where possible.
Such considerations will rarely trouble a developer on a single platform; but applications that expect to handle files under different machine and OS architectures will need to shoulder the responsibility of managing any necessary underlying record or byte manipulation to preserve the integrity of the files on the target systems.
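As a rough illustration of the line-length and end-of-line points above, here is a minimal Python sketch of a check a reading application might apply. The function and constant names are hypothetical, the limits are simply the two figures quoted in this message, and the treatment of CR LF, lone CR and lone LF as equivalent line ends is an assumption made for the example, not wording taken from the draft.

```python
import re

# Limits quoted in this message: 80 for the existing standard,
# 2048 for the proposed version 1.1.
CIF_1_0_LINE_LIMIT = 80
CIF_1_1_LINE_LIMIT = 2048

# Assumption for this sketch: CR LF, lone CR and lone LF each mark
# one end of line, so files from Windows, older MacOS and Unix split alike.
_EOL = re.compile(r"\r\n|\r|\n")


def overlong_lines(text, limit=CIF_1_1_LINE_LIMIT):
    """Return (line_number, length) for every line longer than `limit`.

    Length is counted in characters, excluding the line terminator.
    """
    lines = _EOL.split(text)
    return [(n, len(line))
            for n, line in enumerate(lines, start=1)
            if len(line) > limit]
```

Called with `limit=CIF_1_0_LINE_LIMIT`, the same function reports lines that break the existing 80-character rule; any file that passes that check also passes the 2048-character check, which is the sense in which a valid version 1.0 file remains valid under version 1.1 as far as line length goes.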
A formal grammar is presented for CIF using BNF-style notation. CIF, however, has a context-sensitive grammar that cannot be described purely in terms of a BNF. The specification therefore contains detailed commentary and prescriptions for lexical analysis that must be read and implemented very carefully. This also accounts for the extended nature of certain of the language productions, where the need for context-sensitive handling is signalled by declaring, on both the left-hand and right-hand sides of a production, elements that must match.
Where differences of substance exist between this formulation and Nick's formerly-posted BNF, the differences may legitimately be described and discussed on this list. However, the intention is to move towards formal adoption of this specification as the reference standard.
The numbering of paragraphs is for ease of labelling and has no deeper purpose. If this draft needs to be changed during this cycle of review I shall add or delete paragraphs without disturbing the existing numbering. Paragraphs in smaller font are intended to provide additional commentary to the main text, but again no deep significance should be attached to the use of small or larger type.
Regards
Brian
_______________________________________________________________________________
Brian McMahon                                       tel: +44 1244 342878
Research and Development Officer                    fax: +44 1244 314888
International Union of Crystallography           e-mail: bm@iucr.org
5 Abbey Square, Chester CH1 2HU, England                 bm@iucr.ac.uk