Re: Opinions on comments as part of the content
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Opinions on comments as part of the content
- From: Brian McMahon <bm@iucr.org>
- Date: Wed, 7 Mar 2007 10:32:01 +0000
- In-Reply-To: <7.0.1.0.0.20070307072016.02573858@cam.ac.uk>
- References: <45EDB89B.20907@niehs.nih.gov><7.0.1.0.0.20070307072016.02573858@cam.ac.uk>
JK> Are there people using comments to hold pertinent information?
JK> If so, has there been any attempt to add a general purpose
JK> comment data item? My thinking is that the only comment that
JK> should have valid information is the CIF header comment,

Comments must not be relied upon to carry "portable" information. There are a number of applications where they are useful: for example, Acta Cryst template CIFs make liberal use of comments to indicate to a human reader the best way to complete data items, but those comments carry no data that should be exposed in a purely crystallographic application.

Applications that don't re-order content are at liberty to carry or discard comments, and it's true that a number do carry them along as a convenience, or courtesy; in Peter's case, efforts are even made to retain them by applying sensible heuristics when content is re-ordered. I think such applications have value, but the standard places no requirement on them to do so, nor do I believe it should.

PMR> The first observation is that CIF does not define an abstract data
PMR> model (e.g. the Infoset in XML) so it is difficult to on what a
PMR> parser should do other than confirm validity to the CIF standard.
PMR> ... We have written a CIF parser (CIFDOM) which parses CIFs into an
PMR> abstract data model which can be exposed in XML syntax and conforms to
PMR> Document Object models (DOM).

This is a good point. In practice CIFs map to different document object models:

- small-molecule CIFs submitted to Acta C/E represent a scientific article reporting one or more discrete structures;
- msCIF represents an aggregate of structural descriptions of one or more compounds/phases, several of which may be overlaid to describe modulated structures as superpositions of substructures;
- PDB mmCIFs represent single-compound database records;
- symCIFs represent tabulations of symmetry properties for different space groups.
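Since applications are at liberty to discard comments, it may help to see roughly what discarding involves. The Python below is a minimal sketch, not CIFDOM's or any other parser's implementation, and the function names are hypothetical. It assumes the CIF 1.1 rules that a comment starts with `#` at the beginning of a line or after whitespace, that a quote closes only when followed by whitespace or end of line, and that `;` at the start of a line opens or closes a text field. `#` characters inside quoted strings, unquoted values and text fields are left alone. Note that it also strips the #\#CIF_1.1 version comment, so an application that wants that string should read it first.

```python
def _strip_line(line: str) -> str:
    """Remove a trailing '#' comment from one line outside any text field."""
    quote = None       # the open quote character, if inside a quoted string
    prev_blank = True  # True at the start of the line or just after whitespace
    for i, ch in enumerate(line):
        if quote:
            # Under CIF 1.1 rules a quote closes only if followed by
            # whitespace or the end of the line, so 'it's' stays one string.
            nxt = line[i + 1] if i + 1 < len(line) else " "
            if ch == quote and nxt in " \t":
                quote = None
        elif prev_blank and ch in "'\"":
            quote = ch  # a quote at the start of a token opens a string
        elif prev_blank and ch == "#":
            return line[:i].rstrip()  # comment runs to the end of the line
        prev_blank = ch in " \t"
    return line


def strip_comments(cif_text: str) -> str:
    """Return CIF text with comments removed (sketch; CIF 1.1 rules assumed)."""
    out = []
    in_text_field = False
    for line in cif_text.splitlines():
        if line.startswith(";"):
            if in_text_field:
                # Closing ';': anything after it is ordinary CIF again.
                in_text_field = False
                out.append(";" + _strip_line(line[1:]))
            else:
                # Opening ';': the rest of this line belongs to the field.
                in_text_field = True
                out.append(line)
        elif in_text_field:
            out.append(line)  # text-field content is data: keep it verbatim
        else:
            out.append(_strip_line(line))
    return "\n".join(out)
```

A production parser would more likely drop comments during tokenization than in a separate line-based pass; the separate pass just keeps the rules visible.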
These models aren't mutually exclusive; they will overlap significantly. But I think we need to work at formalising the abstract structures, classifying the different models and mapping them to appropriate DOMs, if it turns out to be necessary.

PMR> In doing this we have had to make
PMR> various interpretations of the standard, while trying to
PMR> retain the goodwill of authors and readers ... We apply
PMR> the following from the standard ...
PMR> I would be grateful to know if any COMCIFer has a different view of these.

I agree with Peter's interpretations. [I would like to see some applications developed that applied various styles of "pretty printing" (and might one day find time to work on them myself), for there is a certain aesthetic and usefulness in working with pure-ASCII files in old-fashioned text-editing environments. But I accept that these are cosmetic requirements only, and any parser is at liberty to normalise the whitespace that separates data items.]

PMR> Does this mean one or more comments before the first block? I don't
PMR> think the standard defines a CIF header comment.

The 1.1 specification recommends that a CIF begin with the comment string #\#CIF_1.1, which acts as a version indicator and, incidentally, as a magic number that helps an operating system's file-type utilities identify the file. It is only a recommendation (since it was absent from the initial specification), and I'd be interested to know whether applications actually make use of this string when it is found. Other than that, comments may occur before the first block, but without any specific semantics.

PMR> This is one of a small number of topics which could benefit from
PMR> clarification (and in some cases an arbitrary ruling):
PMR>
PMR> * data blocks. Is the value of the data block case-sensitive? Are
PMR> data block ids which differ only in case identical and therefore
PMR> illegal? Is it allowed to have an empty string as id?
PMR> or any mixture of non-whitespace CIF chars (e.g. punctuation only)

"The file may be partitioned into multiple data blocks by the insertion of further data-block headers. Data-block headers are case-insensitive (that is, two headers differing only in whether corresponding letter characters are upper or lower case are considered identical). Within a single data file identical data-block headers are not permitted." (International Tables G, p.21)

An empty block code is not permitted in a data-block header (i.e. data_ on its own is invalid); any mixture of non-whitespace characters is allowed (probably unfortunate, but that's the way it is).

PMR> * data_global. This is so widespread that it would be useful to have
PMR> at least an agreed heuristic for it.
PMR> * multi-data-block CIFs. Is it legitimate to split them? If so,
PMR> can/should data_global be copied into each?

For Acta papers, the *heuristic* is that data_global contains information that applies to all the following data blocks (in practice, the title, authors and discursive text of the paper), while succeeding data blocks contain the experimental and derived data for each structure. That's a reasonably reliable heuristic for that particular document model, but it need not apply to, say, a modulated-structure DOM. For a long time I've thought that we need to formalise the relationships between data blocks (see e.g. http://www.iucr.org/iucr-top/lists/comcifs-l/msg00228.html).

PMR> * what are the semantics of '?' and '.'

"The more important use of the null data type is its application to the meta characters `?' (query) and `.' (full point) that may occur as values associated with any data name and therefore have no specific type. ... The substitution of the query character `?' in place of a data value is an explicit signal that an expected value is missing from a CIF. This `missing-value signal' may be used instead of omitting an item (i.e.
its tag and value) entirely from the file, and serves as a reminder that the item would normally be present. The substitution of the full-point character `.' in place of a CIF data value serves two similar, but not identical, purposes. If it is used in looped lists of data it is normally a signal that a value in a particular packet (i.e. a value in the row of the table) is `inapplicable' or `inappropriate'. In some CIF applications involving access to a data dictionary it is used to signal that the default value of the item is defined in its definition in the dictionary. Consequently, the interpretation of this signal is an application-specific matter and its use must be determined according to the application." (International Tables G, p.24)

PMR> Is it legitimate to delete an item of the form:
PMR> _foo ?
PMR> or does it convey information?

It conveys information (though probably only the person who put it there knows what). Of course, an application that is expressly validating against a dictionary might choose to omit it.

Brian
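The data-block rules quoted from International Tables G (case-insensitive headers, no duplicates within a file, no empty block code) and the two null markers can be illustrated with a small Python sketch. It assumes every data-block header begins a line and that no semicolon text field contains a line starting with data_; `check_block_headers` and `classify_null` are hypothetical helper names, not part of any existing CIF library. `classify_null` assumes it is handed a bare, unquoted token as read from the file.

```python
import re

# Assumption: a header is "data_" at the start of a line followed by the block
# code, which runs to the next whitespace. The reserved word itself is matched
# case-insensitively, as CIF 1.1 treats reserved words.
_HEADER = re.compile(r"data_(\S*)", re.IGNORECASE)


def check_block_headers(lines):
    """Return messages for empty or duplicate data-block headers (sketch)."""
    problems = []
    seen = {}  # lower-cased block code -> line number where it first appeared
    for lineno, line in enumerate(lines, start=1):
        m = _HEADER.match(line)  # match() anchors at the start of the line
        if not m:
            continue
        code = m.group(1)
        if not code:
            # "data_" on its own is invalid: the block code may not be empty.
            problems.append(f"line {lineno}: empty block code")
            continue
        key = code.lower()  # headers differing only in case are identical
        if key in seen:
            problems.append(
                f"line {lineno}: data_{code} duplicates the header on line {seen[key]}"
            )
        else:
            seen[key] = lineno
    return problems


def classify_null(token):
    """Interpret a bare (unquoted) token as one of CIF's null markers."""
    if token == "?":
        return "missing"  # expected value absent; item kept as a reminder
    if token == ".":
        # Inapplicable in a loop, or "use the dictionary default";
        # which one is meant is application-specific.
        return "inapplicable-or-default"
    return "value"
```

Following the reading above, a header such as data_!!! passes this check, since any mixture of non-whitespace characters is allowed as a block code.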
- Follow-Ups:
- Re: Opinions on comments as part of the content (peter murray-rust)
- References:
- Opinions on comments as part of the content (Joe Krahn)
- Re: Opinions on comments as part of the content (peter murray-rust)