Re: Opinions on comments as part of the content
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Opinions on comments as part of the content
- From: peter murray-rust <pm286@cam.ac.uk>
- Date: Wed, 07 Mar 2007 18:35:04 +0000
- In-Reply-To: <20070307103201.GA2963@emerald.iucr.org>
- References: <45EDB89B.20907@niehs.nih.gov><7.0.1.0.0.20070307072016.02573858@cam.ac.uk><20070307103201.GA2963@emerald.iucr.org>
Thanks Brian,

We are clearly in almost complete agreement. Minor comments below.

At 10:32 07/03/2007, Brian McMahon wrote:
>courtesy; and even, in Peter's case, efforts are made to retain
>them by applying sensible heuristics if content is re-ordered.
>I think such applications have value, but there is no requirement
>on them by the standard to do so, nor do I believe there should be.

Agreed.

>PMR> The first observation is that CIF does not define an abstract data
>PMR> model (e.g. the Infoset in XML) so it is difficult to [agree] on what a
>PMR> parser should do other than confirm validity to the CIF standard.
>PMR> ... We have written a CIF parser (CIFDOM) which parses CIFs into an
>PMR> abstract data model which can be exposed in XML syntax and conforms to
>PMR> Document Object models (DOM).
>
>This is a good point. In practice CIFs map to different document object
>models: small-molecule CIFs submitted to Acta C/E represent a
>scientific article reporting one or more discrete structures. msCIF
>represents an aggregate of structural descriptions of one or more
>compounds/phases, several of which may be overlaid to describe
>modulated structures as superpositions of substructures. PDB mmCIFs
>represent single-compound database records. symCIFs represent
>tabulations of symmetry properties for different space groups. These
>models aren't mutually exclusive; they will have significant overlaps.
>But I think we need to work at formalising the abstract structures,
>classifying different models and mapping to appropriate DOMs if it
>turns out that it's necessary to do so.

Fully agreed. I believe that it is possible to have a single DOM for
the non-STAR CIFs, based on DDL1. (Is it still called that?) The more
complex arrangements should be extensions of this. There will be a need
for a language to define the different conventions. (My own CML has an
attribute 'convention' precisely for this purpose.)
Some signal is required because it is non-trivial to work out what
document model is mandated by a given lexical CIF.

>PMR> In doing this we have had to make
>PMR> various interpretations of the standard, while trying to
>PMR> retain the goodwill of authors and readers ... We apply
>PMR> the following from the standard ...
>PMR> I would be grateful to know if any COMCIFer has a different
>PMR> view of these.
>
>I agree with Peter's interpretations. [I would like to see some
>applications developed that did apply various styles of "pretty
>printing" (and might one day find time to work on them myself),
>for there is a certain aesthetic and usefulness in working with
>pure-ASCII files in old-fashioned text-editing environments.
>But I accept that these are cosmetic requirements only, and
>any parser is at liberty to normalise whitespace that is used to
>separate data items.]
>
>PMR> Does this mean one or more comments before the first block? I don't
>PMR> think the standard defines a CIF header comment.
>
>The 1.1 specification recommends that a CIF begin with a comment string
>#\#CIF_1.1 to act as a version indicator, and incidentally as a magic
>number to help filetype applications supported by an operating system
>to identify the file type. Only a recommendation (since it was absent
>from the initial spec), and I'd be interested in whether applications do
>make use of this string when it is found.

I think this would be valuable. There are an increasing number of
applications which need to 'guess' filetype and magic signals are
valuable.

>Other than that, comments may occur before the first block, but
>without any specific semantics.
>
>PMR> This is one of a small number of topics which could benefit from
>PMR> clarification (and in some cases an arbitrary ruling):
>PMR>
>PMR> * data blocks. Is the value of the data block case-sensitive? are
>PMR> data block ids which differ only in case identical and therefore
>PMR> illegal. Is it allowed to have an empty string as id?
>PMR> or any mixture of non-whitespace CIF chars (e.g. punctuation only)
>
>   "The file may be partitioned into multiple data blocks by the
>   insertion of further data-block headers. Data-block headers
>   are case-insensitive (that is, two headers differing only in
>   whether corresponding letter characters are upper or lower
>   case are considered identical). Within a single data file
>   identical data-block headers are not permitted."
>   (International Tables G, p.21)
>
>An empty block code is not permitted in a datablock header (i.e.
>data_ on its own is invalid); any mixture of non-whitespace
>characters is allowed (probably unfortunate, but that's the way
>it is).

Thanks - I should have picked this up - I tend to refer to the CIF spec.

>PMR> * data_global. This is so widespread that it would be useful to have
>PMR> at least an agreed heuristic for it.
>PMR> * multi-data-block CIFs. Is it legitimate to split them? If so,
>PMR> can/should data_global be copied into each?
>
>For Acta papers, the *heuristic* is that data_global contains information
>that applies to all the following data blocks - in practice, the title,
>authors, discursive text of the paper, while succeeding data blocks contain
>the experimental and derived data for each structure. That's a reasonably
>reliable heuristic for that particular document model, but need not apply,
>say, to a modulated-structure DOM. For a long time I've thought that we need
>to formalise the relationships between data blocks (see e.g.
>http://www.iucr.org/iucr-top/lists/comcifs-l/msg00228.html).

Yes - this is very useful - it had slipped my memory that we had
preserved this discussion. See also comments to Joe's mail.

>PMR> * what are the semantics of '?' and '.'
>
>   "The more important use of the null data type is its application
>   to the meta characters `?' (query) and `.' (full point) that
>   may occur as values associated with any data name and therefore
>   have no specific type. ...
>
>   The substitution of the query character `?' in place of a data
>   value is an explicit signal that an expected value is missing
>   from a CIF. This `missing-value signal' may be used instead of
>   omitting an item (i.e. its tag and value) entirely from the file,
>   and serves as a reminder that the item would normally be present.
>
>   The substitution of the full-point character `.' in place of
>   a CIF data value serves two similar, but not identical, purposes.
>   If it is used in looped lists of data it is normally a signal that
>   a value in a particular packet (i.e. a value in the row of the
>   table) is `inapplicable' or `inappropriate'. In some CIF
>   applications involving access to a data dictionary it is used to
>   signal that the default value of the item is defined in its
>   definition in the dictionary. Consequently, the interpretation
>   of this signal is an application-specific matter and its use must
>   be determined according to the application."
>   (International Tables G, p.24)
>
>PMR> Is it legitimate to delete an item of the form: _foo ?
>PMR> or does it convey information?
>
>It conveys information (though probably only the person who put it there
>knows what). Of course, an application that is expressly validating
>against a dictionary might choose to omit it.

Thanks - have also noted some examples in earlier reply.

P.

>Brian

Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road, Cambridge CB2 1EW, UK
+44-1223-763069
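
For illustration only, the points settled in this exchange (the recommended #\#CIF_1.1 magic string, case-insensitive and non-empty block codes, and the distinct meanings of '?' and '.') could be checked along the following lines. This is a rough Python sketch, not CIFDOM or any existing parser: the function names are hypothetical, headers are only recognised at the start of a line, and text inside semicolon-delimited fields is not skipped.

```python
import re

# Version comment recommended (not required) by the CIF 1.1 specification.
CIF_MAGIC = "#\\#CIF_1.1"


def has_cif11_magic(text):
    """Return True if the text begins with the recommended version comment."""
    return text.startswith(CIF_MAGIC)


def check_block_headers(text):
    """Return (block_codes, problems) for data-block headers found in text.

    Assumption: a header is 'data_' at the start of a line. A real parser
    must also handle headers elsewhere on a line and ignore text fields.
    """
    codes, problems, seen = [], [], set()
    pattern = re.compile(r"^data_(\S*)", re.IGNORECASE | re.MULTILINE)
    for match in pattern.finditer(text):
        code = match.group(1)
        if not code:
            # 'data_' on its own is invalid.
            problems.append("empty block code")
            continue
        key = code.lower()  # headers are compared case-insensitively
        if key in seen:
            problems.append("duplicate block code (ignoring case): " + code)
        seen.add(key)
        codes.append(code)
    return codes, problems


def classify_value(raw):
    """Classify an unquoted value token; quoted '?' or '.' are plain strings."""
    if raw == "?":
        return "missing"  # expected value explicitly flagged as absent
    if raw == ".":
        return "inapplicable-or-default"  # meaning depends on the application
    return "value"
```

An application that wants to preserve the information carried by "_foo ?" would keep items classified as "missing" rather than dropping them; one that validates strictly against a dictionary might choose otherwise.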
- Follow-Ups:
- Re: Opinions on comments as part of the content (Joe Krahn)
- References:
- Opinions on comments as part of the content (Joe Krahn)
- Re: Opinions on comments as part of the content (peter murray-rust)
- Re: Opinions on comments as part of the content (Brian McMahon)