Re: CIF Infoset
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: CIF Infoset
- From: Brian McMahon <bm@iucr.org>
- Date: Tue, 7 Sep 2004 14:58:35 +0100
- In-Reply-To: <f06100500bd5f63d992eb@[192.168.2.100]>
- References: <Pine.LNX.4.44.0409041834570.9129-100000@owari.msl.titech.ac.jp><f06100500bd5f63d992eb@[192.168.2.100]>
I am sorry to come into this discussion so late in the day: I'm tied up in a number of local projects that are occupying most of my attention. However, it raises a lot of interesting points.

First, it seems to me that machine validation of "semantic" content is still at a very elementary stage. (Well, zerothly, I'm not even sure that the meaning of "semantic" can be determined unequivocally in this context.)

I've spent some time thinking about Peter's request that the use of "data_global" to carry common information between blocks be either formalised or deprecated. All I can come up with at present is to say that it's not part of the published standard. Therefore, a software designer trying to write a general application that will handle any CIF in a standard-compliant way need treat "data_global" no differently from any other data block. On the other hand, it's a useful convention that is shared between Acta, CCDC and some others, and I would not wish to see it expressly forbidden, unless and until a better and more reliable way to achieve its purpose can be found.

My suggestion for the "better way" is to look into expanding the AUDIT_LINK mechanism: this allows for statements of relationships between data blocks. In the present core dictionary the category is present, but the potentially useful field _audit_link_block_description is a free-text one; the dictionary example is

    loop_
    _audit_link_block_code
    _audit_link_block_description
       .            'discursive text of paper with two structures'
       morA_(1)     'structure 1 of 2'
       morA_(2)     'structure 2 of 2'

To be of real use in establishing relationships, there needs to be an enumerated field that expresses all the relationships that one can handle mechanically. I'd be interested in people's thoughts on how feasible such a project would be.

Peter asks whether the semantics implied by the conventional use of the data_global block are the same as would be achieved by use of the global_ mechanism. It is an interesting question.
In practice the data items used in the data_global block are a different set from those in the structural data blocks: they describe the article title, authors and comment, while the separate structure data blocks contain the atom coordinates etc. There is no provision for trying to interpret a case where say _publ_section_title (the title of a paper) appears in data_global while a different value of _publ_section_title appears in data_foo. This differs from global_, where there are clear rules of inheritance and precedence. On the other hand, global_ also enforces an order on the subsequent data blocks, while data_global can be placed anywhere one chooses. It is therefore much easier for relatively undisciplined software to make use of the data_global mechanism in the current ad hoc way.

(It is worth bearing in mind that much of the implementation of CIF arises from modifications to existing programs that were never originally designed with an eye to the CIF data model. In that sense the imposition of a very strict 'discipline' in writing and even reading CIFs places a heavy burden on the maintainers of those programs. This is not to criticise them for what - in scientific terms - are immensely valuable programs. But it's quite likely that many older programs, especially if written in Fortran, will struggle ever to become fully CIF compliant. It's also worth observing that they would likely struggle similarly to handle semantic-rich i/o in XML or any other such representation.)

Couple of other topics: comments really should be considered as void of semantic content in terms of the information designed to be exchanged and archived in CIFs. If an application wishes to preserve comments (out of politeness or conservatism) that's fine, but the application writer must exercise his or her own judgement as to how to handle this, especially if the order of items is changed.
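
To make the earlier contrast between global_ and data_global concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: a file is modelled as an ordered list of (header, items) pairs that some parser has already produced, the function names are hypothetical, and the precedence rule shown for global_ (a block's own value wins over an inherited one) is a simplified reading of STAR-style inheritance rather than a statement of the specification. For data_global there is no agreed rule for conflicts, so the sketch merely reports them.

```python
from typing import Dict, List, Tuple

# Assumed in-memory model: (header, items), where header is "global_"
# or "data_<name>" and items maps tag -> value. Not a real CIF parser.
Block = Tuple[str, Dict[str, str]]


def resolve_star_global(blocks: List[Block]) -> Dict[str, Dict[str, str]]:
    """global_-style reading: values apply only to data blocks that FOLLOW
    the global_ block, and a block's own value takes precedence."""
    inherited: Dict[str, str] = {}
    resolved: Dict[str, Dict[str, str]] = {}
    for header, items in blocks:
        if header == "global_":
            inherited.update(items)  # a later global_ overrides an earlier one
        else:
            resolved[header] = {**inherited, **items}
    return resolved


def resolve_data_global(
    blocks: List[Block],
) -> Tuple[Dict[str, Dict[str, str]], List[Tuple[str, str]]]:
    """data_global convention: the shared block may sit anywhere in the file.
    Conflicting tags are reported, because no rule says how to resolve them."""
    shared: Dict[str, str] = {}
    for header, items in blocks:
        if header == "data_global":
            shared.update(items)
    resolved: Dict[str, Dict[str, str]] = {}
    conflicts: List[Tuple[str, str]] = []
    for header, items in blocks:
        if header == "data_global":
            continue
        for tag, value in items.items():
            if tag in shared and shared[tag] != value:
                conflicts.append((header, tag))
        # Keeping the block's own value here is an arbitrary choice for the sketch.
        resolved[header] = {**shared, **items}
    return resolved, conflicts
```

With a structure block placed before the shared block, the data_global reading still attaches the title to data_foo, whereas the global_ reading does not, which is the ordering constraint mentioned above.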
The fact that statements of copyright can be found within comments indicates only that the community has not yet considered it important to carry statements of intellectual property ownership along as tagged content. I am sure that many complex questions about copyright arise when one considers the proper treatment of IPR to extracts or reorderings of data files.

I consider the 'magic number' comment recommended to start a CIF as special, but only in the sense that it's provided as a courtesy to graphical file managers etc - it identifies a file as of type CIF (in the same way that Windows tries to identify the type of a file by its name suffix). CIF application software should look instead for _audit_conform* tags if the intention is to test conformance against a particular dictionary or set of dictionaries.

The namespace and conformance questions arise from the original design goal to permit software authors to include their own tags alongside "standard" tags in CIFs. There have been some developments to guard against naming clashes, e.g. the registry of reserved prefixes, but the implementation is awkward, and a case might well be made for simplifying things with a syntactic device (e.g. a colon) to make the namespace identifier easier to parse. As Herbert points out, the problem becomes real (and progressively more acute) as CIF becomes more widely used in different subjects.

Many perceived problems arise because there are no reference implementations (for example of the dictionary stacking protocol). Perhaps this is because the community has not in fact seen a real need for such features, but I think we can only gauge the real utility (and limitations) of the different suggested approaches by putting them to the test. As Herbert has mentioned, a project to create a reference implementation for dictionary stacking is now getting under way.
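
The distinction drawn above between the 'magic number' comment and the _audit_conform* tags could be sketched roughly as follows. This is a naive, line-based illustration in Python, not a CIF parser: it assumes the CIF 1.1 style first line `#\#CIF_1.1`, assumes each _audit_conform* tag sits on one line with a simple value, and ignores loops, multi-line text fields and trailing comments. The function name `sniff_cif` is hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# First-line 'magic number' comment, e.g.  #\#CIF_1.1
MAGIC = re.compile(r"^#\\#CIF_(\d+\.\d+)")
# A tag starting with _audit_conform followed by a value on the same line.
CONFORM = re.compile(r"^\s*(_audit_conform\S*)\s+(.+?)\s*$")


def sniff_cif(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (magic-number version or None, {conformance tag: value}).

    The version only hints at the file type; any conformance test should
    rely on the tags collected in the second element.
    """
    lines = text.splitlines()
    match = MAGIC.match(lines[0]) if lines else None
    version = match.group(1) if match else None

    conform: Dict[str, str] = {}
    for line in lines:
        found = CONFORM.match(line)
        if found:
            tag = found.group(1).lower()       # compare tags case-insensitively
            value = found.group(2).strip("'\"")  # drop simple surrounding quotes
            conform[tag] = value
    return version, conform
```

A caller would typically treat a missing magic number as harmless and decide which dictionaries to validate against from the returned tags alone.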

The impression I retain is that the CIF approach still offers many novel features, and I'm not sure how many of those features are being tested in other environments - Peter may have some helpful examples for us from the XML world. It is in any case beneficial to test these features to see which of them are really useful in practice, and which in the long run are simply impractical.

Regards
Brian
- References:
- Re: CIF Infoset (ddb)
- Re: CIF Infoset (Herbert J. Bernstein)