the dictionary merging protocol
- Subject: the dictionary merging protocol
- From: Doug du Boulay <ddb@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 15 Jul 2002 11:25:34 +0100 (BST)
I hope it is okay to make a few comments here about the dictionary overlay protocol as documented here: http://www.iucr.org/iucr-top/lists/cif-developers/msg00044.html

I hope we can draw a distinction between "valid" and "conformant" with respect to the encouraged CIF data_block tags:

    _audit_conform_dict_name
    _audit_conform_dict_version
    _audit_conform_dict_location

The following extract from the above essentially defines "valid":

> required. These issues are addressed in reverse order below. Note that an
> application seeking to validate a data file should not consider the file
> invalid if a data name is found that has no definition in the dictionaries
> referenced. The CIF standard permits the incorporation of local and
> standard names in any data file. Nevertheless, it is recommended as good
> practice that all data names in a CIF should be able to be validated against
> dictionary files, including locally constructed dictionaries.

My understanding/definition of "conformant" is 100% or nothing: the slightest discrepancy at all means it is no longer conformant. With this definition the _audit tags above seem mislabelled, but I will continue here assuming the intended meaning is "valid".

What is the advantage of a CIF data_block specifying that it is valid against dictionaries x.dic, y.dic and z.dic when it could also be full of unknown/unrecognised data items to which it is not conformant? It would also be "valid" against a null dictionary (see the small sketch further below). On the other hand, I am guessing that we could use a new, undefined data item as a new member of a non-listable category and then loop_ over it alone, thereby not only being non-conformant but also destroying the previously accepted validity. Is that possible?

From the point of view of CIF validation, the proposed dictionary merging protocol looks functional enough. But the protocol itself seems to be a set of externally based informal rules designed to be hard-coded into validation software. The commands for specifying how to create/assemble a dictionary to which a given CIF data_block may or may not be conformant (even though it may be valid) are actually embedded in the CIF, or passed to the validation software as arguments. There is no support therein for fine-grained control over how individual data items and/or category classes may be totally replaced by, or appended to from, the separate disparate dictionaries. It is an all-or-nothing approach.

The currently envisaged dictionary construction mechanism does not yet permit specification of such PREPEND, APPEND or REPLACE modification attributes in the CIF data_block itself, so there is no way to retain this information across dictionary reconstruction invocations. The dictionary constructed in this manner does not even exist as a referenceable object (it is purely virtual). This could make it rather more difficult to group together data sets conforming to the same (i.e. which?) dictionary.

The recent discussion of the CIF specification indicates that in CIF 1.1 dictionary-style save_ frames will be permitted in purely data CIFs, opening up the possibility of combined dictionaries and data. I am not sure if this is the direction things are intended to go, but it seems to me to be far too flexible for something that is supposed to be a purely data archival format. It also seems counterproductive to the overall scheme of standardization, because basically any CIF can create any dictionary it likes and say "hey, I am valid against this" (even if it doesn't conform).
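To make the valid/conformant split concrete, here is a small Python sketch; the data model and names are invented purely for illustration and are not taken from any existing CIF toolkit. A block is "valid" if no recognised item violates its definition, and "conformant" only if, in addition, every item is recognised.

    # Toy model: a data_block is {tag: value}; a merged dictionary is
    # {tag: predicate}, the predicate standing in for whatever type or
    # enumeration test the real definition would impose.

    def check_block(data_items, merged_dict):
        unknown = [tag for tag in data_items if tag not in merged_dict]
        violations = [tag for tag, value in data_items.items()
                      if tag in merged_dict and not merged_dict[tag](value)]
        valid = not violations              # unknown local names do not invalidate
        conformant = valid and not unknown  # 100% or nothing
        return valid, conformant

    dictionary = {'_cell_length_a': lambda v: float(v) > 0}
    block = {'_cell_length_a': '10.234', '_my_local_tag': 'anything'}
    print(check_block(block, dictionary))   # (True, False): valid, not conformant

Nothing stops the merged dictionary from being nearly empty, which is exactly the "valid against a null dictionary" loophole complained about above.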
What I would prefer to see in a CIF data_block is a single reference to a real dictionary to which it is totally, 100% conformant. I think this might force the creators and distributors of CIFs to be more responsible in either rigorously adopting existing dictionaries, or making their internal dictionaries available to the community if they want their CIFs widely accepted. Therein lies my previous concern about:

> > And can a CIF (file or data_block?) really be totally conformant with
> > more than one dictionary, i.e. why the need for item 27 loop_? Would it
> > not be

The benefit is that it would enforce an explicit hierarchical dictionary dependence, through inter-dictionary reference pointers, rather than an inferred one based on an ad hoc protocol buried in an implementation layer (a rough sketch of what I mean appears near the end of this message).

In the final example of the last appendix of the dictionary overlay protocol referenced above there is an already recognized scope for error. With the data_block contents:

    loop_
        _audit_conform_dict_name
        _audit_conform_dict_version
        _audit_conform_dict_location
          a.dic   2.1   .
          b.dic   1.0   .
          c.dic   1.0   /usr/local/dics/my_local_dictionary

if you then run the hypothetical command line

    dictcheck -mode OVERLAY test.cif

you can end up with the situation where b.dic can overlay a.dic but c.dic cannot overlay b.dic (very last part of the example). On the other hand, c.dic could overlay a.dic and b.dic can then overlay c.dic without any problems. So there is, as was noted in the appendix, a potential ordering problem. Problems of this type should be alleviated by an explicit hierarchical dependence.

Despite all that, my real reason for preferring CIFs to be 100% conformant to one single specified dictionary was based on inherent laziness, coupled with considerations of dictionary-based data structures as distinct from generic CIF-based data structures. With a typical CIF parser you can readily create a generic CIF data structure, but if you wish to create a dictionary-based data structure, the generic CIF data structure is now forced upon you as an essential precursor, because you have to pick the CIF apart to find the dictionary conformance/creation tags in order to identify the relevant dictionary data model needed. The intermediate step could be ignored if the conformant dictionary was specified right up front, XML style.

The next issue was that it would be a damn sight more convenient to be able to use a precompiled representation of a dictionary for generating a dictionary-based data structure than it would be to have to go off and build a new one for every new data_block you encounter. The former would rely more heavily on dictionary caching.

The third issue concerns valid, as distinct from conformant, data files. It would be a lot simpler to build a dictionary-based data structure knowing that the data to be stuck in it are 100% conformant, rather than having to cater for spurious non-conformant garbage at every level. Given that the self-proclaimed purpose of the dictionary merging protocol was to facilitate the development of dictionary-driven applications and thereby, I hope, dictionary-driven data structures, it would be a shame if it all got started off on an inconvenient footing. But I guess these are just implementation issues.

One final comment about the COMCIFS-envisaged future CIF global_ data structure. If every data_block in a CIF conforms to a completely different dictionary, you could in general wind up with incompatible global data and no formal way to specify the inheritance that other data_blocks may need.
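For what it is worth, here is the rough Python sketch promised above of the kind of explicit hierarchy I mean; the classes and attributes are invented for illustration, not taken from any existing CIF library. Each dictionary carries one pointer to its parent, a data_block cites exactly one dictionary, and lookup walks the chain, so the composite dictionary is a real, referenceable object rather than something assembled virtually at validation time.

    class Dictionary:
        def __init__(self, name, version, definitions, parent=None):
            self.name, self.version = name, version
            self.definitions = definitions      # {tag: definition attributes}
            self.parent = parent                # explicit inter-dictionary pointer

        def lookup(self, tag):
            if tag in self.definitions:
                return self.definitions[tag]    # nearest definition wins
            if self.parent is not None:
                return self.parent.lookup(tag)  # subclass-style resolution
            raise KeyError(tag)

    core  = Dictionary('cif_core.dic', '2.1',
                       {'_cell_length_a': {'_type': 'numb'}})
    local = Dictionary('my_local.dic', '1.0',
                       {'_my_local_tag': {'_type': 'char'}}, parent=core)

    # A data_block would then cite 'my_local.dic 1.0' alone and be required
    # to be 100% conformant to it.
    print(local.lookup('_cell_length_a'))   # inherited from cif_core.dic
    print(local.lookup('_my_local_tag'))    # defined locally

The a.dic/b.dic/c.dic ordering headache above simply does not arise in such a scheme, because the order of composition is fixed once, in the dictionaries themselves, rather than by the order of an _audit_conform loop_.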
Perhaps building in a hierarchical, object-oriented style subclassing model from the beginning might help to clarify such situations in the future. I guess really I am just questioning what the considerations were when this _audit_conform business was initiated, and whether it could be clarified before being cast in stone.

Thanks
Doug