Re: CIF Infoset
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: CIF Infoset
- From: ddb@owari.msl.titech.ac.jp
- Date: Sat, 4 Sep 2004 18:43:00 +0900 (JST)
Hi

> Here are a few more comments from IDB:
> >So how do you intend to get around this namespace issue? No CIFs that I
> >have encountered have ever declared their conformance to any dictionary.
> >Even if they did, there is something called the dictionary stacking
> >protocol
> >which allows those definitions to be overridden without declaring a
> >namespace.
> >On top of that there is the boundless capacity for making up your own
> >data names on the fly for which there may never be any dictionary
> >definition
> >at all. How can you reliably assign anything but a generic namespace to an
> >infoset? It's all just ad hoc guesswork.
>
> The core dictionary defines three items which can be looped:
> _audit_conform_dict_name
> _audit_conform_dict_version
> _audit_conform_dict_location # Contains the URL where the
> dictionary can be found
> As far as I know these have not been widely used - Acta Cryst. should
> start insisting that these be included in submitted papers. There is no
> need to give the dictionary version in anything as ephemeral as a comment.

That sounds like a positive step, but would that go in every data_block
or is it a global_ thing? You may need to add something like
_audit_conform_dict_stacking_order to ensure that looped dictionaries of
symmetry overriding core don't get confused with core overriding
symmetry, for example (assuming loop order is not significant?), if that
is possible.

The problem I see is that the effort invested in implementing it for all
newly created and submitted CIFs is wasted, because it is an incomplete
solution and no current software uses it or needs it. You still have to
deal with existing archives of CIFs which don't state their conformance,
and even for CIFs that do, users are free to conjure up any ad hoc data
names they like and use them in any context.
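
The stacking-order point above can be sketched in Python. This is only an illustration under stated assumptions: the looped _audit_conform_dict_* rows are taken to be already parsed into Python dictionaries, the item _audit_conform_dict_stacking_order is the hypothetical one suggested above (it is not defined in any dictionary), and the dictionary names are illustrative values. The helper name override_order is likewise made up for this sketch.

```python
from typing import Dict, List

# Hypothetical item suggested in the message; not part of any dictionary.
STACKING_KEY = "_audit_conform_dict_stacking_order"
NAME_KEY = "_audit_conform_dict_name"


def override_order(rows: List[Dict[str, str]]) -> List[str]:
    """Return dictionary names from lowest to highest override priority.

    The order of rows in the loop is deliberately ignored; only the
    explicit stacking rank decides which dictionary overrides which.
    """
    ranks: Dict[int, str] = {}
    for row in rows:
        if STACKING_KEY not in row:
            raise ValueError("row has no explicit stacking order")
        rank = int(row[STACKING_KEY])
        if rank in ranks:
            # Two dictionaries with the same rank would be ambiguous.
            raise ValueError(f"duplicate stacking rank {rank}")
        ranks[rank] = row[NAME_KEY]
    return [ranks[rank] for rank in sorted(ranks)]


# Illustrative values only: symmetry (rank 2) overrides core (rank 1).
rows = [
    {NAME_KEY: "cif_sym.dic", STACKING_KEY: "2"},
    {NAME_KEY: "cif_core.dic", STACKING_KEY: "1"},
]

# The result does not depend on the order of rows in the loop.
assert override_order(rows) == override_order(list(reversed(rows)))
```

Without such a rank, a reader of the loop has no way to tell the two override directions apart if row order carries no meaning.
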
So, to try and resolve the namespace of each name, you would need to

(1) check the _audit_conform list of dictionaries in reverse order
(2) check against the list of registered prefixes for accidental matches
(3) check all versions of all publicly accessible dictionaries
(4) then give up.

That is not an efficient process even if there was a match, and there is
no guarantee that it was a correct match if names were reused in
different contexts in different dictionaries. Two simple things would
fix that: associating a distinguishable prefix on each name with the
_audit_conform stuff, and banning ad hoc data names. Anything else and
you will always be just guessing. I don't really know what you are
hoping to achieve.

> ># start Validation Reply Form
> >_vrf_DIFF020_114
> >;PROBLEM: _diffrn_standards_interval_count and
> >RESPONSE: ... We have used an image-plate system
> >;
> >
> >If intelligent software was ever intended to deal with such _vrf_s, why
> >embed the only pointer to their purpose in supposedly non parsable data
> >names rather than in looped, discrete sets of tags such as
> >
> >loop_
> > _vrf_suite _vrf_subroutine _vrf_error_code _vrf_authors_response
>
> This would tidy things up, but the parser must be able to handle ad hoc
> data names without choking.

If it's important enough to create a name for it, then isn't it
important enough to define its purpose somewhere? Ad hoc data names seem
to provide nothing useful besides a legitimate excuse for laziness in
the specification. There's no incentive to organize things tidily. Maybe
they were important originally when COMCIFS were exploring the field,
before dictionaries were introduced, but is it still important to be
able to make up arbitrary stuff and stick it in a CIF without
definition? Who is doing this and how are they using it? Do they really
intend to save it for posterity?

> >>>>Q Is the order of "rows" in a loop_ unimportant?
> >>>
> >>>Yes (in CIF).
> >>
> >>That is very useful (and non-obvious from the spec.
> >>It then makes it
> >>possible to confirm the identity of two sets of coordinates, symmetry
> >>operations, etc.
> >>
> >>It is also debatable.
> >>The very recent introduction of _symmetry_equiv_pos_site_id means that
> >>the data integrity of the majority of prior archived CIFs containing tag
> >>values like: _geom_bond_site_symmetry_1 "4_564"
> >>would be seriously impaired by a change of order in the
> >>loop_ _symmetry_equiv_pos_as_xyz
>
> This was a serious omission in the first version of CIF (you have to
> remember that this was produced before we even considered writing
> dictionaries in STAR format). As you point out we have introduced the
> list reference _symmetry_equiv_pos_site_id (which incidentally has now
> been superseded by _space_group_symop_id taken from the symmetry_cif
> dictionary - a dictionary which takes a more systematic and
> forward-looking approach to symmetry). Again Acta Cryst. should insist
> on the inclusion of these id's.

Would a statement of conformance to an older dictionary version be
sufficient grounds to escape these CIF changes (just checking :-)?

But I guess my original concern here was that order independence of
loop_ structures based on earlier, and possibly alternative,
dictionaries, as well as ad hoc looped data (maybe that's not important,
but you never know...), is not assured in general, particularly for raw
data in whatever form it takes (nmr? image CIF?).

> >I had a hazy recollection that "this is a string" and this_is_a_string
> >were equally valid CIF constructs containing identical information
> >content,
> >used for example in space group names. Would they be formally identical in
> >an infoset? Does the white space in all strings have to be normalised (is
> >that the right word?)?
>
> We had a discussion of this point while preparing the symmetry_CIF
> dictionary and came to the decision that these two strings were not
> equivalent, i.e., underscore is not white space.

Bummer.
I know one program that needs changes made :-(

But perhaps I could also draw your attention to this:

http://journals.iucr.org/services/cif/stdcodes.html#Appdx4.3

as evidence that underscores do seem to be an officially sanctioned form
of white space in uchar data types.

And maybe I can raise another issue. In the context of PMR's interest in
data_global, would the following construct be legitimate:

data_global
_publ_contact_author_name "Fred"

data_a
_import_data_from_block global

# defined in an associated dictionary as:

data_import_data_from_block
_name '_import_data_from_block'
_category obscure_semantics
_type uchar
_definition
; Import all data from the named data_block into the current data_block
  Watch out for duplicate _data_element_names though!
  Also watch out for circular imports!
;

As far as I am aware there is nothing that restricts such semantics.
Everything seems to be above board in terms of the CIF content. It's
just that a request for _publ_contact_author_name from within data block
data_a seems destined to fail at the software access stage, because
generic software knows nothing about what _import_data_from_block is
meant to do. Does that mean CIF conformant software can never be totally
CIF conformant?

Thanks for the response.

Doug
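
A minimal Python sketch of what software would have to do to honour the _import_data_from_block semantics in the data_global example above. It assumes the data blocks are already parsed into plain dictionaries keyed by block name, and that a locally defined value wins over an imported one (the definition only warns about duplicates, so that precedence rule is an assumption). The function name lookup is hypothetical.

```python
from typing import Dict, Optional, Set

# The ad hoc data name from the example; no standard dictionary defines it.
IMPORT_TAG = "_import_data_from_block"

Block = Dict[str, str]


def lookup(blocks: Dict[str, Block], block: str, name: str,
           seen: Optional[Set[str]] = None) -> Optional[str]:
    """Find `name` in `block`, following the import tag when needed.

    Assumption: a value defined locally takes precedence over an imported
    one. A circular chain of imports raises ValueError.
    """
    seen = set() if seen is None else seen
    if block in seen:
        raise ValueError(f"circular import through data_{block}")
    seen.add(block)

    items = blocks[block]
    if name in items:
        return items[name]

    target = items.get(IMPORT_TAG)
    if target is None or target not in blocks:
        return None  # nothing to import from, so the request fails
    return lookup(blocks, target, name, seen)


blocks = {
    "global": {"_publ_contact_author_name": "Fred"},
    "a": {IMPORT_TAG: "global"},
}

# Succeeds only because this code knows what the import tag means.
print(lookup(blocks, "a", "_publ_contact_author_name"))  # Fred
```

Software that has not been taught this rule would look only in data_a and find nothing, which is the failure described above.
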
- Follow-Ups:
- Re: CIF Infoset (David Brown)
- Re: CIF Infoset (Herbert J. Bernstein)