Discussion List Archives

Re: CIF Infoset

I am also finding this interchange interesting.  I have only a couple of 
short comments to add:


>> Does an infoset for HTML that says
>>
>> <b><!--interpret hello as goodbye-->hello</b> is equivalent to
>> <b>hello</b>? If so, wouldn't that be somewhat dangerous?
>
>
> HTML chooses to make comments part of the infoset so those two 
> documents have different infosets. However the following *are* 
> equivalent at infoset level:
>
> <b>hello</b>
> <b    >hello</b>
> <b>&#104;<![CDATA[ell]]>o</b> 

Technically the comments are not part of the CIF, and in practice the 
CIFs I handle for Acta Cryst. only contain template comments that are 
designed to direct the author to include the required information.  When 
CIF editors become more widely used, these comments will not be needed.

>> So if something is numb, you expect it to be a number, irrespective
>> of the lexical eye candy provided by a variety of delimited string
>> forms. If _cell_length is declared numb, then '12.1' and 12.1 are
>> equivalent in interpretation (at the application level).
>
>
> The CIF specification indicates that these have different semantics. 
> If this is now obsolete or deprecated it would make implementations 
> simpler. 

The quotes are important.  The dictionary gives, I believe, the default 
type, but this can be overridden by the actual type.  Thus in the 
example given above '12.1' would be read as char and an application 
would have to decide whether it could convert this to numb.  Quoting is 
important - for example in the dictionaries '_cell_length_a' is not a 
dataname, though _cell_length_a is.  This might occur in a CIF if someone 
wrote:
_exptl_special_details   '_exptl_density_obs unobservable'
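
A minimal sketch of the distinction described above, in Python.  The 
function names and the number pattern are assumptions made for 
illustration; the pattern is a simplification and not the full CIF 
grammar.  The point is only that the delimiters decide the lexical 
type, and that conversion to numb is a separate step the application 
chooses to take.

```python
import re

# Simplified number pattern (an assumption, not the official grammar):
# optional sign, digits with optional decimal point, optional exponent,
# optional standard uncertainty in parentheses, e.g. 12.1(3).
_NUMB = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$")


def lexical_type(token: str) -> str:
    """Return 'char' or 'numb' for one raw value token as written in the file."""
    if len(token) >= 2 and token[0] in "'\"" and token[-1] == token[0]:
        return "char"  # a delimited string is read as char
    if _NUMB.match(token):
        return "numb"
    return "char"  # undelimited, but not a number


def as_number(token: str) -> float:
    """Application-level conversion; raises ValueError if it cannot be done."""
    text = token.strip("'\"")
    text = re.sub(r"\(\d+\)$", "", text)  # drop a trailing standard uncertainty
    return float(text)


if __name__ == "__main__":
    for tok in ["12.1", "'12.1'", "_cell_length_a", "'_cell_length_a'"]:
        print(tok, "->", lexical_type(tok))
```

Here '12.1' is classified as char, and only an explicit call to 
as_number() turns it into a number.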

>> > > Q Does data_global have any semantics? I suspect that formally
>> > > it does not, but it seems in widespread use:
>>
>>
>> data_global doesn't exist. 
>
>
> It does (frequently). (I appreciate that global_ is different and 
> irrelevant to CIF/DDL1). data_global is very frequently used as the 
> first block in a multiblock CIF to indicate information that (I 
> assume) the author wishes to apply to all blocks. I think it either 
> needs deprecating or accepting and formalising. 

One of the commonly used templates (I believe that supplied by SHELX) 
starts with a datablock called data_global but this is not a reserved 
name and has no significance beyond being a legitimate form of 
data_xxxxx.  In the template it introduces a datablock that contains the 
text part of a paper, with the numerical information supplied in one or 
more additional blocks depending on how many structures are being 
described. Since formally each datablock in CIF is independent, there is 
no formal linkage between the data_global datablock and any of the other 
datablocks that follow.  As Nick points out, global_ is defined in STAR, 
though not in the current version of CIF.  The name is currently 
reserved in CIF in case we wish to use it later.
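
A minimal sketch of what this independence means for a reader, in 
Python.  The function name split_blocks is hypothetical, and the 
splitting is deliberately naive: it assumes every line beginning with 
data_ is a block header, which would fail for such a line inside a 
multi-line text field.  Note that data_global gets no special handling 
and nothing is copied from it into the other blocks.

```python
from typing import Dict, List, Optional


def split_blocks(cif_text: str) -> Dict[str, List[str]]:
    """Split CIF text into {block name: non-blank lines of that block}."""
    blocks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in cif_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("data_"):
            current = stripped[len("data_"):]  # 'global' is just another name
            blocks[current] = []
        elif current is not None and stripped:
            blocks[current].append(stripped)
    return blocks


if __name__ == "__main__":
    example = """\
data_global
_publ_section_title 'Two structures'

data_I
_cell_length_a 12.1

data_II
_cell_length_a 9.87
"""
    for name, lines in split_blocks(example).items():
        print(name, lines)
```

Any application that wants the contents of data_global to apply to 
the other blocks has to implement that linkage itself.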

> "." is worse because the spec can be interpreted as requiring the 
> implementer to insert the default value from the dictionary. At one 
> stage this would be interpreted to mean that unless specified all 
> extinction corrections were, by default, Zachariasen. Defaults, and 
> their insertion, have to be explicitly specified. 

I agree there is a problem.  In working through dictionary definitions 
we are trying to remove the default values and in my view "." should 
never be used to indicate a default - it should only mean 'this item has 
no physical meaning in the present context'.  One good example of where 
defaults make sense is in _atom_site_occupancy.  In a straightforward 
structure report this item may not be given, but it certainly does not 
imply that it is irrelevant or not known.  It would be assumed by any 
application to have the value of 1.0 unless otherwise stated.  A value 
of '.' for this item should not indicate the default - if the item is 
present in a CIF the value should be given explicitly even if it is the 
same as the default.  A value of '.' says that it makes no sense to talk 
about the occupancy of this atom (it might occur if the atom in question 
was a dummy atom, which is allowed).
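
A minimal sketch of the three cases for _atom_site_occupancy, in 
Python.  The function name, the use of a plain mapping for one atom 
row, and the choice of None to represent '.' are all assumptions made 
for illustration; values are assumed to be plain numbers without a 
standard uncertainty.

```python
from typing import Mapping, Optional

DEFAULT_OCCUPANCY = 1.0  # the default assumed when the item is not given


def occupancy(atom: Mapping[str, str]) -> Optional[float]:
    """Resolve the occupancy for one atom-site row.

    Returns a number, or None when the value is '.', meaning the item
    has no physical meaning for this atom (for example a dummy atom).
    """
    raw = atom.get("_atom_site_occupancy")
    if raw is None:
        return DEFAULT_OCCUPANCY  # item absent: the default applies
    if raw == ".":
        return None  # present but inapplicable: never read as the default
    return float(raw)  # present: the explicit value is used


if __name__ == "__main__":
    print(occupancy({"_atom_site_label": "C1"}))
    print(occupancy({"_atom_site_label": "X1", "_atom_site_occupancy": "."}))
    print(occupancy({"_atom_site_label": "O1", "_atom_site_occupancy": "0.5"}))
```

The default is inserted only when the item is missing, so '.' and the 
default stay distinct.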

>
>
>>
>> I suspect apart from Syd and me, almost no one sucks in dictionaries to
>> validate STAR/CIF file contents. Most just assume they know what they
>> need to and hope the definition of the data item has never changed.
>
There are two editor/browsers available and another that a couple of 
students are writing for me that do read in the dictionary before 
reading in the CIF and use the dictionary for validating.  The 
validations are not complete, but at least they test the important items 
such as enumeration lists, type etc.  It is a beginning, and I am trying 
(as an Acta editor) to educate users to prepare CIFs that will be 
accessible to the advanced software of the future.  However, most users 
still see CIF as just a more complicated file structure that offers 
little more than the old formatted output files produced by the 
principal structure-solving packages.
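
A minimal sketch of the kind of check described above (type and 
enumeration list), in Python.  This is not how any of the editors 
mentioned actually work.  The DICTIONARY table is hand-written and 
hypothetical: a real tool would load it from the dictionary file, and 
the enumeration list shown is an abbreviated, assumed one.

```python
from typing import Dict, List

# Hypothetical, hand-written entries standing in for a parsed dictionary.
DICTIONARY: Dict[str, dict] = {
    "_cell_length_a": {"type": "numb"},
    "_exptl_absorpt_correction_type": {
        "type": "char",
        # abbreviated list, assumed for illustration only
        "enumeration": ["analytical", "empirical", "multi-scan", "none"],
    },
}


def validate(cif_items: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means no complaints."""
    problems: List[str] = []
    for name, value in cif_items.items():
        entry = DICTIONARY.get(name)
        if entry is None:
            problems.append(f"{name}: not defined in the dictionary")
            continue
        if value in (".", "?"):
            continue  # inapplicable or unknown: nothing to check
        if entry["type"] == "numb":
            try:
                float(value)
            except ValueError:
                problems.append(f"{name}: '{value}' is not a number")
        allowed = entry.get("enumeration")
        if allowed is not None and value not in allowed:
            problems.append(f"{name}: '{value}' is not in the enumeration list")
    return problems


if __name__ == "__main__":
    print(validate({"_cell_length_a": "12.1",
                    "_exptl_absorpt_correction_type": "guesswork"}))
```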

David

-- 
Dr. I.D.Brown, Professor Emeritus,
Department of Physics and Astronomy
McMaster University, Hamilton
Ontario, Canada



