Re: CIF Infoset
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: CIF Infoset
- From: Nick Spadaccini <nick@cs.uwa.edu.au>
- Date: Wed, 18 Aug 2004 14:10:40 +0800 (WST)
- In-Reply-To: <5.1.1.6.0.20040817161113.03d77fd8@pop.hermes.cam.ac.uk>
On Tue, 17 Aug 2004, Peter Murray-Rust wrote:

> Q. Are comments part of the infoset? My current belief is no, but
> certain comments (e.g. #\#CIF_1.1) convey important information. Also
> some comments such as
>
> # Supplementary Material (ESI) for Organic & Biomolecular Chemistry
> # This journal is © The Royal Society of Chemistry 2003
>
> may suffer by being lost

This question is more interesting than any answer. If infosets define
lexically equivalent files, why ask this question? If there is a comment in
the file, then there should exist an infoset that can handle it - isn't that
the idea? Whether at an application level one chooses to use the comments is
a different question. StarBase (an application) *chooses* to interpret
comments as lexical whitespace and removes them in the tokenising phase.

Does an infoset for HTML say that <b><!--interpret hello as goodbye-->hello</b>
is equivalent to <b>hello</b>? If so, wouldn't that be somewhat dangerous?

> Q. Does the presence or absence of a dictionary affect the infoset? (it
> is formally impossible to deconvolute namespaces or categories without a
> dictionary) Moreover defaults, etc (see below) depend on a dictionary.

Why is the deconvolution of namespaces and categories (in the STAR syntax) a
lexical issue? That is a higher order issue. The datanames would have to be
identical (up to case) in either file, though their placement could be very
different.

> the presence of a dictionary is important, is it an error to have a CIF
> without a dictionary?

At the lexical level I am trying to see how you need a dictionary. If it is a
question of a value like "?" in one file versus another file with the default
value substituted, then these are very different things, and the infoset
should highlight them as such.

> Q Should the (a) fact (b) manner of quoting be preserved in the infoset?
> The specification suggests that '12' and 12 should be interpreted
> differently in certain circumstances, but I cannot work out which and
> how.
> (The type of a data item is defined by the dictionary entry
> char/numb - does the quoting overrule this? If not, what is its role?)

This is a throwback from the very first versions of STAR. It was a weak
attempt at some type information (only char and numb - woefully inadequate).
However it seems to me the declaration as char or numb had to do with its
lexical appearance - not its actual type. So if something is numb, you expect
it to be a number, irrespective of the lexical eye candy provided by a
variety of delimited string forms. If _cell_length is declared numb, then
'12.1' and 12.1 are equivalent in interpretation (at the application level).

Mmmmmm. Now I can see why you think you need dictionaries. However if the
above is what you are supposed to do with infosets then I have misunderstood
what their intent is. I guess that an infoset states that the following two
XML entities are lexically equivalent, <blah></blah> and <blah />, but this
is a well defined operation - like order independence in STAR. I wouldn't
*expect* an infoset to deal with the semantic equivalences of delimited
versus non-delimited strings.

> Q Is the order of data items and loops in a data_ block unimportant?

By definition.

> Q is the order of names in a loop_ header important? Do

At any single level yes, but not through a full nesting (a STAR issue, not a
CIF one).

> Q Is the order of "rows" in a loop_ unimportant? Do

Yes (in CIF).

> have identical infosets? (In a relational model they would).
>
> Q Does data_global have any semantics? I suspect that formally it does
> not, but it seems in widespread use:

data_global doesn't exist. global_ does (in STAR and CIF?). Its semantics are
well defined.
> global_
> _foo foo
>
> data_a
> _bar a
>
> data_b
> _bar b
>
> seems to have the semantics equivalent to:
>
> data_a
> _foo foo
> _bar a
>
> data_b
> _foo foo
> _bar b

Yes, furthermore

global_
_foo foo

data_a
_bar a

global_
_foo foo2

data_b
_bar b

seems to have the semantics equivalent to:

data_a
_foo foo
_bar a

data_b
_foo foo2
_bar b

> Q how should ? be treated in the infoset?

Strictly it should be treated as ? at the lexical level, i.e. TOKEN(UNKNOWN).
What you do with that at the higher level may require the dictionary.
Similarly (at a lexical level) "." should be left as it is. It is up to the
application to deal with it.

> Q how is '.' to be interpreted?

Again (I believe) an application level problem, not to be handled at a
lexical level.

> This is extremely difficult to interpret in the infoset. The first part
> suggests that the limitations come from a non-rectangular loop_ - it is
> simply there so the syntax is not violated. The default value cannot be
> applied without a program that understands and implements dictionary
> entries. How common is this? (I suspect fairly rare.) If so, I would
> argue that the default approach is dangerous and be phased out.

I suspect that apart from Syd and me, almost no one sucks in dictionaries to
validate STAR/CIF file contents. Most just assume they know what they need to
and hope the definition of the data item has never changed.

Good luck, Peter.

cheers

Nick

--------------------------------
Dr N. Spadaccini                      Head of School
School of Computer Science &          voice: +(61 8) 6488 3452
Software Engineering                  fax:   +(61 8) 6488 1089
The University of Western Australia   email: nick@csse.uwa.edu.au
35 Stirling Highway                   w3:    www.csse.uwa.edu.au/~nick
CRAWLEY, Perth, WA 6009 AUSTRALIA     CRICOS Provider Code: 00126G
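
As an illustration of the global_ scoping in the two examples in this message, here is a minimal Python sketch. It is only one reading of those examples, not a STAR or CIF parser: the (kind, name, items) block representation and the function name expand_globals are hypothetical, and the rule that a data block's own items take precedence over inherited global_ items is an assumption that the message does not state.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of an already-tokenised file:
# (kind, name, items), where kind is "global" or "data" and
# name is "" for a global_ block.
Block = Tuple[str, str, Dict[str, str]]


def expand_globals(blocks: List[Block]) -> Dict[str, Dict[str, str]]:
    """Fold global_ items into every data block that follows them."""
    scope: Dict[str, str] = {}  # global_ items seen so far, in file order
    expanded: Dict[str, Dict[str, str]] = {}
    for kind, name, items in blocks:
        if kind == "global":
            # A later global_ overrides earlier values of the same name.
            scope.update(items)
        elif kind == "data":
            merged = dict(scope)
            # Assumption: local items win over inherited global_ items.
            merged.update(items)
            expanded[name] = merged
        else:
            raise ValueError(f"unknown block kind: {kind!r}")
    return expanded


# The second example from the message: _foo is redefined between blocks.
example = [
    ("global", "", {"_foo": "foo"}),
    ("data", "a", {"_bar": "a"}),
    ("global", "", {"_foo": "foo2"}),
    ("data", "b", {"_bar": "b"}),
]
for block_name, block_items in expand_globals(example).items():
    print(f"data_{block_name}", block_items)
```

With this reading, data_a receives _foo foo and data_b receives _foo foo2, which matches the equivalence given in the message. Position in the file matters here: a global_ block affects only the data blocks that come after it.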
- Follow-Ups:
- Re: CIF Infoset (Dr P. Murray-Rust)
- References:
- CIF Infoset (Peter Murray-Rust)