Re: CIF Infoset
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: CIF Infoset
- From: "Dr P. Murray-Rust" <pm286@cam.ac.uk>
- Date: Thu, 19 Aug 2004 14:14:49 -0000
- In-Reply-To: <f06100502bd4a5da7564a@[192.168.2.100]>
- References: <Pine.LNX.4.44.0408181322570.18193-100000@mostaccioli.csse.uwa.edu.au><f06100502bd4a5da7564a@[192.168.2.100]>
On Aug 19 2004, Herbert J. Bernstein wrote:

> At 2:26 PM +0100 8/19/04, Dr P. Murray-Rust wrote:
> >On Aug 19 2004, Herbert J. Bernstein wrote:
> >
> > ...
> >
> >The difficulty is not preserving the data type, but the semantics of
> >downstream decisions. If one author writes _my_phone "123-45678"
> >they are announcing this is not a number while if another writes
> >_my_phone 123-45678 they are announcing it is a number. The
> >discussion so far seems to suggest that these statements overrule
> >the datatypes specified in the dictionary entries. There is a
> >particular problem in loop_s, where it is then possible to have
> >different data types within a column:
> >
> >loop_ _atom_site_occupancy
> >1.0
> >0.3
> >"not refined"
> >"0.3"
> >"."
> >
> >which makes the implementation very difficult. I believe that a
> >programmer should be able to look up the data type in the dictionary
> >entry and write a routine that relies on a value being of the
> >correct data type and throws an exception if not.
> >
>
> If there is a dictionary, so the type is known, there are no downstream
> decisions to be made. If the data type is numeric, the non-numeric
> strings are an error.

Good. This makes things much easier.

> If the data type is a character type, all the data
> values are valid.

Again no problem.

> If there is no dictionary, then the parser designer has
> to make some context-sensitive typing decisions. The choice in CIFtbx is
> to infer the typing from the first instance of the data. Other choices
> could be made, including postponing the typing decision until an entire
> column is read, but whatever the decision, once it is made, the right
> thing to do is to report to the user conflicts between the type of the
> data and the type chosen for the tag.

I understand the logic of this. It is probably manageable if there are only
char and numb, but it becomes impossible if there are many types. I am happy
to go along with any interpretation as long as it is applied consistently
across the community.
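
To make the first-instance strategy described above concrete, here is a rough
Python sketch. It is an illustration only and is not taken from CIFtbx: the
function names are hypothetical, the numeric pattern is a simplified stand-in
for the CIF number syntax, and the special values . and ? are not handled.
Each value is passed as a (text, quoted) pair.

```python
import re

# Simplified numeric pattern (an assumption, not the full CIF grammar):
# optional sign, digits with an optional decimal point, optional exponent,
# optional standard uncertainty in parentheses, e.g. 1.234(5).
NUMB = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?")


def looks_numeric(text, quoted):
    """A quoted value is never numeric; an unquoted one is if it matches NUMB."""
    return (not quoted) and NUMB.fullmatch(text) is not None


def type_column_by_first_value(values):
    """Choose the column type from the first value, then collect conflicts.

    values: list of (text, quoted) pairs for one loop_ column.
    Returns (column_type, conflicts); conflicts holds the row indices whose
    apparent type disagrees with the type chosen for the tag.
    """
    if not values:
        return "char", []
    first_text, first_quoted = values[0]
    column_type = "numb" if looks_numeric(first_text, first_quoted) else "char"
    conflicts = []
    for i, (text, quoted) in enumerate(values):
        # Only a numb column can be contradicted: any value is a valid char.
        if column_type == "numb" and not looks_numeric(text, quoted):
            conflicts.append(i)
    return column_type, conflicts


# The _atom_site_occupancy column quoted above.
column = [("1.0", False), ("0.3", False), ("not refined", True),
          ("0.3", True), (".", True)]
print(type_column_by_first_value(column))
```

For the occupancy column the first value decides numb, and the three quoted
rows are the conflicts that would be reported to the user.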

I understand your proposal as:

Author:
- if it's quoted, it's a char. (Note there are some strings that have to be
  quoted, but they can only be chars anyway.)
- if it's not quoted, no datatype is stated.

Reader:
- if there is a dictionary, the type is defined by that:
  - if the dictType is a char, no problem
  - if dictType = numb and authorType is char, then error
  - if dictType = numb and authorType is not stated, try to decode as numb
    - if impossible, throw an error
- if there is no dictType:
  - if an item, try to decode as numb; if successful treat as numb, else char
  - if in a loop_, use this logic to decide the data type of the first value
    - if all types are numb, decide the column is a numb
    - if any types cannot be decoded as numb, make all of them chars
  - never throw any dataType errors

I can live with this (as I expect that many authors will make up their own
data types without dictionaries). However, I think this (and other recent
discussions) needs formalising in the spec. It is unlikely that implementers
will work this out consistently!

P.

> It is a bit like the problem of
> working with an XML dataset without the DTD. You have to guess a bit on
> what is legal where, and sometimes you guess wrong.

Yes, but XML only has one dataType (string) if a DTD is not provided.

> It is best to have
> the dictionaries in CIF just as it is best to have DTDs or schema in XML.

I agree. I think it's almost essential.

P.

> -- Herbert
>
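
The Author/Reader rules listed in this message could be sketched roughly as
follows in Python. This is only one reading of the list, not a reference
implementation: the names DataTypeError, type_of_value and type_of_column are
hypothetical, the numeric pattern is a simplification of the CIF number
syntax, and the special values . and ? are not treated. It also assumes that,
without a dictionary, a quoted value stays a char because the author said so.

```python
import re

# Simplified numeric pattern (an assumption, not the full CIF grammar).
NUMB = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?")


class DataTypeError(ValueError):
    """Raised when a value contradicts the type given by the dictionary."""


def can_decode_as_numb(text):
    return NUMB.fullmatch(text) is not None


def type_of_value(text, quoted, dict_type=None):
    """Resolve a single value to 'char' or 'numb'.

    dict_type is 'char', 'numb', or None when no dictionary is available.
    """
    if dict_type == "char":
        return "char"  # every value is a valid char
    if dict_type == "numb":
        if quoted:
            raise DataTypeError(f"quoted value {text!r} for a numb item")
        if not can_decode_as_numb(text):
            raise DataTypeError(f"cannot decode {text!r} as numb")
        return "numb"
    # No dictionary: guess, and never raise a data type error.
    if not quoted and can_decode_as_numb(text):
        return "numb"
    return "char"


def type_of_column(values, dict_type=None):
    """Resolve one loop_ column given as a list of (text, quoted) pairs."""
    types = [type_of_value(text, quoted, dict_type) for text, quoted in values]
    if dict_type is not None:
        return dict_type  # every value has already been checked against it
    # No dictionary: numb only if every value decodes as numb, else all chars.
    return "numb" if types and all(t == "numb" for t in types) else "char"
```

With no dictionary, type_of_column on the occupancy column from earlier in the
thread gives "char"; with dict_type="numb" the same column raises
DataTypeError at "not refined".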
- References:
- Re: CIF Infoset (Nick Spadaccini)
- Re: CIF Infoset (Herbert J. Bernstein)