Re: CIF Infoset
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: CIF Infoset
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Thu, 19 Aug 2004 09:49:00 -0400
- In-Reply-To: <Prayer.1.0.11.0408191426370.22833@hermes-1.csi.cam.ac.uk>
- References: <Pine.LNX.4.44.0408181322570.18193-100000@mostaccioli.csse.uwa.edu.au><Pine.BSF.4.58.0408190806350.10385@epsilon.pair.com><Prayer.1.0.11.0408191426370.22833@hermes-1.csi.cam.ac.uk>
At 2:26 PM +0100 8/19/04, Dr P. Murray-Rust wrote:
>On Aug 19 2004, Herbert J. Bernstein wrote:
> ...
>The difficulty is not preserving the data type, but the semantics of
>downstream decisions. If one author writes _my_phone "123-45678"
>they are announcing this is not a number while if another writes
>_my_phone 123-45678 they are announcing it is a number. The
>discussion so far seems to suggest that these statements overrule
>the datatypes specified in the dictionary entries. There is a
>particular problem in loop_s, where it is then possible to have
>different data types within a column:
>
>loop_ _atom_site_occupancy
>1.0
>0.3
>"not refined"
>"0.3"
>"."
>
>which makes the implementation very difficult. I believe that a
>programmer should be able to look up the data type in the dictionary
>entry and write a routine that relies on a value being of the
>correct data type and throws an exception if not.
>

If there is a dictionary, so that the type is known, there are no downstream decisions to be made. If the data type is numeric, the non-numeric strings are an error. If the data type is a character type, all the data values are valid.

If there is no dictionary, then the parser designer has to make some context-sensitive typing decisions. The choice in CIFtbx is to infer the typing from the first instance of the data. Other choices could be made, including postponing the typing decision until an entire column is read, but whatever the decision, once it is made, the right thing to do is to report to the user any conflicts between the type of the data and the type chosen for the tag.

It is a bit like the problem of working with an XML dataset without the DTD. You have to guess a bit about what is legal where, and sometimes you guess wrong. It is best to have dictionaries in CIF, just as it is best to have DTDs or schemas in XML.

  -- Herbert

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
yaya@dowling.edu
=====================================================
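The three typing strategies discussed in this reply (validating against a type declared in a dictionary, inferring the type from the first instance as CIFtbx is said to do, or postponing the decision until the whole column is read) can be sketched in Python against the quoted _atom_site_occupancy column. This is an illustrative sketch under stated assumptions, not CIFtbx code or any official CIF API: the function names are hypothetical, the number pattern is a simplified form of CIF numeric syntax, unquoted . and ? are treated as the CIF null values (inapplicable and unknown), and without a dictionary a quoted value is taken to be character data. When a dictionary declares a numeric type, the sketch assumes a value is acceptable if its text with quotes removed parses as a number, so "0.3" passes while "not refined" fails.

```python
import re

# Simplified CIF number pattern (an assumption, not the full CIF grammar):
# optional sign, digits with optional decimal point, optional exponent,
# optional standard uncertainty in parentheses, e.g. 0.3, -1.5e3, 1.234(5).
NUMB_RE = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?(?:\(\d+\))?")


def is_quoted(raw: str) -> bool:
    """True if the raw token is delimited by matching single or double quotes."""
    return len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\""


def unquote(raw: str) -> str:
    """Strip the delimiting quotes, if any."""
    return raw[1:-1] if is_quoted(raw) else raw


def token_type(raw: str) -> str:
    """Classify a raw token as 'null', 'numb' or 'char' from its lexical form alone."""
    if is_quoted(raw):
        return "char"            # quoted: character data, even "0.3" or "."
    if raw in (".", "?"):
        return "null"            # unquoted . (inapplicable) or ? (unknown)
    return "numb" if NUMB_RE.fullmatch(raw) else "char"


def dictionary_conflicts(values: list[str], declared: str) -> list[str]:
    """Dictionary present: the declared type decides; return the invalid values."""
    if declared == "char":
        return []                # any value is valid character data
    return [v for v in values
            if token_type(v) != "null" and not NUMB_RE.fullmatch(unquote(v))]


def infer_first_instance(values: list[str]) -> str:
    """No dictionary: take the type of the first non-null value."""
    for v in values:
        t = token_type(v)
        if t != "null":
            return t
    return "char"                # assumption: an all-null column defaults to char


def infer_whole_column(values: list[str]) -> str:
    """No dictionary: postpone the decision; numeric only if every non-null value is."""
    types = {token_type(v) for v in values} - {"null"}
    return "numb" if types == {"numb"} else "char"


def conflicts(values: list[str], chosen: str) -> list[str]:
    """Non-null values whose own lexical type differs from the type chosen for the tag."""
    return [v for v in values if token_type(v) not in ("null", chosen)]


if __name__ == "__main__":
    column = ['1.0', '0.3', '"not refined"', '"0.3"', '"."']
    for name, rule in (("first instance", infer_first_instance),
                       ("whole column", infer_whole_column)):
        chosen = rule(column)
        print(f"{name}: type={chosen}, conflicts={conflicts(column, chosen)}")
    print("dictionary (numb):", dictionary_conflicts(column, "numb"))
    # first instance: type=numb, conflicts=['"not refined"', '"0.3"', '"."']
    # whole column: type=char, conflicts=['1.0', '0.3']
    # dictionary (numb): ['"not refined"', '"."']
```

With the first-instance rule the column is typed as numeric and the three quoted values are reported; postponing the decision to the whole column types it as character data and flags the two unquoted numbers instead. Either way, once a type is chosen the mismatches are reported to the user rather than silently coerced, and with a dictionary the declared type removes the guesswork entirely.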