Re: CIF Infoset
- To: comcifs@iucr.org
- Subject: Re: CIF Infoset
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Thu, 19 Aug 2004 08:41:00 -0400 (EDT)
- In-Reply-To: <Prayer.1.0.11.0408191029190.1900@hermes-1.csi.cam.ac.uk>
- References: <Pine.LNX.4.44.0408181322570.18193-100000@mostaccioli.csse.uwa.edu.au><412380ED.5050006@mcmaster.ca><Prayer.1.0.11.0408191029190.1900@hermes-1.csi.cam.ac.uk>
There are two questions that Peter raises relative to comments and one relative to data types that call for a very clear response.

> Q. If my CIF parser automatically strips all comments from the document
> and, say, deposits them in a public repository, does anyone feel this is
> a problem?

This is not only a problem, but depending on who owns the intellectual property rights in the document involved, it may well be a violation of copyright law. It is common practice to put copyright statements and references to licenses in the comments of documents, whether they be in CIF, XML or some other language. If you have created the document in question, what you extract from it and deposit in a public repository is your business. If the document was created by someone else, or you surrendered your intellectual property rights to someone else, they get to decide how derived works are handled.

So, if you are designing a CIF parser to extract information from a CIF for some application to process internally, stripping all comments may well be a good idea, but if you are designing a CIF (or XML, or PostScript, or ASN.1) or other parser to reformat documents, then you need to be much more careful and inclusive of comments.

> Q. Is the CIF version "comment" a special case and should it be preserved
> (I believe yes)

The handling of the CIF magic number comments depends on what you are doing with the document. If you are reading the document, it is a good idea to read and parse the magic number to provide your parser with a hint as to the intended syntax (e.g. 80 character vs. 2048 character line length limit). If you are writing a document, then rather than preserving the magic number comment from some starting document, you want to generate your own magic number comment that correctly specifies the syntax specification being followed by your CIF writer.
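A minimal sketch of that reader/writer split, in Python. It assumes the magic number is a first-line comment of the form "#\#CIF_1.1", that version 1.1 implies the 2048 character limit, and that a file with no magic number falls back to the 80 character limit. The function names and the version-to-limit table are illustrative, not taken from any existing CIF library.

```python
import re

# Assumption: the magic number is a first-line comment such as "#\#CIF_1.1".
MAGIC_RE = re.compile(r"^#\\#CIF_(\d+\.\d+)")

# Line-length limits mentioned in the message; which version maps to
# which limit is an assumption of this sketch.
LINE_LIMITS = {"1.1": 2048}
DEFAULT_LINE_LIMIT = 80  # assumed when no magic number is present


def read_magic_version(first_line):
    """Return the version string from a magic number comment, or None."""
    match = MAGIC_RE.match(first_line)
    return match.group(1) if match else None


def line_limit_for(first_line):
    """Use the magic number only as a hint for the reader's line limit."""
    version = read_magic_version(first_line)
    return LINE_LIMITS.get(version, DEFAULT_LINE_LIMIT)


def write_magic(version="1.1"):
    """Emit the writer's own magic number rather than copying the input's."""
    return "#\\#CIF_" + version
```

The reader treats the magic number purely as a hint, and the writer never consults the input file's magic number at all: it states the syntax the writer itself follows.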
The sensible practice of having each writer declare the syntax version it follows has been well established in the HTML/SGML/XML community, and proves very helpful in dealing with the dizzying variety of HTML/SGML/XML syntax versions. Hopefully we will never have as many co-existing syntax versions in the CIF community, but the practice is still a sound one to follow.

> I am now unclear about the role of char and numb. I assumed they were for
> data validation and application programmers. The first would ensure that a
> data value was always a number - thus I would have believed that
>
> _cell_length_a 'too large to measure'
>
> was a validation error. The second aspect is now a nightmare for
> application programmers. Firstly the infoset (the result of the parse) has
> to retain knowledge of whether the value is quoted. Then the application
> has to take different action on whether the value is quoted. The author
> submits that _cell_length-a '12.1' _cell_length-a 12.1 have different
> meanings. (I cannot see what - as a programmer - I can or have to do).
> Formally if I get _cell_length-a '12.1' I would have to throw an exception
> "Cell_length_a is not a number, cannot continue".

As do many languages, CIF has data types. The number of types depends on the DDL, but in all cases, there is a distinction between numeric data and other, more string-oriented data types (e.g. char and text). Just as with most programming languages, a quoted "12" is not a number. The application does not need to preserve the quotes, but it does need to recognize that the data type of the data that it just read is not a numeric data type, and if the context within which it is being used calls for a numeric data type (e.g. as a value to _cell_length_a) then a good parser really should inform the user of the conflict. This does mean that the parser "has to take different action on whether the value is quoted", but that is one of the services the parser is there to perform for the user, if it can.
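As a rough illustration of how little work this is, here is a sketch in Python. It assumes the tokenizer hands over each value exactly as written in the file, quotes included. The numeric pattern (optional sign, decimal digits, optional e/E exponent, optional standard uncertainty in parentheses) is a simplification rather than the full CIF grammar, and the sketch ignores text fields and the special values "?" and ".". The function names are hypothetical.

```python
import re

# Assumed, simplified numeric form; not the complete CIF number grammar.
NUMB_RE = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$")


def classify_value(token):
    """Return (type, text): the quotes are dropped, the type is kept."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return ("char", token[1:-1])
    if NUMB_RE.match(token):
        return ("numb", token)
    return ("char", token)


def check_numeric_item(name, token):
    """Return a warning if a numeric item was given a non-numeric value."""
    kind, text = classify_value(token)
    if kind != "numb":
        return f"{name}: expected a number, got the string {text!r}"
    return None
```

With this, check_numeric_item("_cell_length_a", "12.1") returns None, while the quoted '12.1' and 'too large to measure' both produce a warning. The caller only ever sees the type and the unquoted text, so nothing downstream has to remember whether quotes were present.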
Yes, there may be justification for writing a light-weight parser that does not catch such errors, but that hardly makes it a "nightmare for application programmers" to write parsers that do catch such errors.

Even when a dictionary is not being used, you really do want to recognize the distinction between numeric and non-numeric data. For example, 1234-308 might well be intended as the number 1234*10**(-308) while '1234-308' is clearly intended to be the string of characters stated.

-- Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
yaya@dowling.edu
=====================================================

On Thu, 19 Aug 2004, Dr P. Murray-Rust wrote:

> On Aug 18 2004, David Brown wrote:
>
> > I am also finding this interchange interesting.
>
> Thanks - it is a deep issue and resulted in many thousands of emails in the
> XML community.
>
> The issue as I see it is whether CIFs are seen as machine-understandable
> documents or whether they are primarily to produce material for humans to
> read. (They can do both, but it requires work). ...