Re: CIF Infoset
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: CIF Infoset
- From: "Dr P. Murray-Rust" <pm286@cam.ac.uk>
- Date: Thu, 19 Aug 2004 13:26:39 -0000
- In-Reply-To: <Pine.BSF.4.58.0408190806350.10385@epsilon.pair.com>
- References: <Pine.LNX.4.44.0408181322570.18193-100000@mostaccioli.csse.uwa.edu.au><Pine.BSF.4.58.0408190806350.10385@epsilon.pair.com>
On Aug 19 2004, Herbert J. Bernstein wrote:

> There are two questions that Peter raises relative to comments and one
> relative to data types that call for a very clear response
>
> > Q. If my CIF parser automatically strips all comments from the
> > document and, say, deposits them in a public repository, does anyone
> > feel this is a problem?
>
> This is not only a problem, but depending on who owns the intellectual
> property rights in the document involved, it may well be a violation of
> copyright law. It is common practice to put copyright statements and
> references to licenses in the comments of documents, whether they be in
> CIF, XML or some other language. If you have created the document in
> question, what you extract from it and deposit in a public repository is
> your business. If the document was created by someone else, or you
> surrendered your intellectual property rights to someone else, they get
> to decide how derived works are handled. So, if you are designing a CIF
> parser to extract information from a CIF for some application to process
> internally, stripping all comments may well be a good idea, but if you are
> designing a CIF (or XML, or postscript, or ASN.1) or other parser to
> reformat documents, then you need to be much more careful and inclusive
> of comments.

My own view is that comments should be preserved. Taking Herbert's view,
it then follows that comments are order-dependent:

    # Here is a list of authors
    # The first one is the lead author
    # A.B.Foo
    # D.E.Bar

It also suggests that we should have a "comment block" (since comments
cannot span more than one line).

However, I think it would also be valuable to stress that any IPR,
metadata, or other semantics should be put in CIF items or loop_s and not
in comments. I would prefer that authors are dissuaded from using comments
for important information - that the

> > Q. Is the CIF version "comment" a special case and should it be
> > preserved (I believe yes)
>
> The handling of the CIF magic number comments depends on what you are
> doing with the document. If you are reading the document, it is a good
> idea to read and parse the magic number to provide your parser with a hint
> as to the intended syntax (e.g. 80 character vs. 2048 character line
> length limit). If you are writing a document, then rather than preserving
> the magic number comment from some starting document, you want to generate
> your own magic number comment that correctly specifies the syntax
> specification being followed by your CIF writer. The sensible practice
> has been well established in the HTML/SGML/XML community, and proves very
> helpful in dealing with the dizzying variety of HTML/SGML/XML syntax
> versions. Hopefully we will never have as many co-existing syntax
> versions in the CIF community, but the practice is still a sound one to
> follow.

There is currently only one syntax for XML (V1.0), though XML1.1 is under
development. The XML declaration:

    <?xml version="1.0"?>

is not mandatory but encouraged. I assume that the CIF magic comment is of
that form and therefore not fundamentally a comment (the XML declaration
is not a processing instruction).

> > I am now unclear about the role of char and numb. I assumed they were
> > for data validation and application programmers. The first would ensure
> > that a data value was always a number - thus I would have believed that
> >
> >     _cell_length_a 'too large to measure'
> >
> > was a validation error. The second aspect is now a nightmare for
> > application programmers. Firstly the infoset (the result of the parse)
> > has to retain knowledge of whether the value is quoted. Then the
> > application has to take different action on whether the value is
> > quoted. The author submits that
> >
> >     _cell_length_a '12.1'
> >     _cell_length_a 12.1
> >
> > have different meanings. (I cannot see what - as a programmer - I
> > can or have to do). Formally if I get _cell_length_a '12.1' I would
> > have to throw an exception "Cell_length_a is not a number, cannot
> > continue".
>
> As do many languages, CIF has data types. The number of types depends
> on the DDL, but in all cases, there is a distinction between numeric data
> and other, more string-oriented data types (e.g. char and text).

Agreed.

> Just as
> with most programming languages, a quoted "12" is not a number. The
> application does not need to preserve the quotes, but it does need to
> recognize that the data type of the data that it just read is not a
> numeric data type, and if the context within which it is being used
> calls for a numeric data type (e.g. as a value to _cell_length_a) then
> a good parser really should inform the user of the conflict. This
> does mean that the parser "has to take different action on whether the
> value is quoted", but that is one of the services the parser is there
> to perform for the user, if it can. Yes, there may be justification
> for writing a light-weight parser that does not catch such errors, but
> that hardly makes it a "nightmare for application programmers" to
> write parsers that do catch such errors. Even when a dictionary is not
> being used, you really do want to recognize the distinction between
> number and non-numeric data. For example, 1234-308 might well be
> intended as the number 1234*10**(-308) while '1234-308' is clearly
> intended to be the string of characters stated.

The difficulty is not preserving the data type, but the semantics of
downstream decisions. If one author writes

    _my_phone "123-45678"

they are announcing this is not a number, while if another writes

    _my_phone 123-45678

they are announcing it is a number. The discussion so far seems to suggest
that these statements overrule the datatypes specified in the dictionary
entries.
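To make the infoset question concrete, here is a minimal Python sketch of a reader that keeps the "was it quoted?" bit alongside each value and derives a type from syntax alone. It is not taken from any existing CIF library: the names `Value`, `read_value` and `syntactic_type` are hypothetical, and the number pattern is a deliberately simplified assumption rather than the CIF grammar (for instance, it does not accept the exponent-without-E form 1234-308 mentioned above).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """Hypothetical infoset entry: the value text plus its quoting status."""
    text: str
    quoted: bool


# Simplified number pattern (an assumption, not the CIF specification):
# optional sign, digits with optional decimal point, optional E-exponent,
# optional standard uncertainty in parentheses.
_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?")


def read_value(token: str) -> Value:
    """Strip one pair of matching quotes, remembering that they were there."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return Value(token[1:-1], quoted=True)
    return Value(token, quoted=False)


def syntactic_type(value: Value) -> str:
    """Type implied by the syntax alone, with no dictionary consulted."""
    if not value.quoted and _NUMBER.fullmatch(value.text):
        return "numb"
    return "char"


if __name__ == "__main__":
    for token in ("12.1", "'12.1'", '"123-45678"', "123-45678"):
        print(token, "->", syntactic_type(read_value(token)))
```

Under these assumptions the quotes themselves are dropped, but the application still has to carry and act on the extra `quoted` flag, which is the burden being debated.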
There is a particular problem in loop_s, where it is then possible to have
different data types within a column:

    loop_
    _atom_site_occupancy
    1.0
    0.3
    "not refined"
    "0.3"
    "."

which makes the implementation very difficult. I believe that a programmer
should be able to look up the data type in the dictionary entry and write
a routine that relies on a value being of the correct data type and throws
an exception if not.

P.
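The dictionary-driven routine proposed in the message could look roughly like the following Python sketch. Everything named here is hypothetical: `DECLARED_TYPES` stands in for a real dictionary lookup, `check_column` and `DataTypeError` are illustrative names, and the number pattern is a simplification. The sketch also assumes one particular policy, namely that quotes are ignored and only the text is judged against the declared type; under the stricter reading discussed in the thread, the quoted "0.3" would be rejected as well.

```python
import re

# Hypothetical stand-in for a dictionary lookup: data name -> declared type.
DECLARED_TYPES = {"_atom_site_occupancy": "numb"}

# Simplified number pattern (an assumption, not the CIF specification).
_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?")


class DataTypeError(ValueError):
    """Raised when a value does not match the dictionary's declared type."""


def _unquote(raw: str) -> str:
    """Remove one pair of matching single or double quotes, if present."""
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return raw[1:-1]
    return raw


def check_column(name: str, raw_values: list) -> list:
    """Convert one loop_ column according to the declared type of `name`."""
    declared = DECLARED_TYPES[name]
    checked = []
    for row, raw in enumerate(raw_values, start=1):
        text = _unquote(raw)
        if declared == "numb":
            if _NUMBER.fullmatch(text) is None:
                raise DataTypeError(f"{name} row {row}: {raw!r} is not a number")
            # Drop any standard uncertainty, e.g. 0.30(2) -> 0.30.
            checked.append(float(text.split("(")[0]))
        else:
            checked.append(text)
    return checked
```

With the column from the message, the first two rows convert cleanly and the third, "not refined", raises `DataTypeError`; application code that receives the returned list can then rely on every entry being a float.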
- Follow-Ups:
- Re: CIF Infoset (Herbert J. Bernstein)
- References:
- Re: CIF Infoset (Nick Spadaccini)
- Re: CIF Infoset (Herbert J. Bernstein)