Re: Revised draft of CIF 1.1 syntax document
- Subject: Re: Revised draft of CIF 1.1 syntax document
- From: Brian McMahon <bm@xxxxxxxx>
- Date: Wed, 25 Sep 2002 10:12:16 +0100 (BST)
> Please understand that while it is appropriate to speak of lines in a CIF,
> it will cause a great deal of machine dependent trouble if we require the
> inclusion of a "newline" within the definition of a line. The terminator
> of a line is a _very_ machine/system dependent concept.

I think this is now understood and accepted by the contributors to this discussion. However, you have identified the difficulty that, while we are unable to specify a particular character or set of characters as a line terminator, we must be able to handle lines as discrete entities according to whatever implementation is appropriate for the current software environment.

> ... The obligation of a
> parser is to provide one empty line in the first case, ...

This is a helpful way to think about it. More generally, I see the obligation of a parser as being to pass to a back-end application the information that the application needs to work upon the contents of the file. So, when handling a semicolon-delimited text field, the parser needs to pass on the information that the "value" of the field is one or more character strings, each considered a separate line of text. The back-end application has the responsibility, if an author so decides, of padding or truncating lines with whitespace, of concatenating folded lines, or of trying to convert between semicolon delimiters and quote delimiters: the parser does not need to worry about that.

What Herbert is saying is that Fortran will lose trailing space on a line, so a general statement about parsers must carry the caveat that "trailing white space on a line may however be elided", as is already in para. 17 of the syntax document. An *application* that depends on trailing whitespace must be aware of this, and be designed in such a way that it can guarantee that the whitespace is properly handled. This may mean that the application is restricted to a particular programming language, and/or uses a specific parser that is attested to retain trailing whitespace.
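As a rough illustration of the division of labour described above, here is a minimal Python sketch of a parser-side helper that hands a text field to the application as a list of lines, without committing to any one line terminator. The function name `text_field_lines` and the `keep_trailing_whitespace` flag are hypothetical, invented for this example; they are not taken from CIFtbx or any other existing CIF software, and the input is assumed to be the field contents with the semicolon delimiters already removed.

```python
import re

# Assumption: the three common terminator conventions (CR LF, CR, LF)
# are all accepted; a real parser would follow its host environment.
_TERMINATOR = re.compile(r"\r\n|\r|\n")


def text_field_lines(raw, keep_trailing_whitespace=True):
    """Return the contents of a text field as a list of lines.

    The terminators themselves are never part of the returned strings,
    so the application sees discrete lines, not a byte stream.
    """
    lines = _TERMINATOR.split(raw)
    if not keep_trailing_whitespace:
        # Mimics an environment (e.g. fixed-length Fortran records)
        # in which trailing blanks on a line cannot be preserved.
        lines = [line.rstrip(" \t") for line in lines]
    return lines


preserving = text_field_lines("first  \r\nsecond\rthird")
eliding = text_field_lines("first  \r\nsecond\rthird",
                           keep_trailing_whitespace=False)
```

An application that depends on trailing whitespace would need the first behaviour; one written to be portable should only assume the second.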
The question then is: should the parser identify the last line of the input text field as:

(i) a character string forming a last line of text, or
(ii) a character string to be emitted without termination in the current output line?

The argument boils down to: "(i) seems more natural and is how one would read the description of these things in the STAR papers; but (ii) is the way it has actually been implemented in CIFtbx and allows easier transformation between semicolon and quote-delimited strings, particularly if they don't extend over multiple lines". I should certainly like to hear from Syd and Nick on this. It's another small point of detail, but it's something from which we need to remove the ambiguity.

> ... It would be
> an equally valid approach to have a parser spew out the entire text field
> as a series of lines, but be warned that some valid imgCIF files may
> demand more memory from such a parser than it is likely to have available.

I'm not quite sure I understand this. Do you mean that the parser would read the entire text field into memory and at the end of the process emit the same byte-stream unchanged, leaving downstream applications to recognise separate lines?

Brian
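
To make the difference between readings (i) and (ii) concrete, the following Python sketch re-emits a list of lines under either interpretation. The function `emit` and its `terminate_last` parameter are hypothetical names chosen for this illustration; this is not how CIFtbx is written, only a small model of the two behaviours.

```python
def emit(lines, terminate_last, newline="\n"):
    """Join the lines of a text field for output.

    terminate_last=True  models reading (i): the final string is a
                         complete line and is followed by a terminator.
    terminate_last=False models reading (ii): the final string is left
                         unterminated in the current output line.
    """
    text = newline.join(lines)
    if terminate_last and lines:
        text += newline
    return text


single = ["abc"]
as_line = emit(single, terminate_last=True)       # reading (i)
as_fragment = emit(single, terminate_last=False)  # reading (ii)
```

Under reading (ii) a one-line text field yields exactly the string that the quote-delimited form 'abc' would carry, which is the easier transformation mentioned above; under reading (i) the value gains a terminator that the quoted form has no way to express.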