
Re: Revised draft of CIF 1.1 syntax document

Subject: Re: Revised draft of CIF 1.1 syntax document
From: Brian McMahon <bm@xxxxxxxx>
Date: Wed, 25 Sep 2002 10:12:16 +0100 (BST)

> Please understand that while it is appropriate to speak of lines in a CIF,
> it will cause a great deal of machine dependent trouble if we require the
> inclusion of a "newline" within the definition of a line.  The terminator
> of a line is a _very_ machine/system dependent concept.

I think this is now understood and accepted by the contributors to this
discussion. However, you have identified the difficulty that, while unable to
specify a particular character or set of characters as a line terminator, we
must be able to handle lines as discrete entities according to whatever
implementation is appropriate for the current software environment.

>                                                ...   The obligation of a
> parser is to provide one empty line in the first case, ...

This is a helpful way to think about it. More generally, I see the obligation
of a parser to be to pass to a back-end application the information that the
application needs to work upon the contents of the file. So, when handling a
semicolon-delimited text field, the parser needs to pass on the information
that the "value" of the field is one or more character strings, each
considered a separate line of text. The back-end application has the
responsibility, if an author so decides, of padding or truncating lines with
whitespace, of concatenating folded lines, or of trying to convert between
semicolon delimiters and quote delimiters: the parser does not need to worry
about that.
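
A minimal sketch of that division of labour, in Python. The function name
read_text_field is hypothetical, and the sketch assumes only that a text
field opens and closes with a semicolon in the first column of a line. It
hands the value on as a list of strings and never names a terminator itself:

```python
def read_text_field(source):
    """Return the lines of the first semicolon-delimited text field."""
    # splitlines() recognises \n, \r\n and \r (among others), so the
    # sketch does not commit to any one line terminator.
    field = None
    for line in source.splitlines():
        if field is None:
            if line.startswith(";"):
                field = [line[1:]]      # text after the opening semicolon
        elif line.startswith(";"):
            return field                # closing semicolon ends the field
        else:
            field.append(line)
    raise ValueError("no complete text field found")
```

Padding, truncation, unfolding and any conversion to a quote-delimited form
would then be done by whatever consumes the returned list.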

What Herbert is saying is that Fortran will lose trailing space on a line,
so a general statement about parsers must carry the caveat that "trailing
white space on a line may however be elided", as is already in para. 17 of
the syntax document. An *application* that depends on trailing whitespace
must be aware of this, and be designed in such a way that it can guarantee
that the whitespace is properly handled. This may mean that the application
is restricted to a particular programming language, and/or uses a specific
parser that is attested to retain trailing whitespace.
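
One way an application might check that property for itself is to probe the
parser before relying on it. This is only a sketch: both parsers below are
hypothetical stand-ins, and the probe assumes the parser returns a text field
as a list of lines.

```python
def keeping_parser(source):
    # Hypothetical parser that preserves each line exactly as read.
    body = source.splitlines()
    return [body[0][1:]] + body[1:-1]

def eliding_parser(source):
    # Hypothetical parser that drops trailing blanks, as para. 17 permits.
    return [line.rstrip(" \t") for line in keeping_parser(source)]

def retains_trailing_whitespace(parse_lines):
    """Return True if parse_lines keeps trailing blanks on a probe line."""
    probe = ";keep  \n;\n"   # one-line text field ending in two spaces
    return parse_lines(probe) == ["keep  "]
```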

The question then is: should the parser identify the last line of the input
text field as:
   (i)  a character string forming a last line of text
or
   (ii) a character string to be emitted without termination in the
        current output line ?

The argument boils down to: "(i) seems more natural and is how one would
read the description of these things in the STAR papers; but (ii) is the
way it has actually been implemented in CIFtbx and allows easier
transformation between semicolon and quote-delimited strings, particularly
if they don't extend over multiple lines".
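
To make the difference concrete, here is a small Python sketch of the two
readings. The function names are hypothetical, and the newline argument is
only a stand-in for whatever terminator the local environment uses:

```python
def as_terminated_line(field_lines, newline="\n"):
    # Reading (i): the last string is a complete line of text,
    # so every line, including the last, is terminated.
    return "".join(line + newline for line in field_lines)

def as_unterminated_tail(field_lines, newline="\n"):
    # Reading (ii): the last string is emitted without termination,
    # so terminators appear only between lines.
    return newline.join(field_lines)
```

For a one-line field holding abc, reading (i) yields abc followed by a
terminator, while reading (ii) yields exactly abc, the same value as the
quote-delimited string 'abc'.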

I should certainly like to hear from Syd and Nick on this. It's another
small point of detail, but it's something from which we need to remove the
ambiguity.



>                                                        ...  It would be
> an equally valid approach to have a parser spew out the entire text field
> as a series of lines, but be warned that some valid imgCIF files may
> demand more memory from such a parser than it is likely to have available.

I'm not quite sure I understand this. Do you mean that the parser would read
the entire text field into memory and at the end of the process emit the
same byte-stream unchanged, leaving downstream applications to recognise
separate lines?

Brian
