Re: [ddlm-group] Eliding in triple-quoted strings: Proposals C and D
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Eliding in triple-quoted strings: Proposals C and D
- From: John Westbrook <jwest@rcsb.rutgers.edu>
- Date: Fri, 07 Jan 2011 08:56:17 -0500
- In-Reply-To: <alpine.BSF.2.00.1101070742350.2950@epsilon.pair.com>
- References: <AANLkTimWpd1kMZDGcTprEhcJw+uQE4_JtgJ4SbtMPVXt@mail.gmail.com> <263453.61027.qm@web87016.mail.ird.yahoo.com><alpine.BSF.2.00.1101070742350.2950@epsilon.pair.com>
I have been quiet on this issue as my bias for supporting Python
semantics has not been popular or productive in prior DDLm/CIF2
discussions. I would extend Herb's argument to the whole of this
enterprise and emphasize my view that meaningful adoption of
DDLm/CIF2 will require embracing and leveraging existing
technologies as much as possible.

John

On 1/7/11 7:52 AM, Herbert J. Bernstein wrote:
> As noted in my prior message, I disagree. I find it
> counter-intuitive and unproductive to adopt something
> that looks very much like the python treble quoted
> string but which follows confusingly different rules.
> Remember -- for most of the community, the entire
> CIF2 approach to quoting is something new and different.
> It does not agree with the well-established CIF1 quoting
> rules. By giving them the python treble quoted strings
> we are giving them a way to simply and easily carry any
> and all strings and text fields forward from CIF1 to CIF2
> without having to seriously rework them. Sure, we could
> come up with some other set of rules for treble quoted
> strings, but by following the python rules we will
> greatly reduce the chances of misinterpretations in
> the marginal cases, and give ourselves an independent
> check on our new parsers -- all the existing python
> parsers.
>
> I believe that Ralf is right.
>
> Regards,
> Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Fri, 7 Jan 2011, SIMON WESTRIP wrote:
>
>> Dear All
>>
>> My initial reaction to the adoption of the python mechanism for
>> triple-quoted strings was that it is counter-intuitive in a CIF
>> context - i.e. you might expect the base syntax of ''' and """
>> delimited strings to be the same as that of the other delimited
>> strings, which in CIF1 and the proposed CIF2 is closer to python's
>> 'raw' strings.
>>
>> However, I am in favour of revisiting the issue to address the
>> restrictions of the current set of delimiters, and believe that
>> there may indeed be an answer amongst James's proposals, which
>> could be agreed upon quite swiftly, both respecting the legacy of
>> CIF1 and rectifying its shortcomings in this respect.
>>
>> I will follow up on this when I have considered James's proposals
>> in more detail.
>>
>> I'd rather the group spent a little more time on this than just
>> 'dumping' a bit of python syntax into CIF.
>>
>> Cheers
>>
>> Simon
>>
>> ____________________________________________________________________________
>> From: James Hester <jamesrhester@gmail.com>
>> To: ddlm-group <ddlm-group@iucr.org>
>> Sent: Friday, 7 January, 2011 4:46:10
>> Subject: [ddlm-group] Eliding in triple-quoted strings: Proposals C and D
>>
>> Dear DDLm group members,
>>
>> Most of you will be aware that the CIF2 standard has been approved by
>> COMCIFS, with one dissenting vote. I propose to revisit the point
>> raised by Ralf in his dissenting vote, in order to see if we can't
>> improve this aspect of the standard. The particular problem
>> identified by Ralf, and this problem exists to a more limited extent
>> with CIF1 as well, is that there is no mechanism to elide instances of
>> the string delimiter sequence, meaning that certain pathological
>> strings cannot be included in a CIF2 file. A further issue is that
>> CIF writing programs have to run through a long series of checks when
>> determining how to delimit any given string. I propose that we revisit
>> this problem, with the restriction proposed by Ralf that we consider
>> only triple quote/triple apostrophe delimited strings.
>>
>> To get us back up to speed on this issue, you will recall some salient
>> points from previous discussions, which taken together led to our
>> failure to make any progress:
>>
>> (1) CIF files are often edited in text editors. Working with CIF text
>> in a text editor should not produce unexpected behaviour for a typical
>> workflow.
>> (2) CIF text may include LaTeX or other marked-up text, which will be
>> cumbersome to insert in the file if it contains many instances of
>> elide characters (see point (1))
>> (3) IUCr "markup" for Greek letters uses backslash to introduce the
>> special character combination
>> (4) Any characters that function as elides must be removed from the
>> string at parse time to avoid ambiguity in interpretation when
>> returned to the calling application
>>
>> If we limit ourselves to triple quote/apostrophe delimited strings, as
>> Ralf proposes, then we can construct an elide scheme that is invisible
>> to the lexer, by simply breaking the trigraph appropriately. I
>> propose the following general scheme, where <delimiter> refers to one
>> delimiter character, so the full string delimiter would be
>> <delimiter><delimiter><delimiter>:
>>
>> Proposal C:
>>
>> When reconstructing the datavalue from an input triple-<delimiter>
>> delimited string, the following simple transformation is performed:
>> all occurrences of <delimiter><elide> are replaced by <delimiter>.
>>
>> My comments on this scheme are as follows:
>> (0) When preparing a string for output, any occurrences of
>> <delimiter><elide> *must* be replaced by <delimiter><elide><elide>;
>> <delimiter> only needs to be elided when necessary to break up triple
>> <delimiter> sequences in the source string, and when the final
>> character of a string is <delimiter>
>> (1) It is invisible to the lexer, which will correctly find the string
>> terminator characters without knowledge of the <elide> character used.
>> (2) With appropriate choice of <elide>, there is a low likelihood of
>> ever encountering a string where transformation needs to be performed,
>> which means transforming the string is necessary only where three or
>> more delimiter characters are present in a row, or the string
>> concludes with a delimiter character.
>> (3) The <elide> is a post-elide, by which I mean it elides the
>> preceding character, not the next character. This is preferable to
>> cover the case of an input string finishing with the <delimiter>
>> character, in which case some non-<delimiter> character must appear
>> after it to ensure the lexer does not consider the final <delimiter>
>> character in the string as the first character of the terminating
>> <delimiter><delimiter><delimiter> sequence.
>>
>> Finally, consider a general proposal D:
>>
>> Elided triple-<delimiter> strings are delimited by
>> <char><delimiter><delimiter><delimiter>...<delimiter><delimiter><delimiter>.
>> The initial <char> defines the character to use to post-elide the
>> contents of the string as per proposal C. <char> would initially be
>> any non-alphanumeric ASCII character, with the set expanded in the
>> future to include Unicode characters once most applications were
>> Unicode-aware.
>>
>> Examples (LHS is string as written in CIF file, RHS is actual
>> datavalue inside angle brackets)
>>
>> &""" Bleg blah blah ""&" and so forth "&"""
>>     < Bleg blah blah """ and so forth">
>> $'''''$' AAABBB ''$' CCCDDD '$'''
>>     <''' AAABBB ''' CCCDDD '>
>>
>> This allows the string writer to choose the elide character to
>> minimise <delimiter><elide> occurrences in the source text. Note that
>> the need to choose and prepend a character to the string minimizes the
>> likelihood that somebody will do a naive cut and paste.
>>
>> An even more general proposal would prepend a character to the string
>> to indicate pre-elide (as per Proposal A in a separate email) or
>> append a character to indicate post-elide. I don't propose to
>> consider this.
>>
>> Again, please indicate your views on including any of these proposals
>> in the CIF standard.
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>

--
******************************************************************
John Westbrook, Ph.D.
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
610 Taylor Road
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (732) 445-4290
Fax: (732) 445-4320
******************************************************************
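For readers following the thread, the reader/writer behaviour described in James Hester's Proposals C and D (quoted above) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread or from any CIF library: the function names (`decode_proposal_c`, `encode_proposal_c`, `write_proposal_d`, `read_proposal_d`) are hypothetical, and the encoder is one possible reading of comment (0), inserting an elide character only after a delimiter that is followed by the elide character, that would otherwise complete a run of three delimiters, or that ends the string.

```python
# Illustrative sketch only; all names here are hypothetical, not an existing API.

def decode_proposal_c(body: str, delimiter: str, elide: str) -> str:
    """Proposal C reader step: every <delimiter><elide> becomes <delimiter>."""
    return body.replace(delimiter + elide, delimiter)


def encode_proposal_c(value: str, delimiter: str, elide: str) -> str:
    """One possible writer for Proposal C (an assumption, see lead-in).

    An elide character is appended after a delimiter only when:
      * the next source character is the elide character itself,
      * the delimiter is the last character of the value, or
      * it is the second delimiter in a row and a third would follow.
    """
    if len(delimiter) != 1 or len(elide) != 1 or delimiter == elide:
        raise ValueError("delimiter and elide must be distinct single characters")
    out = []
    run = 0  # consecutive delimiters written since the last non-delimiter
    for i, ch in enumerate(value):
        out.append(ch)
        if ch != delimiter:
            run = 0
            continue
        run += 1
        nxt = value[i + 1] if i + 1 < len(value) else None
        if nxt is None or nxt == elide or (run == 2 and nxt == delimiter):
            out.append(elide)
            run = 0
    return "".join(out)


def write_proposal_d(value: str, delimiter: str, elide: str) -> str:
    """Proposal D token: <char> + triple delimiter + encoded body + triple delimiter."""
    if not (elide.isascii() and not elide.isalnum()):
        raise ValueError("Proposal D initially allows non-alphanumeric ASCII only")
    fence = delimiter * 3
    return elide + fence + encode_proposal_c(value, delimiter, elide) + fence


def read_proposal_d(token: str) -> str:
    """Read a Proposal D token; the terminator is found without using <char>."""
    if len(token) < 7:
        raise ValueError("token too short to be a Proposal D string")
    elide, delimiter = token[0], token[1]
    fence = delimiter * 3
    if delimiter not in ("'", '"') or token[1:4] != fence:
        raise ValueError("expected <char> followed by a triple delimiter")
    end = token.find(fence, 4)  # first triple delimiter after the opening one
    if end == -1:
        raise ValueError("unterminated string")
    return decode_proposal_c(token[4:end], delimiter, elide)
```

Under these assumptions the writer reproduces the second example from the proposal: the datavalue `''' AAABBB ''' CCCDDD '` with `$` as the elide character is written as `$'''''$' AAABBB ''$' CCCDDD '$'''`, and reading that token returns the original datavalue. The whitespace in the first example may not have survived archiving exactly, so it is not used as a reference here.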
- Follow-Ups:
- Re: [ddlm-group] Eliding in triple-quoted strings: Proposals C and D (Bollinger, John C)
- Re: [ddlm-group] Eliding in triple-quoted strings: Proposals C and D (SIMON WESTRIP)
- References:
- [ddlm-group] Eliding in triple-quoted strings: Proposals C and D (James Hester)
- Re: [ddlm-group] Eliding in triple-quoted strings: Proposals C and D (SIMON WESTRIP)
- Re: [ddlm-group] Eliding in triple-quoted strings: Proposals C and D (Herbert J. Bernstein)