Re: [ddlm-group] Eliding in triple-quoted strings: Proposals C andD. .. .. .
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Eliding in triple-quoted strings: Proposals C andD. .. .. .
- From: James Hester <jamesrhester@gmail.com>
- Date: Tue, 11 Jan 2011 23:44:22 +1100
- In-Reply-To: <alpine.BSF.2.00.1101080754250.50478@epsilon.pair.com>
- References: <AANLkTimWpd1kMZDGcTprEhcJw+uQE4_JtgJ4SbtMPVXt@mail.gmail.com><263453.61027.qm@web87016.mail.ird.yahoo.com><alpine.BSF.2.00.1101070742350.2950@epsilon.pair.com><4D271B81.2050501@rcsb.rutgers.edu><a06240801c94d1ffe02d9@192.168.2.102><a06240803c94d2f76a2fd@192.168.2.102><8F77913624F7524AACD2A92EAF3BFA54166D7D1E9B@SJMEMXMBS11.stjude.sjcrh.local><AANLkTimWDSp1zOTLTapWErbiBYkkdUaSrUKj0B-ksqdb@mail.gmail.com><alpine.BSF.2.00.1101080754250.50478@epsilon.pair.com>
The "15 minutes" refers to the time taken to implement Simon's proposal, given software that already processes triple-quote delimited strings as per the current CIF2 standard. You can peruse the code that I have for lexing CIF2 files at:

http://hg.berlios.de/repos/pycifrw/annotate/3e098b54c97d/pycifrw/YappsStarParser.nw

Line 206 is the regular expression for triple-quoted strings. Matches to this regex have their triple quotes stripped and are then processed as any other datavalue, as you can see from line 347. Implementing Simon's proposal therefore only requires expanding the function "striptriple" to also remove <backslash><newline> characters and substituting <backslash> for <backslash><backslash>. In Python this would be two search-replace regexps, or two lines of code, thus the 15 minute estimate.

Note that the exact mechanism of eliding backslashes requires a little fine-tuning, in my opinion, to restrict such elision to only those backslashes that might be relevant for eliding <newline>. There may be some small changes to the approach, but my point remains that whatever you do, implementing the complete Python behaviour is an order of magnitude more time and complexity.

I'm not sure that this is particularly useful to those not using yacc/lex type tools, but I think my 15 minute estimate is reasonable in my case. I emphasise that the reason it is so quick to do is that Simon's proposal, and several others, do not require a change to the lexing logic; the eliding transformations can be carried out after lexing. Such magic is only possible for digraph or trigraph delimited strings.

I do not understand Herbert's thinking about "flagging illegal data values" below. Neither I nor anybody else is suggesting that \b or any of the other Python escape sequences be banned from CIF2 strings. However, their interpretation as ASCII BEL or LaTeX boldface or poor man's Unicode character is left to the dictionary to decide.
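For readers without the PyCIFRW source to hand, the post-lexing transformation described above for Simon's proposal might look roughly like the sketch below. It is an illustration only, not the actual "striptriple" code: the function name strip_triple and the pattern name _ELIDE are hypothetical, and a single regular-expression pass is used here instead of the two successive search-replaces mentioned above, on the assumption that one pass avoids any dependence on the order of the two substitutions. All other backslash sequences (for example \b) are deliberately left untouched.

```python
import re

# Hypothetical pattern: a backslash followed by either another backslash
# or a newline. No other backslash sequence is matched.
_ELIDE = re.compile(r"\\(\\|\n)")


def strip_triple(raw: str) -> str:
    """Strip the delimiters from an already-lexed triple-quoted token,
    then apply the two elisions (assumed reading of Simon's proposal)."""
    for delim in ('"""', "'''"):
        if len(raw) >= 6 and raw.startswith(delim) and raw.endswith(delim):
            body = raw[3:-3]
            break
    else:
        raise ValueError("not a triple-quoted token")

    # <backslash><backslash> becomes <backslash>;
    # <backslash><newline> is removed entirely.
    return _ELIDE.sub(lambda m: "\\" if m.group(1) == "\\" else "", body)


if __name__ == "__main__":
    # Token text is: """abc\<newline>def"""
    print(repr(strip_triple('"""abc\\\ndef"""')))  # 'abcdef'
```

Because the substitution runs on the token after the lexer has matched it, the lexer's own regular expression for triple-quoted strings does not need to change, which is the point made above about the eliding being done after lexing.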
And finally, enough already with the dire predictions of doom. I have suggested that we continue until the end of this month searching for a solution, and if none is forthcoming, that we leave the currently approved standard as is. Ralf has made clear that he would be happy with a minimal solution, and Herbert has offered a constructive compromise. There are no signs of an impasse as yet in our discussions. Let us continue for a few weeks to see what we come up with.

On Sun, Jan 9, 2011 at 12:16 AM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:
> First to the important part -- if James can do the complete
> implementation of the treble quote parsing for CIF2 in 15 minutes,
> then I respectfully request that he do so, and make the
> code available to the rest of us as a template for all
> to follow and understand.
>
> Second, what James sees as negative issues in 1 and 2, I see as positive
> ones, especially for support of imgCIF, but in no case do I understand
> what harm is done to any user or software developer by allowing the
> greater generality of the python treble quote with the raw string
> and the unicode string. The question seems to come down to how
> early in the parse logic we will be required by the CIF2 standard
> to issue warnings or error messages about "illegal" data values.
> Are we to have a mandatory requirement for flagging this at the
> lexical level? Why? Some of us are going to have to allow for
> the suppression on those errors and warnings to be able to process
> our data (again imgCIF), but even for those who do not have such
> a need, if the treble quote logic returns a string that contains
> something "improper" (e.g. some of the disallowed unicode values)
> they can still report it. How is it different from someone who
> is working with a unicode-aware editor who produces a single-quoted
> version of one of those characters?
>
> It is unfortunate that we are having this discussion without Ralf.
> As I feared, we now seem headed towards a decision by this body
> that will require a full re-opened discussion at the COMCIFS level
> and will further delay CIF2, probably for another 3 years.
> At least we should be able to take a shot at what we really need,
> a face-to-face meeting, in Madrid.
>
> It is a shame. Ralf really is right.
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Sat, 8 Jan 2011, James Hester wrote:
>
>> Perhaps I was unclear as to why I am not satisfied with Ralf's
>> proposal. I object because:
>>
>> (1) It defines a large number of unnecessary escapes (I listed 10),
>> some of which are not allowed CIF characters;
>> (2) It defines both raw and unicode strings, which is excessive for
>> our requirements
>> (3) The sequences <backslash><quote> and <backslash><apostrophe> are
>> ambiguous in raw strings: are they elide sequences, or are they
>> intended for the string consumer?
>>
>> Perhaps the supporters of the Python approach would like to explain
>> why these objections are immaterial, especially given that there are
>> already about 6 significantly simpler proposals on the table to which
>> these objections do not apply.
>>
>> I do not perceive any advantage in adopting the Python approach
>> wholesale. For example, Simon's minimalist suggestion would be much
>> easier to implement, interpret and document than the complete Python
>> scheme - I estimate about 15 minutes of coding time.
>>
>> On Sat, Jan 8, 2011 at 9:32 AM, Bollinger, John C
>> <John.Bollinger@stjude.org> wrote:
>>>
>>> On Friday, January 07, 2011 3:14 PM, Herbert J. Bernstein wrote:
>>>
>>>> We seem not to be communicating effectively.
>>>>
>>>> What I am asking for is an _existing_, supported treble quote
>>>> specification
>>>> from an _existing_ language with _existing_ documentation and
>>>> _existing_ software as an alternative to the Python specification,
>>>> documentation and software to which we all have access, that is being
>>>> proposed as an alternative
>>>> to what Ralf has proposed.
>>>
>>> Thank you for that clarification. You are right, I didn't understand
>>> what you were asking for.
>>>
>>> I hope this will likewise clarify my position: I reject the premise that
>>> the system we choose must meet those criteria, and I oppose adopting the
>>> full Python syntax and semantics.
>>>
>>>> The Python specification is available at
>>>>
>>>> http://docs.python.org/reference/index.html
>>>>
>>>> with the lexical analysis at
>>>>
>>>> http://docs.python.org/reference/lexical_analysis.html
>>>
>>> Thanks, though that is exactly what I was looking at already. It leaves
>>> several details unclear, some of which I discussed in previous messages.
>>> Hence, I consider it slightly short of a *full* specification. It does,
>>> however, provide my grounds for opposing adoption of that scheme for CIF.
>>>
>>>> The complete source code and binaries are available at:
>>>
>>> Unless you propose to append a particular set of Python sources to the
>>> CIF specification as a reference, I have no interest in perusing the source
>>> code to seek answers to such questions of detail as I have. Furthermore, I
>>> would oppose adding such an appendix on the grounds that it would be
>>> exceedingly difficult to use to resolve questions such as mine.
>>>
>>> I am likewise unwilling to rely on the behavior of the python binary that
>>> happens to be installed on my computer to answer them.
>>> If the correct
>>> behavior is not documented independent of the program then there is no
>>> particular reason to trust that it won't change in future versions, or that
>>> any particular implementation is correct or bug-free.
>>>
>>>
>>> Regards,
>>>
>>> John
>>>
>>> --
>>> John C. Bollinger, Ph.D.
>>> Department of Structural Biology
>>> St. Jude Children's Research Hospital
>>>
>>>
>>>
>>>
>>> Email Disclaimer: www.stjude.org/emaildisclaimer
>>>
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
>

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group