RE: Fine-tuning CIF dictionary regexes
- Subject: RE: Fine-tuning CIF dictionary regexes
- From: "Bollinger, John Clayton" <jobollin@xxxxxxxxxxx>
- Date: Mon, 18 Apr 2005 10:04:20 -0500
James Hester wrote:

> The point I want to discuss boils down to: should the regular
> expressions in the CIF dictionary be fine-tuned to be
> compatible not only with POSIX-compliant regular expression engines?

It seems to me that it is desirable for the REs to be as general as possible. POSIX has the advantage of being a formal (series of) standard(s). Perl-compatible REs, on the other hand, have the advantage of widespread use, support, and acceptance, to the extent that I'd have to call them a de facto standard. POSIX compliance is attractive from the formal standards point of view, but Perl compatibility is more likely to be useful to software developers. If a particular RE in the dictionary must choose only one, then the Perl direction is the one I think I favor.

> The following two constructs from mm_cif, although POSIX
> compliant, will not correctly match in a Perl or Python or
> Tcl regular expression (and any other NFA engine)
>
> floating point numbers:
>
> '-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?'
>
> symmetry operations
> '([1-9]|[1-9][0-9]|1[0-8][0-9]|19[0-2])(_[1-9][1-9][1-9])?'
>
> The problem is that the non-POSIX engines will go through the
> alternations (separated by |) in the above expressions from
> left to right, returning the first match, and as the second
> part is optional, there is no requirement to match it. In
> contrast, a POSIX engine must return the longest match. So
> e.g. if Python is fed the number 78.456(22), "78." will be
> matched by the floating point expression, as this satisfies
> the first part of the alternation, and everything else in the
> regular expression is optional.

Isn't it implied that the provided REs must match an entire input token? If so, the engine has to backtrack into the later alternatives until the whole token is consumed, and as far as I can tell that makes this particular distinction between RE semantics moot.

Regards,

John Bollinger

--
John C. Bollinger, Ph.D.
Indiana University Molecular Structure Center
jobollin@indiana.edu

_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif-developers
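
To make the whole-token point concrete, here is a minimal Python sketch. It assumes a reasonably recent Python 3 (`re.fullmatch` requires 3.4 or later). The two patterns are copied from the quoted message; the helper names `prefix_match` and `whole_token_match` are hypothetical and exist only for this illustration. It is not taken from any CIF software.

```python
import re

# Patterns copied verbatim from the quoted mm_cif examples.
FLOAT_RE = r'-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?'
SYMOP_RE = r'([1-9]|[1-9][0-9]|1[0-8][0-9]|19[0-2])(_[1-9][1-9][1-9])?'


def prefix_match(pattern, token):
    """Return the text a leftmost-first engine matches at the start of token.

    Hypothetical helper: this is the unanchored behaviour described in the
    quoted message, where the first successful alternative wins.
    """
    m = re.match(pattern, token)
    return m.group(0) if m else None


def whole_token_match(pattern, token):
    """Return True only if the pattern matches the entire token.

    Hypothetical helper: requiring a full match forces the engine to
    backtrack into later alternatives until the whole token is consumed.
    """
    return re.fullmatch(pattern, token) is not None


if __name__ == "__main__":
    for pattern, token in [(FLOAT_RE, "78.456(22)"), (SYMOP_RE, "192_555")]:
        print(token, "prefix:", prefix_match(pattern, token),
              "whole token:", whole_token_match(pattern, token))
```

With the unanchored call the float pattern stops after "78.", as described in the quoted text, whereas the whole-token check accepts "78.456(22)" without any change to the pattern. The same holds for the symmetry-operation pattern with a token such as "192_555".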