Fine-tuning CIF dictionary regexes
- Subject: Fine-tuning CIF dictionary regexes
- From: James Hester <jrh@xxxxxxxxxxxx>
- Date: Mon, 18 Apr 2005 14:11:46 +0900
The point I want to discuss boils down to this: should the regular expressions in the CIF dictionary be fine-tuned so that they are compatible with more than just POSIX-compliant regular expression engines?

The following two constructs from mm_cif, although POSIX-compliant, will not match correctly in a Perl, Python or Tcl regular expression (or in any other NFA engine):

floating point numbers:
  '-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?'

symmetry operations:
  '([1-9]|[1-9][0-9]|1[0-8][0-9]|19[0-2])(_[1-9][1-9][1-9])?'

The problem is that the non-POSIX engines try the alternatives in an alternation (separated by |) from left to right and accept the first one that matches. Because the second part of each expression is optional, nothing forces the engine to go back and try a longer alternative. In contrast, a POSIX engine must return the longest match. So, for example, if Python is fed the number 78.456(22), the floating point expression will match only "78.", as this satisfies the first alternative and everything else in the regular expression is optional.

One suggestion is to reorder these two regular expressions so that alternatives which are a subset of other alternatives in the same alternation come later. This remains POSIX-compliant and means that many non-POSIX engines will also find the longest match.

Does anyone read the dictionary-defined regexes directly into their program?

James.
_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif-developers
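The behaviour described in the message can be reproduced with a minimal Python sketch. The two `*_ORIG` patterns are copied from the message. The two `*_REORDERED` patterns are only one possible reordering, written for illustration, and are not taken from any dictionary. The helper names `matched_text` and `matches_whole` are hypothetical.

```python
import re

# Patterns as quoted in the message (from mm_cif).
FLOAT_ORIG = r'-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?'
SYMOP_ORIG = r'([1-9]|[1-9][0-9]|1[0-8][0-9]|19[0-2])(_[1-9][1-9][1-9])?'

# Assumed reorderings (illustrative only): alternatives that can match
# more text are tried before the alternatives they subsume.
FLOAT_REORDERED = r'-?(([0-9]*[.][0-9]+)|([0-9]+)[.]?)([(][0-9]+[)])?([eE][+-]?[0-9]+)?'
SYMOP_REORDERED = r'(1[0-8][0-9]|19[0-2]|[1-9][0-9]|[1-9])(_[1-9][1-9][1-9])?'


def matched_text(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


def matches_whole(pattern, text):
    """Return True if the pattern can match the entire string."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    for name, pattern, sample in [
        ("float, original", FLOAT_ORIG, "78.456(22)"),
        ("float, reordered", FLOAT_REORDERED, "78.456(22)"),
        ("symop, original", SYMOP_ORIG, "192_555"),
        ("symop, reordered", SYMOP_REORDERED, "192_555"),
    ]:
        print(f"{name:18s} {sample!r} -> {matched_text(pattern, sample)!r}")
    # Requiring a whole-string match forces backtracking into later alternatives.
    print("fullmatch, original float:", matches_whole(FLOAT_ORIG, "78.456(22)"))
```

With Python's `re` module, the original patterns stop at "78." and "1" for these inputs, while the reordered ones consume the whole string. The last line shows that requiring a whole-string match (`re.fullmatch`, or explicit anchors) also makes the original pattern accept the full value, because the engine then backtracks into the later alternatives. That may be an alternative for programs that only validate complete data values, but it does not help when the expression is used to find a value inside longer text.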
- Follow-Ups:
- Re: Fine-tuning CIF dictionary regexes (Nick Spadaccini)