Re: Fine-tuning CIF dictionary regexes
- Subject: Re: Fine-tuning CIF dictionary regexes
- From: James Hester <jrh@xxxxxxxxxxxx>
- Date: Thu, 16 Jun 2005 17:05:15 +0900
On Mon Apr 18th Nick wrote:
> POSIX compliance makes sure you exhaust the input string until you find
> the longest matching sequence. This is necessary to get the "correct"
> token.

I understand it as "leftmost, longest" so that the regexp engine must search through all alternative matches to find the longest.

> But if you are throwing the "number" to a series of compiled regular
> expressions won't 78.456(22) also match '7', an integer, and return the
> INT token? If that happens to be the first rule it comes across?

My question came up in connection with validating a CIF against a dictionary: all I want is to be able to determine whether or not a given string matches the regexp, so rather than throwing a series of regexps at a string to get a token, I'm throwing a string corresponding to a data item value at a single regexp. I had hoped to be able to read the regexps from the dictionary rather than hard code them. (As an aside, I have split CIF processing into syntax and validation, so that no tokenisation in terms of INT/FLOAT/NUMBER etc. happens during syntax checking. All data values after the syntax stage are strings which are then inspected during validation).

>> One suggestion is that these two regular expressions are re-ordered so
>> that those alternatives in an alternation which are a subset of other
>> alternatives come later. This remains POSIX-compliant and means many
>> non-POSIX engines will find the longest match.

> Are you sure you can order the rules such that it eliminates all instances
> of the problem you allude to?

Not at all. However, such a reordering will increase the number of regexp engines which will match the entire string. POSIX correctness is maintained, so nothing is lost and something (not necessarily all the time) practical is gained in that Perl/Python/Tcl/? programmers can automate type checking.
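To make the reordering point concrete, here is a minimal sketch in Python, whose `re` module takes the first alternative that succeeds rather than the longest. The two patterns and the helper `matches_whole` are made up for illustration and are not the regexps from any CIF dictionary; they only differ in the order of the alternatives.

```python
import re

# Hypothetical patterns for illustration only; NOT taken from a CIF dictionary.
# Both accept the same strings under POSIX leftmost-longest rules; only the
# order of the alternatives differs.
SHORT_FIRST = re.compile(r"[0-9]+|[0-9]+\.[0-9]+(\([0-9]+\))?")
LONG_FIRST = re.compile(r"[0-9]+\.[0-9]+(\([0-9]+\))?|[0-9]+")


def matches_whole(pattern, value):
    """Return True if the match found at the start of value consumes all of it."""
    m = pattern.match(value)
    return m is not None and m.end() == len(value)


value = "78.456(22)"

# Python's engine stops at the first alternative that succeeds, so the
# integer branch wins when it is listed first and only "78" is consumed.
short_first_text = SHORT_FIRST.match(value).group(0)

# With the longer alternative listed first, the whole value is consumed.
long_first_text = LONG_FIRST.match(value).group(0)

print(short_first_text, matches_whole(SHORT_FIRST, value))
print(long_first_text, matches_whole(LONG_FIRST, value))
```

With the longer alternative first, a plain integer such as "78" is still accepted, because the first branch fails at the missing decimal point and the engine falls back to the second.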
(This reply is so late because I seem to have dropped off the mailing list and only noticed that some discussion had occurred when checking the archive later on).

James.