RE: Fine-tuning CIF dictionary regexes
- Subject: RE: Fine-tuning CIF dictionary regexes
- From: "Bollinger, John Clayton" <jobollin@xxxxxxxxxxx>
- Date: Thu, 16 Jun 2005 08:53:22 -0500
James Hester wrote:
[...]
> My question came up in connection with validating a CIF against a
> dictionary: all I want is to be able to determine whether or
> not a given string matches the regexp, so rather than
> throwing a series of regexps at a string to get a token, I'm
> throwing a string corresponding to a data item value at a
> single regexp. I had hoped to be able to read the regexps
> from the dictionary rather than hard code them.

For your particular case, it seems that you ought to be able to read a regex from the dictionary, prepend a '^', append a '$', and go. Alternatively, some regex engines (e.g. Java's) allow you to exert control at the API level over whether the whole string, the beginning of the string, or just any old part of the string needs to match.

> >> One suggestion is that these two regular expressions are re-ordered
> >> so that those alternatives in an alternation which are a subset of
> >> other alternatives come later. This remains POSIX-compliant and
> >> means many non-POSIX engines will find the longest match.
> >
> > Are you sure you can order the rules such that it eliminates all
> > instances of the problem you allude to?
>
> Not at all. However, such a reordering will increase the
> number of regexp engines which will match the entire string.
> POSIX correctness is maintained, so nothing is lost and
> something (not necessarily all the
> time) practical is gained in that Perl/Python/Tcl/?
> programmers can automate type checking.

To the extent it is feasible, I agree that it is useful to arrange the regexes so that they exhibit favorable behavior in the widest possible range of regex engines. Some standard needed to be chosen to establish the meaning of the regexes unambiguously, however, and it may not be possible to arrange all the regexes so that they have the same meaning to regex engines that do not conform to the chosen standard (POSIX).
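A minimal sketch of the anchoring approach, using Python's re module, which (like Perl's engine) is a leftmost-first backtracking engine rather than a POSIX leftmost-longest one. The pattern DICT_REGEX and the helpers anchored() and is_valid() are hypothetical, made up for illustration; the pattern deliberately puts a shorter alternative first. Unanchored, Python stops at that first alternative, whereas a POSIX engine would report the longest match. Once anchored, backtracking reaches the longer alternative, so for ordinary constructs the yes/no validation answer does not depend on the order of the alternatives. One caveat: the dictionary regex should be wrapped in a group before the '^' and '$' are added, because '^a|b$' parses as '(^a)|(b$)'. Real dictionary regexes written in POSIX syntax (e.g. bracket classes such as [[:digit:]]) may also need translating, since Python's re does not support those classes.

```python
import re

# Hypothetical dictionary regex (NOT taken from any real CIF dictionary):
# the first alternative matches a prefix of what the second one matches.
DICT_REGEX = r"[0-9]+|[0-9]+\.[0-9]*"


def anchored(regex: str) -> str:
    """Anchor a regex so it must match the whole value.

    The regex is grouped first; '^' + 'a|b' + '$' would otherwise give
    '^a|b$', which means '(^a)|(b$)'.
    """
    return r"^(?:" + regex + r")$"


def is_valid(value: str, regex: str = DICT_REGEX) -> bool:
    # re.fullmatch is Python's API-level way to demand a whole-string match.
    return re.fullmatch(regex, value) is not None


if __name__ == "__main__":
    value = "12.5"
    # Unanchored, the leftmost-first engine accepts the first alternative
    # that succeeds, so only a prefix of the value is matched.
    print(re.match(DICT_REGEX, value).group(0))  # 12
    # Anchored, the engine backtracks into the second alternative.
    print(re.search(anchored(DICT_REGEX), value) is not None)  # True
    print(is_valid(value))  # True
    # Naive anchoring binds only to the outer alternatives, so a bad
    # value slips through via the '^[0-9]+' branch.
    print(re.search("^" + DICT_REGEX + "$", "12.5x") is not None)  # True
    print(is_valid("12.5x"))  # False
```

The same whole-string check is available in Java through Matcher.matches(), as opposed to lookingAt() (prefix) or find() (any substring).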
One could document how the regexes used in the dictionary are affected by the different regex semantics of some other engine(s) (e.g. Perl's), and that might be useful, but one cannot write a generic document of that nature.

--
John C. Bollinger, Ph.D.
Indiana University Molecular Structure Center
jobollin@indiana.edu
_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif-developers