Re: [ddlm-group] Data-name character restrictions - one last time
- To: John Westbrook <jwest@rcsb.rutgers.edu>
- Subject: Re: [ddlm-group] Data-name character restrictions - one last time
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Thu, 10 Dec 2009 14:12:42 -0500
- Cc: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- In-Reply-To: <4B214507.8090703@rcsb.rutgers.edu>
- References: <20091209144035.GB29341@emerald.iucr.org><a06240801c74578ec8b59@[192.168.2.104]> <4B1FF3BB.8010601@niehs.nih.gov><4B2008B3.6090008@pdb-mail.rutgers.edu><a06240800c746ed907fcc@[192.168.2.104]><4B214507.8090703@rcsb.rutgers.edu>
Can we please forget the "non-negotiable" nonsense? We should start from
the external user requirements and go where they lead us. We need some
workable mechanism for validating existing data files whose tags contain
embedded [].

At 1:59 PM -0500 12/10/09, John Westbrook wrote:
>Hi all --
>
>To follow up on this again: if we can support element-level assignments in
>data files, then virtually all of our data item character set issues
>will be sorted.
>The treatment of 0- versus 1-based indexing and of index ordering can be
>handled at the dictionary level. I believe that all of this is incorporated
>in the DDL.
>
>I would be curious whether Simon or Brian can comment on the actual usage of
>other potentially reserved characters that are currently in the
>non-negotiable category.
>Regards,
>
>John
>
>
>Herbert J. Bernstein wrote:
>>Dear Colleagues,
>>
>>  One very neat resolution to this problem would be to allow a
>>list or array-typed CIF2 tag to be referenced in a data file either
>>as a whole or element by element.
>>
>>  Thus
>>
>>    _a.vec
>>
>>being defined as an array or list in CIF2 would automatically make
>>the tags
>>
>>    _a.vec[1]
>>    _a.vec[2]
>>    ...
>>
>>defined CIF2 tags. If the array or list were nested, then
>>
>>    _a.vec[1][1]
>>    _a.vec[1][2]
>>
>>etc. would be valid tags.
>>
>>  I would propose that this be general and automatic, applying to
>>all tags defined as lists or arrays. In view of past practice in
>>CIF1, there is a slight conflict between the default starting
>>index in dREL and the common CIF1 practice of indexing arrays
>>from 1, but that can (and should) be solved by explicit specification
>>of a starting index, so we can carry over the tag name usage from
>>CIF1 without confusing people with an index shift. So, if _a.vec
>>were an array of dimension 5, starting from index 0, _a.vec[0]
>>through _a.vec[4] would be valid, but if the starting index were
>>specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
>>CIF1 conventions.
>>
>>  The aliasing mechanism might have to be extended or clarified to
>>handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
>>but, to me, this has a very intuitive feel.
>>
>>  Regards,
>>    Herbert
>>
>>
>>At 3:29 PM -0500 12/9/09, John Westbrook wrote:
>>>Hi all -
>>>
>>>On the issue of reserved characters in mmCIF/PDBx data items, these
>>>generally have been inherited from the style of items from the core. The
>>>majority of items in this class are data items related to short
>>>matrices/tensors and vectors (e.g. items including []). Virtually all
>>>have a syntax which could reasonably be interpreted as a programmatic
>>>reference. For instance,
>>>
>>>_atom_sites.fract_transf_matrix[1][1]   0.007738
>>>_atom_sites.fract_transf_matrix[1][2]   0.000000
>>>_atom_sites.fract_transf_matrix[1][3]   0.004298
>>>_atom_sites.fract_transf_matrix[2][1]   0.000000
>>>_atom_sites.fract_transf_matrix[2][2]   0.016545
>>>_atom_sites.fract_transf_matrix[2][3]   0.000000
>>>_atom_sites.fract_transf_matrix[3][1]   0.000000
>>>_atom_sites.fract_transf_matrix[3][2]   0.000000
>>>_atom_sites.fract_transf_matrix[3][3]   0.020200
>>>_atom_sites.fract_transf_vector[1]      0.00000
>>>_atom_sites.fract_transf_vector[2]      0.00000
>>>_atom_sites.fract_transf_vector[3]      0.00000
>>>
>>>Are we close to being able to treat these as legal in the context
>>>of CIF2/DDL+?
>>>I suppose I am asking what will constitute a legal assignment for an element
>>>of a matrix/array -
>>>
>>>Only this -
>>>
>>>_a.vec [1,2,3]
>>>
>>>or also expanded assignment by element such as -
>>>
>>>_a.vec[1] 1
>>>_a.vec[2] 2
>>>_a.vec[3] 3
>>>
>>>If the latter is to be considered, then this will solve most of
>>>the data name issues for our data.
>>>
>>>Regards,
>>>
>>>John
>>>
>>>Joe Krahn wrote:
>>>> In practice, CIF2 parsers should allow CIF1 data names within a CIF2
>>>> formatted file. The question is whether these files should be allowed as
>>>> valid CIF2, or just for convenience as a non-standard CIF2.
>>>>
>>>> When CIF files are used as working data files, the restrictions should
>>>> be relaxed. For long-term archival files, it makes sense to be more
>>>> restrictive. I would just make the CIF1 names inaccessible to dREL.
>>>> Alternatively, an implementation could allow CIF1 names only on reading,
>>>> and require dictionary alias mappings to CIF2 names.
>>>>
>>>> One argument in favor of allowing them is that someone may want to
>>>> convert all data files to CIF2 format while preserving the
>>>> original data as-is, without alias mapping.
>>>>
>>>> I think that the current CIF2 syntax makes it possible to use CIF1 names
>>>> without any ambiguities. The question is whether they should be
>>>> considered valid CIF2, or just a non-standard version that will be
>>>> useful for the transitional period.
>>>>
>>>> Joe
>>>>
>>>>
>>>> Herbert J. Bernstein wrote:
>>>>> Personally, I would greatly prefer to allow all data names that do not
>>>>> create a major lexer/parser conflict to appear in a data CIF and
>>>>> only apply the strong restrictions to data names that appear in CIF2
>>>>> dictionaries as defined data names (not as aliases). -- Herbert
>>>>>
>>>>>
>>>>> At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>>>> I have one remaining niggle that I'd like to revisit before we put
>>>>>> this finally to bed. As has been mentioned a couple of times
>>>>>> recently, restricting the data-name character set does invalidate
>>>>>> syntactically many existing CIF 1 files (e.g.
>>>>>> _refine_ls_shift/esd_max).
>>>>>> We have discussed strategies for handling this, and I think these
>>>>>> are workable strategies, but they will involve investment and hence
>>>>>> expense in workflow management in CIF archives.
>>>>>>
>>>>>> I understand that the rationale behind this restriction is to simplify
>>>>>> future processing of data names in areas such as dREL
>>>>>> applications. The question really is whether we're choosing the right
>>>>>> trade-off in making things cleaner at that end of the processing
>>>>>> chain. I would suppose that a dREL or other application could ingest a
>>>>>> data name with dangerous characters, convert it internally into a
>>>>>> "safe" identifier that's used for all processing, and then restore the
>>>>>> original form upon output; but writing that intermediate layer of
>>>>>> processing is of course expensive (especially if there aren't readily
>>>>>> available libraries that will do this transparently).
>>>>>>
>>>>>> I suspect that some of the original proposed syntactic changes also
>>>>>> had the effect (whether by design or collaterally) of simplifying i/o,
>>>>>> data structure management, symbol table processing etc., but those may
>>>>>> have suffered in the subsequent revision exercise we've just been
>>>>>> practising. Given the consensus we are now approaching, would the code
>>>>>> builders now be prepared to incur the additional expense of handling
>>>>>> "dangerous" data names?
>>>>>>
>>>>>> I really don't want to spark off a long discussion on this - if a
>>>>>> quick round of response shows that there's no appetite to allow
>>>>>> the additional punctuation characters in data names, I'll accept that
>>>>>> gracefully.
>>>>>>
>>>>>> ***
>>>>>>
>>>>>> One last comment while I have the floor, though it is related in part
>>>>>> to the above question. A concern raised in the editorial office was
>>>>>> that there would be circumstances where users didn't know if they were
>>>>>> dealing with a CIF 1 or 2 ("users" meaning authors, perhaps resorting
>>>>>> to the vi editor - and we're imagining most of them are dealing with
>>>>>> small-molecule/inorganic CIFs). My supposition is that the IUCr
>>>>>> editorial offices would only want to use CIF2 seriously in association
>>>>>> with DDLm dictionaries, and that we would expect the revised core
>>>>>> dictionaries to use the dot component in data names to signal this
>>>>>> further evolution. So even a superficial glimpse of the middle of a
>>>>>> CIF would make it clear whether it was CIF1 or CIF2.
>>>>>>
>>>>>> Does that fit in with how others see this progressing?
>>>>>>
>>>>>> Cheers
>>>>>> Brian
>>>>>> _______________________________________________
>>>>>> ddlm-group mailing list
>>>>>> ddlm-group@iucr.org
>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>
>>>>
>>>
>>>--
>>>******************************************************************
>>> John Westbrook, Ph.D.
>>> Rutgers, The State University of New Jersey
>>> Department of Chemistry and Chemical Biology
>>> 610 Taylor Road
>>> Piscataway, NJ 08854-8087
>>> e-mail: jwest@rcsb.rutgers.edu
>>> Ph: (732) 445-4290 Fax: (732) 445-4320
>>>******************************************************************

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC
121 Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================
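Herbert's proposal, together with John's question about element-by-element assignment, can be made concrete with a small sketch. The Python below is illustrative only and is not part of any CIF2 specification or existing CIF library. It assumes that the dictionary supplies each array-typed tag's shape and starting index (the `dims` and `start` parameters are hypothetical stand-ins, not DDLm attribute names), and that element tags use the `[i][j]` suffix form seen in the fract_transf_matrix example. It expands a tag into its element tags, splits an element tag into base name and indices, and gathers element-level assignments back into a nested list.

```python
import re
from itertools import product

# Assumed form of an element-level tag: a data name followed by one or more
# bracketed integer indices, e.g. _atom_sites.fract_transf_matrix[1][2].
ELEMENT_TAG = re.compile(r"^(?P<base>_[^\[\]\s]+)(?P<idx>(?:\[\d+\])+)$")


def element_tags(base, dims, start=1):
    """Return every element-level tag implied by an array-typed tag.

    dims:  array shape, e.g. (3, 3) (hypothetical dictionary information)
    start: first index; 1 follows CIF1 practice, 0 is zero-based
    """
    ranges = [range(start, start + n) for n in dims]
    return [base + "".join(f"[{i}]" for i in combo) for combo in product(*ranges)]


def parse_element_tag(tag):
    """Split '_a.vec[1][2]' into ('_a.vec', (1, 2)); None if there is no index."""
    m = ELEMENT_TAG.match(tag)
    if m is None:
        return None
    indices = tuple(int(i) for i in re.findall(r"\d+", m.group("idx")))
    return m.group("base"), indices


def _empty(dims):
    """Build a nested list of None with the given shape."""
    if len(dims) == 1:
        return [None] * dims[0]
    return [_empty(dims[1:]) for _ in range(dims[0])]


def assemble(items, base, dims, start=1):
    """Gather element-level assignments for `base` into a nested list.

    Raises ValueError if an index is out of range or an element is missing.
    """
    result = _empty(dims)
    seen = set()
    for tag, value in items.items():
        parsed = parse_element_tag(tag)
        if parsed is None or parsed[0] != base:
            continue  # unrelated tag, or a tag without an index suffix
        idx = parsed[1]
        if len(idx) != len(dims) or not all(
            start <= i < start + n for i, n in zip(idx, dims)
        ):
            raise ValueError(f"index out of range for {base}: {tag}")
        target = result
        for i in idx[:-1]:
            target = target[i - start]  # descend into the nested list
        target[idx[-1] - start] = value
        seen.add(idx)
    expected = set(product(*[range(start, start + n) for n in dims]))
    if seen != expected:
        raise ValueError(f"missing elements for {base}: {sorted(expected - seen)}")
    return result


# Usage with the fractional transformation matrix quoted in the thread.
matrix_items = {
    "_atom_sites.fract_transf_matrix[1][1]": 0.007738,
    "_atom_sites.fract_transf_matrix[1][2]": 0.000000,
    "_atom_sites.fract_transf_matrix[1][3]": 0.004298,
    "_atom_sites.fract_transf_matrix[2][1]": 0.000000,
    "_atom_sites.fract_transf_matrix[2][2]": 0.016545,
    "_atom_sites.fract_transf_matrix[2][3]": 0.000000,
    "_atom_sites.fract_transf_matrix[3][1]": 0.000000,
    "_atom_sites.fract_transf_matrix[3][2]": 0.000000,
    "_atom_sites.fract_transf_matrix[3][3]": 0.020200,
}
print(assemble(matrix_items, "_atom_sites.fract_transf_matrix", (3, 3)))
```

With `start=0`, `element_tags("_a.vec", (5,), start=0)` yields `_a.vec[0]` through `_a.vec[4]`; with `start=1` it yields `_a.vec[1]` through `_a.vec[5]`, the CIF1-style range. Keeping the starting index as an explicit parameter, rather than hard-coding either convention, mirrors the suggestion that the index base be specified explicitly.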