Re: Variants
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Variants
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Wed, 25 Nov 2009 21:13:09 -0500 (EST)
- In-Reply-To: <279aad2a0911251733q26b0c3e2t3f7caf0ecfc48282@mail.gmail.com>
- References: <279aad2a0911251559q519f0460p4bbf7dd550268712@mail.gmail.com><279aad2a0911251733q26b0c3e2t3f7caf0ecfc48282@mail.gmail.com>
Dear Colleagues,

Please look at what has been proposed. The proposal currently put forward by David is for 2 variants of the wavelength in the same CIF:

>         loop_
>             _diffrn_radiation_wavelength_id
>             _diffrn_radiation_wavelength
>             _diffrn_radiation_wavelength_determination
>                1   1.23456   fundamental
>                2   1.25      estimated

This is what was proposed by David based on Nick's suggestion: 2 wavelengths in one CIF data block. The defect in the proposal is that it is then clumsy to couple the two wavelengths to other values that change when the wavelength changes, such as cell dimensions. Use of variants helps to resolve that loose end.

Now, to the substance of James' objections to variants:

> (1) It seems to me that the closer a given CIF file is to the raw data, the
> more useful recording of variants is, as the best path forward has not yet
> been identified and so keeping different variations is useful; conversely,

I beg to differ. Final, released PDB entries have alternate conformers and multiple versions of nomenclature. As a combination of noisy observations with uncertain models, our science tends to produce multiple equally valid results (look at any NMR entry), and variants are a clean way to deal with them all the way through an experiment, keeping track of what was done instead of losing it.

> Taking the wavelength proposal as an example: once someone has refined a
> wavelength from a standard material, what the nominal wavelength was is no
> longer scientifically relevant, and so there is no reason to keep it and all
> derived values in the file (which is why Nick's alternative wavelength
> proposal is preferable, as only one wavelength is in the file).
Certainly, in a perfect world with perfect science, the derivation of a wavelength would be a reliable, one-step process. But things go wrong, and different things get tried, with different values used to derive other values (such as cells). By failing to track and label things as the experiment progresses, we create the non-trivial risk of coupling the cell from one wavelength with the wavelength for another. Worse, in some cases, the choice of space group can flop around depending on refinements of wavelengths and cells. Yes, an author may decide to publish only one resulting final determination, but both the author and people trying to reproduce the results are likely to do better science if there is an option to preserve an audit trail of how the original author got to their "final" result. Someone else might decide to look at things differently. I am not saying everyone has to use variants, any more than the PDB will tell an author they have to present alternate conformers for some fuzzy density, but it really is useful to have the option.

> (2) Introducing variants means that multiple values for simple items such as
> cell parameters could be present in a single datablock, and CIF reading
> software must be rewritten to recognise which of those instances of cell
> parameters it needs to care about (not to mention all those programs which
> expect unlooped cell parameters...). This is a very serious issue for small
> molecule CIF, where many programs already exist. I don't expect that this
> is so serious for imgCIF, where (unfortunately) imgCIF applications are thin
> on the ground.

Inasmuch as what is proposed conforms to all existing CIF specifications, it is not the CIF reading software that needs to be rewritten, but the false assumption in some minds that a question has only one right answer.
It would be trivial to create a filter program to return one best variant (the one with the "preferred" role) if that was needed, but I would hope that at least some CIF users would not exercise such tunnel vision.

> (3) What are our use cases for this change? What is the motivation?
> Perhaps Herb could speak to this.

I just did.

> (4) Introduction of DDLm and dREL may change the variant scheme such that
> only a limited set of variant values would need to be made available in any
> CIF data file, and a dREL engine could then calculate out the corresponding
> alternative derived values (and all combinations...). But again, for
> published data, we expect the author to have done this already and chosen
> the best result.

This is backwards. DDLm and dREL make it easier to manage precisely the relationships among variants, but you need to actually tag the variants to be able to use them. As for the last comment:

> But again, for published data, we expect the author to have done this
> already and chosen the best result.

This is unrealistic and inconsistent with the ripply bottom we all have to deal with in non-linear least squares refinements for model fitting, with the reality of NMR studies, and with the reality of alternate conformers and micro-heterogeneity.

My apologies if this seems a brusque reply -- charge it to the aches, pains and headache that come with my broken nose, but try to keep an open mind on this one. It will get used by some people, inasmuch as it is part of the latest CBF dictionary, and even if people further downstream don't want to use it in their CIFs, it would pay to understand it. I see an unfortunate divergence developing between what imgCIF/CBF users need and the way in which CIF itself is headed. If everyone else is OK with a somewhat different CIF dialect for images, I can live with that, but I hope that we can at least make clear definitions of the interfaces among dialects. Variants are part of that interface.
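
The filter program mentioned in the reply to point (2) could look roughly like the sketch below. It is only an illustration, not part of any CIF toolkit: it assumes the looped categories have already been parsed into lists of dictionaries keyed by tag name, and the function names `preferred_variant` and `filter_loop` are hypothetical. The "preferred" role and the tag names are taken from the variant proposal quoted further down.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]  # one row of a CIF loop, keyed by tag name (assumed layout)


def preferred_variant(variants: List[Row]) -> Optional[str]:
    """Return the id of the first variant whose role is 'preferred', if any."""
    for row in variants:
        if row["_variant_role"] == "preferred":
            return row["_variant_variant"]
    return None


def filter_loop(rows: List[Row], variant_tag: str, keep: Optional[str]) -> List[Row]:
    """Keep only the rows of a looped category that belong to variant `keep`.

    If no preferred variant was found, the loop is returned unchanged.
    """
    if keep is None:
        return rows
    return [row for row in rows if row[variant_tag] == keep]


# Hypothetical pre-parsed content, mirroring the example in the proposal.
variants = [
    {"_variant_variant": "final", "_variant_role": "preferred"},
    {"_variant_variant": "prelim", "_variant_role": "."},
]
cells = [
    {"_cell_variant": "final", "_cell_length_a": "22.5"},
    {"_cell_variant": "prelim", "_cell_length_a": "22.3"},
]

best = preferred_variant(variants)
single_cell = filter_loop(cells, "_cell_variant", best)
print(best, single_cell)
```

A program that expects a single, unlooped cell could be fed the one surviving row, while software that understands variants could read the unfiltered loops.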

Regards,
  Herbert

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Thu, 26 Nov 2009, James Hester wrote:

> I'll make some opening comments regarding the idea of variants:
>
> (1) It seems to me that the closer a given CIF file is to the raw data, the
> more useful recording of variants is, as the best path forward has not yet
> been identified and so keeping different variations is useful; conversely,
> when publishing what is supposed to be the final ("correct") result, the
> interest of the wider community will primarily be in this result, rather
> than alternative results that are considered by the author to be inferior.
> Taking the wavelength proposal as an example: once someone has refined a
> wavelength from a standard material, what the nominal wavelength was is no
> longer scientifically relevant, and so there is no reason to keep it and all
> derived values in the file (which is why Nick's alternative wavelength
> proposal is preferable, as only one wavelength is in the file).
>
> (2) Introducing variants means that multiple values for simple items such as
> cell parameters could be present in a single datablock, and CIF reading
> software must be rewritten to recognise which of those instances of cell
> parameters it needs to care about (not to mention all those programs which
> expect unlooped cell parameters...). This is a very serious issue for small
> molecule CIF, where many programs already exist. I don't expect that this
> is so serious for imgCIF, where (unfortunately) imgCIF applications are thin
> on the ground.
>
> (3) What are our use cases for this change? What is the motivation?
> Perhaps Herb could speak to this.
>
> (4) Introduction of DDLm and dREL may change the variant scheme such that
> only a limited set of variant values would need to be made available in any
> CIF data file, and a dREL engine could then calculate out the corresponding
> alternative derived values (and all combinations...). But again, for
> published data, we expect the author to have done this already and chosen
> the best result.
>
>
> On Thu, Nov 26, 2009 at 10:59 AM, James Hester <jamesrhester@gmail.com>
> wrote:
>       I'm reposting Herbert's message in a new thread to aid
>       organisation. Herbert wrote:
>
>       ----
>       Dear Colleagues,
>
>        While you are revisiting this item, I would suggest you
>       consider the more complete (and, I believe, more elegant and
>       general) solution of defining "variants", that we have
>       introduced into the imgCIF dictionary to handle quantities that
>       may be determined in different ways.
>
>        Instead of adding
>
>        _diffrn_radiation_wavelength_determination
>
>       you would add
>
>        _diffrn_radiation_wavelength_variant
>
>       and a new variant category
>
>              _variant_variant
>              _variant_role
>              _variant_timestamp
>              _variant_variant_of
>              _variant_details
>
>       which would allow you with complete generality to manage any number
>       of refined or redefined quantities, such as wavelengths. This would
>       then allow you to use the same variant identifier for, say, cell
>       dimensions, which could be expected to change in a coupled manner
>       with the changes in wavelength.
>
>        If you are interested in this more complete approach, I can provide
>       you with the full item definitions, but the short form is:
>
>              _variant_variant
>
>                    The value of _variant_variant must uniquely identify
>                    each variant for the given diffraction experiment and/or
>                    entry
>
>              _variant_role
>
>                    The value of _variant_role specifies a role
>                    for this variant.
>                    Possible roles are null, "preferred",
>                    "raw data", and "unsuccessful trial".
>
>              _variant_timestamp
>
>                    The date and time identifying a variant. This is not
>                    necessarily the precise time of the measurement or
>                    calculation of the individual related data items, but a
>                    timestamp that reflects the order in which the variants
>                    were defined.
>
>              _variant_variant_of
>
>                    The value of _variant_variant_of gives the variant
>                    from which this variant was derived. If this value is
>                    not given, the variant is assumed to be derived from the
>                    default null variant.
>
>              _variant_details
>
>                    A description of special aspects of the variant
>
>
>       An example of how this might be used is:
>
>               loop_
>                   _diffrn_radiation_wavelength_id
>                   _diffrn_radiation_wavelength
>                   _diffrn_radiation_wavelength_determination
>                      1   1.23456   fundamental
>                      2   1.25      estimated
>
>
>
>       would become
>
>                loop_
>                    _diffrn_radiation_wavelength_variant
>                    _diffrn_radiation_wavelength
>                       final   1.23456
>                       prelim  1.25
>                loop_
>                    _variant_variant
>                    _variant_role
>                    _variant_timestamp
>                    _variant_variant_of
>                    _variant_details
>                    final  preferred 2007-08-04T01:17:28 prelim refined
>                    prelim .         2007-08-03T23:20:00 .      .
>
>                loop_
>                   _cell_variant
>                   _cell_length_a
>                   _cell_length_b
>                   _cell_length_c
>                   _cell_angle_alpha
>                   _cell_angle_beta
>                   _cell_angle_gamma
>                   final  22.5 22.5 22.5 90. 90. 90.
>                   prelim 22.3 22.3 22.3 90. 90. 90.
>
>
>        Regards,
>          Herbert
>
>       =====================================================
>        Herbert J. Bernstein, Professor of Computer Science
>          Dowling College, Kramer Science Center, KSC 121
>               Idle Hour Blvd, Oakdale, NY, 11769
>
>                        +1-631-244-3035
>                        yaya@dowling.edu
>       =====================================================
>
>
>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
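
The coupling described in the quoted proposal, where one variant identifier ties a wavelength to the cell derived from it and _variant_variant_of records where each variant came from, can be sketched as follows. This is a minimal illustration under stated assumptions: the values are copied from the example above, the loops are assumed to have been read into plain dictionaries keyed by variant id, and the helper name `lineage` is hypothetical rather than taken from any CIF library.

```python
from typing import Dict, List, Tuple

# Values from the example above, keyed by variant id (assumed pre-parsed).
wavelength: Dict[str, float] = {"final": 1.23456, "prelim": 1.25}
cell_a: Dict[str, float] = {"final": 22.5, "prelim": 22.3}
variant_of: Dict[str, str] = {"final": "prelim", "prelim": "."}


def lineage(parents: Dict[str, str], start: str) -> List[str]:
    """Follow _variant_variant_of links from `start` back to the null variant.

    A parent of '.' or '?', or a missing entry, is treated as the default
    null variant and ends the chain.
    """
    chain = [start]
    seen = {start}
    current = start
    while parents.get(current, ".") not in (".", "?"):
        current = parents[current]
        if current in seen:
            raise ValueError(f"cycle in _variant_variant_of at {current!r}")
        seen.add(current)
        chain.append(current)
    return chain


# Pair each wavelength with the cell that carries the same variant id, so a
# cell from one wavelength is never combined with the wavelength of another.
coupled: Dict[str, Tuple[float, float]] = {
    v: (wavelength[v], cell_a[v]) for v in wavelength if v in cell_a
}

for v in lineage(variant_of, "final"):
    print(v, coupled[v])
```

Walking the chain from "final" visits "final" and then "prelim", which is the audit trail the variant category is meant to preserve.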