[Fwd: Powder CIF Proposals]
- To: Multiple recipients of list <pddmg@iucr.org>
- Subject: [Fwd: Powder CIF Proposals]
- From: "Brian H. Toby" <Brian.Toby@nist.gov>
- Date: Thu, 28 Sep 2000 19:03:11 +0100 (BST)
Attached is a message from Robin Shirley responding to my questions and comments on his 1993 proposals. My previous message can be found in the list archive (http://www.iucr.org/iucr-top/lists/pddmg/).

I have not yet read this message carefully, but will post a reply next week at the latest. Your comments are invited as well.

Brian

********************************************************************
Brian H. Toby, Ph.D.          Leader, Crystallography Team
Brian.Toby@NIST.gov           NIST Center for Neutron Research, Stop 8562
voice: 301-975-4297           National Institute of Standards & Technology
FAX: 301-921-9847             Gaithersburg, MD 20899-8562
http://www.ncnr.nist.gov/xtal
********************************************************************
- To: Brian.Toby@NIST.gov
- Subject: Re: Powder CIF Proposals
- From: "ROBIN SHIRLEY (USER)" <R.Shirley@surrey.ac.uk>
- Date: Thu, 28 Sep 2000 15:52:39 GMT
- CC: cif-developers@iucr.org
- Comments: Authenticated sender is <pss1rs@pop.surrey.ac.uk>
- Organization: Psychology Dept, Surrey Univ. U.K.
- Priority: normal
- Reply-to: R.Shirley@surrey.ac.uk
Brian and others,

Let me begin by saying how pleased I am to see these proposals at last being debated (I only received Brian's extract of the February correspondence yesterday). They were originally submitted in 1993, at the time of the Beijing Congress, but then seemed to go missing and had to be resubmitted after Seattle, and so on. It has been frustrating to see how slowly such standardisation processes can move. This is not meant as a criticism of Brian but as a general comment.

The background to my four proposals is that I am heavily involved in gathering the various mature indexing programs into an integrated suite (Crysfire) and moving this towards becoming an expert system for powder indexing. In the course of this I wished to make it possible for indexing programs to support Powder CIF standards. This has not yet really been the case, because of specific omissions from Powder CIF that reflect a failure to take the needs of the indexing stage into account and, I think, also some differences in underlying paradigm. My proposals were an attempt to address these needs, which I took up because I have worked on the indexing problem since 1967 and am probably as well placed as anyone to establish what would be needed.

I think that a difference in underlying paradigm may exist because CIF standards are built on the principle that CIF files exist primarily to record measured data, plus various derived quantities obtained from the measured data by calculation. Of course such derived quantities also introduce an element of model-building and hence of hypothesis, but CIF definitions tend to downplay this aspect. The various quantities derived in the course of a structure analysis, such as positional parameters and, to a greater extent, thermal parameters and occupancies, reflect hypotheses concerning the nature of the average structure, and these hypotheses may well not be unique.
However, the default expectation within CIF seems to be that, at each stage in a crystallographic investigation, both measured and derived quantities have unique values. Powder indexing moves further away from this assumption than does structure analysis, since it is inherently an inductive, multi-solution process. It is more so, for example, than the determination of reflection phases by direct methods, since these converge abruptly and convincingly to a definite set of values when the correct point in solution space is found, at which nearly all the calculated intensities agree with their measured values within the reasonable error bounds of the measurements. That is not the case with powder indexing, where the best that can be hoped for is that the favoured solution will have a sufficiently low set of obs-calc line-position differences to stand out from the (often large) number of other trial cells that also account for the measured line positions within their reasonable error bounds.

Regarding people's responses to my four specific proposals:

00-2-11.1) _pd_proc_quadr_Q (or _pd_index_quadr_Q - see discussion below)

I accept that if this could be derived directly from _pd_peak_d_spacing, then the case for including it would be weak, even though Q is actually the measure preferred by most indexing programs, being a linear function of the representation of the cell as powder constants (in which Q(A) = 1/a^2, etc., often scaled up by 10000 for convenience). The objections raised in the discussion have clarified for me what is actually intended by this item, or rather what is *not* intended. At each stage during indexing, a particular set of observed Q values will be used, which often remains the same throughout the indexing process. However, these Q values *need not* be derived in the same way for each line.
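To make the remark about powder constants concrete, here is a minimal Python sketch (not taken from any indexing program) of how an observed Q follows from a d-spacing and how a calculated Q is a linear function of the powder constants. It assumes an orthorhombic cell, where Q(A) = 1/a^2 holds exactly; the function names and the 10^4 scale constant are illustrative choices, not part of any proposed CIF item.

```python
import math
from typing import Tuple

# Illustrative scale factor: Q is often quoted in units of 10^-4 A^-2.
SCALE = 1.0e4


def q_from_d(d: float) -> float:
    """Observed Q = 1/d^2 for a line of d-spacing d (Angstrom), scaled by SCALE."""
    return SCALE / (d * d)


def powder_constants_orthorhombic(a: float, b: float, c: float) -> Tuple[float, float, float]:
    """Powder constants Q_A, Q_B, Q_C = 1/a^2, 1/b^2, 1/c^2 (valid for orthogonal axes only)."""
    return SCALE / a**2, SCALE / b**2, SCALE / c**2


def q_calc(hkl: Tuple[int, int, int], constants: Tuple[float, float, float]) -> float:
    """Q(hkl) = h^2 Q_A + k^2 Q_B + l^2 Q_C, which is linear in the powder constants."""
    h, k, l = hkl
    qa, qb, qc = constants
    return h * h * qa + k * k * qb + l * l * qc


consts = powder_constants_orthorhombic(4.0, 5.0, 6.0)
print(q_calc((1, 1, 0), consts))                   # 1025.0 (625 + 400)
print(q_from_d(1.0 / math.sqrt(1 / 16 + 1 / 25)))  # approximately 1025: the same line via its d-spacing
```

This linearity in (Q_A, Q_B, Q_C) is the property cited above as the reason most indexing programs work in Q rather than in d.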
Because apparently minor deficiencies in the dataset can lead to disproportionately large increases in the time needed for successful indexing, experienced practitioners often scrutinise the data carefully first and adjust line positions individually, for example by substituting a combined estimate obtained from several runs, which may well have been made under different experimental conditions (such as different wavelengths and/or instruments). Similarly, because the positions of individual low-angle lines are particularly important for some indexing methods and are often harder to measure accurately than those at higher angles, it may be beneficial to adjust them using higher-angle lines that appear to be their higher orders.

Hence the Q values (or d-spacings, etc.) used during the indexing stage need not, and often will not, all be derived directly and consistently from _pd_peak_d_spacing, _pd_peak_2theta_centroid, or any other single measure of line position. It remains important to record them; otherwise whatever expertise has gone into adjusting them will be lost, and it will become impossible to reproduce the reported indexing results. That is partly why I originally proposed putting this item in the _pd_proc_ section rather than in _pd_peak_. On reflection, I am now persuaded that, since these data are actually specific to the indexing stage, it might be preferable to move the item to _pd_index_, so that it would become _pd_index_quadr_Q. An equivalent version might be _pd_index_d_spacing (I would prefer both to be defined, but would settle for either).

00-2-11.2) _pd_index_appendix

Brian queried whether this might be accommodated within _pd_refln_. I would argue strongly for a distinct section for indexing (and now favour moving all indexing-specific items into it, such as quadr_Q and index_merit - see below).
The sort of indexing history envisaged in my original proposal can now be captured and updated automatically in the form of the Crysfire logfile for that dataset (an example is attached).

00-2-11.3) _pd_proc_index_merit (or _pd_index_merit - see discussion below)

Decisions concerning both this item and the next need to reflect the potentially very large number of trial cells that can index a powder pattern within the reasonable error bounds of all its lines: typically several hundred, and often more than 5000. It can therefore become unrewarding to capture many parameters of each possible trial cell individually, when sufficient basic information may already be present within a cumulative logfile stored under _pd_index_appendix (item 2 above).

Brian has suggested that a formal definition of the figure of merit (FOM) would be required, but my experience in this field leads me to doubt whether this is yet practicable. I am more inclined to support Bob Von Dreele's suggestion to record instead the program that calculated the FOM, at least for Q-based FOM belonging to the De Wolff M20 family. This precaution should not be necessary for the Smith & Snyder FN measure, since that was well defined in their original 1979 paper (as four distinct components). However, in my judgement FN is not primarily an indexing FOM (one designed to indicate the plausibility of a proposed *cell*) but rather one that indicates the quality of a measured *pattern* by reference to an *assumed correct cell*. Bob Snyder claims that FN is independent of crystal system, which is true in its role as a measure of data quality, but not when it is used as a measure of indexing success since, for example, it does not favour more parsimonious 1-parameter cubic models over 6-parameter triclinic ones, as I believe an indexing FOM should (Bob does not see it this way, and after many debates we have agreed to differ!).
Hence the four components of FN (N = number of observed lines used, F = the actual FOM, D = mean absolute difference in 2theta, and Nposs = number of possible calculated lines used) belong primarily with the pattern, and so in _pd_proc_ (although F can indeed also be used as a kind of indexing FOM). An additional reason is that FN is based on 2theta, a more pattern-specific observable that is available directly only for angle-dispersive data, while M20-type FOM are based on the less direct but more general Q values. The item would then be defined as _pd_proc_FN N F D Nposs, where N, F, D and Nposs are as defined above.

De Wolff's *original* M20 measure was also well defined, but only if *all* lines were considered to be indexed. That will not be the case if any lines are excluded as "not indexed", since the criterion for exclusion was not defined but left to the user. However, because of implementation difficulties, I know of no indexing program that actually uses De Wolff's original definition, not even Visser's ITO, which originated in De Wolff's own lab. The implementation problem is that De Wolff's original definition excluded any "unindexed" lines from the 20 observed lines that were used, so that, for example, a dataset with X20 = 2 unindexed lines would require *22* observed lines to provide the 20 observed *and indexed* lines needed for M20. This means that for a dataset containing fewer than 39 observed lines, the original M20 can be calculated for some trial cells but not for others, depending on whether or not their number X20 of excluded lines pushes the total required above the number of lines that have actually been observed. Thus *all* indexing programs that claim to calculate M20 actually use some variant of Visser's "M20", which is based on just the first 20 observed lines, so that, for example, if X20 = 3 then only 17 observed lines would be used rather than 20.
Thus, although less rigorous, it needs only 20 observed lines where the original M20 would have needed 23. The bottom line is that, since all programs tackle such implementation issues a bit differently, it is desirable to record the program as well as the FOM measure. I therefore propose that three items are needed per FOM entry: first the value (M), then the name of the FOM (usually M or M20), then the name of the program version concerned (*not* that of an attributed FOM author such as Snyder or Visser), which the Powder CIF standard obviously could not predefine. Thus this would become:

_pd_index_merit M FOM program

(e.g. _pd_index_merit 21.7 M20 ITO12, or _pd_index_merit 54.215 M1 CRYS934h).

Some indexing programs calculate several FOM variants, so in theory this sub-section could contain more than one entry for a single trial cell. In principle I agree with Brian's suggestion that there could be a loop containing _pd_index_trialid (for consistency, _trialid rather than _cellid), ..trial_a, etc., but one needs to be aware, as discussed above, that this could easily contain thousands of entries, as is often the case with a Crysfire summary file (see the attached example, which is for the same dataset as the example logfile), upon which it could indeed be based. If several trial cells were to be tabulated in a loop as suggested, then I think that all the item names should be prefixed _pd_index_trial_ (including the _merit entries). Other items that are useful in such summaries and might also be included are: _pd_index_trial_Nobs, .._I20, .._volume, .._spgroup (space group, as far as determined - usually just a Bravais lattice), .._date, .._time and .._pedigree (a program-specific audit code for tracing how that solution was arrived at). This list may well require updating in due course, since the field of powder indexing is still under active development.
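The line-selection difference between De Wolff's original M20 and the first-20-lines variant attributed above to Visser can be sketched in Python as follows. This is an illustrative sketch, not the code of ITO or any other program: it assumes the De Wolff form M = Q_max / (2 <|dQ|> N_calc), takes Q_max from the last line in each selection, and uses hypothetical names (ObsLine, select_original, select_first_n, m_value, fn_components). Real programs differ in exactly such details, which is the argument for recording the program alongside the FOM. A small helper also returns the four FN components in the order proposed for _pd_proc_FN, using the Smith & Snyder definition F = N / (D * Nposs).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ObsLine:
    q_obs: float             # observed Q = 1/d^2 of the line
    q_calc: Optional[float]  # nearest calculated Q, or None if the line is "not indexed"


Selection = Optional[Tuple[List[ObsLine], float]]


def select_original(lines: Sequence[ObsLine], n: int = 20) -> Selection:
    """De Wolff's original rule: the first n *indexed* lines, skipping unindexed ones.

    With X unindexed lines among them this needs n + X observed lines, so it
    returns None when the pattern runs out of observed lines first.
    """
    indexed = [ln for ln in lines if ln.q_calc is not None]
    if len(indexed) < n:
        return None
    used = indexed[:n]
    return used, used[-1].q_obs


def select_first_n(lines: Sequence[ObsLine], n: int = 20) -> Selection:
    """Variant rule: the first n observed lines, minus any unindexed ones."""
    if len(lines) < n:
        return None
    window = list(lines[:n])
    used = [ln for ln in window if ln.q_calc is not None]
    return used, window[-1].q_obs


def m_value(used: List[ObsLine], q_max: float, q_calc_all: Sequence[float]) -> float:
    """De Wolff-style M = Q_max / (2 * mean|Q_obs - Q_calc| * N_calc).

    N_calc counts distinct calculated Q values up to Q_max (rounded so that
    numerically identical values are merged).
    """
    mean_eps = sum(abs(ln.q_obs - ln.q_calc) for ln in used) / len(used)
    n_calc = len({round(q, 8) for q in q_calc_all if q <= q_max})
    return q_max / (2.0 * mean_eps * n_calc)


def fn_components(tt_obs: Sequence[float], tt_calc: Sequence[float],
                  n_poss: int) -> Tuple[int, float, float, int]:
    """Smith & Snyder FN as (N, F, D, Nposs), with D = mean |2theta_obs - 2theta_calc|."""
    n = len(tt_obs)
    d = sum(abs(o - c) for o, c in zip(tt_obs, tt_calc)) / n
    return n, n / (d * n_poss), d, n_poss


# Hypothetical data: 22 observed lines, of which the 5th and 10th are "not indexed".
lines = [
    ObsLine(0.01 * (i + 1) + 0.005, None) if i in (4, 9)
    else ObsLine(0.01 * (i + 1), 0.01 * (i + 1) + 1e-4)
    for i in range(22)
]
q_calc_all = [0.01 * (i + 1) + 1e-4 for i in range(30)]

for name, rule in (("original", select_original), ("first 20", select_first_n)):
    sel = rule(lines)
    if sel is not None:
        used, q_max = sel
        print(name, len(used), round(m_value(used, q_max, q_calc_all), 1))

# With only 21 observed lines the original rule fails, while the variant is still defined.
print(select_original(lines[:21]) is None, select_first_n(lines[:21]) is not None)
```

On this synthetic data the two rules use 20 and 18 indexed lines respectively and give slightly different M values for the same trial cell; trimming the pattern to 21 observed lines makes the original M20 unavailable while the variant can still be calculated.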
00-2-11.4) _pd_peak_index_status (or _pd_index_peak_status - see discussion below)

Since it would involve an entry for every line (perhaps 50-100) of every trial cell (there could be thousands), I think that Brian's option of looping this as a (cells x peaks) matrix is unattractive and probably should not be attempted. My intention was that this status list (like the list of trial hkl indices) would be maintained only for the single current front runner among the trial cells. However, for consistency with the emerging section schema, I now think that the name should more logically become _pd_index_peak_status. In other words, all items associated specifically with indexing operations should be gathered in the _pd_index_ section. That would make it easier for human readers to locate what is relevant to indexing, and also for automated CIF readers to include or exclude such material systematically. Such indexing status flags would most naturally be placed in the loop that contains the trial indices for that cell. I have not yet thought through the best way to cater for the common situation where the calculated pattern yields multiple indexings for a single observed line. Maybe such complications aren't worth bothering about, and the recorded information for that line could be confined to just the indices for the closest calculated line plus a "mult" index-status flag.

With best wishes

Robin Shirley

---------------------------------------------------------------
> I have put my pdCIF hat back on again and have remembered that I
> would like to get some comments from you on the pdCIF proposals
> that you initiated a while back. I have attached a series of
> e-mail messages related to these questions. Please get back to me
> when you have the time.
> Thanks,
> Brian
> ********************************************************************
> Brian H. Toby, Ph.D.
> Leader, Crystallography Team
> Brian.Toby@NIST.gov           NIST Center for Neutron Research, Stop 8562
> voice: 301-975-4297           National Institute of Standards & Technology
> FAX: 301-921-9847             Gaithersburg, MD 20899-8562
> http://www.ncnr.nist.gov/xtal
> ********************************************************************