review of core CIF dictionary
- To: Multiple recipients of list <coredmg@iucr.org>
- Subject: review of core CIF dictionary
- From: "I. David Brown" <idbrown@mcmail.cis.mcmaster.ca>
- Date: Wed, 5 Jun 2002 16:42:48 +0100 (BST)
To members of the Core Dictionary Maintenance Group (DMG),

First let me apologize for this long email. It introduces a proposal for a major review of the coreCIF dictionary which will need the help of all members of the core Dictionary Maintenance Group. You are either a member of this group or a valued consultant and we are anxious to hear your views and recommendations. Enclosed with this email is an agenda of items that we need to consider.

In making this review we should remember that the original version of this dictionary was prepared when CIF was perceived as being nothing more than a format for transferring and archiving information about crystal structures. The first version of this dictionary was conceived as a printed guide for programmers and it was only later that it evolved into a computer-readable document in which the concept of categories with their keys and links gave the dictionary a relational form. Although changes have been made over the years to keep it in line with the changing perception of CIF, these have occurred in a piecemeal fashion.

With major developments now underway that will convert CIF dictionaries into self-contained computer-accessible compendia of crystallographic knowledge, we need to carry out our review with an eye to the future and prepare the core dictionary so that it can take advantage of the changes that lie ahead. These will introduce computer-readable algorithms and include compact representations of vectors and tensors, but to be effective these changes will require a strict adherence to the dictionary definitions that give CIF its relational structure. These features have not yet been systematically developed or exploited by the Core dictionary. This review gives us the chance to prepare the dictionary to ease the transition to future versions.

Over the past few years a number of changes have been suggested for the CIF core dictionary but only a few of them have so far been adopted.
Some suggestions have come from individuals, some from Acta Cryst., which has discovered items that cannot be correctly given using the present version of the dictionary, and some from the Cambridge Crystallographic Data Centre, which is working on ways to format the existing CSD entries in CIF. It is therefore a good time to undertake a comprehensive review of the CIF_core dictionary. If you have other suggestions for changes, now is the time to bring them forward.

Brian has collected together all the suggestions that he has received so far, along with the discussion of these items that appeared on the Core DMG list server. This document can be viewed at:

http://agate.iucr.org/cif/cif_core/revisions23.html

The items in this document (marked with (w) in the agenda) are arranged in order of completeness. That is to say, the first item is a simple correction of a definition that does not need DMG approval, then follow items that have already been approved, followed by those that are fully documented and should be simple to approve, all the way down to some general comments about problems with no particular suggestions as to how they might be resolved. As full details of the discussion so far are available on the website, I only summarize them below.

The items on the agenda given below are arranged by category and, apart from item 1, are listed in the order they appear in the core dictionary. I have given each a number so that it can be easily referred to in the discussion. I have marked with an asterisk items that are relatively uncontroversial and could possibly be fast-tracked.

Please post your comments to the core list server (coredmg@iucr.org). If you make a substantive comment on a particular topic, it is best to give it in its own email with the item number in the subject line. This will allow the list server to establish a thread that will make the discussion easier to follow.
Before you send any comments, I recommend that you check the coredmg list server at http://www.iucr.org/iucr-top/lists/coredmg/ to see how this works. I would recommend that you download the file Brian has prepared on the agate web page quoted above, together with the current versions of the core and symmetry dictionaries. These are the documents that will be needed as we work on these revisions.

Please give some thought to the following queries and suggestions and send your comments to the coredmg list server at the address given above. Suggestions for other changes are also welcome. Brian and I will organize your comments into dictionary format in order to focus the discussion and help us develop a consensus.

I am sorry that we have such a long list of items to consider, but there is an advantage in reviewing all of these at the same time as it helps us to get a better overview and will probably reveal some other changes that are needed. I look forward to seeing your comments.

Best wishes

David
Chair of the coreDMG

Agenda given below

*****************************************************
Dr.I.David Brown, Professor Emeritus
Brockhouse Institute for Materials Research,
McMaster University, Hamilton, Ontario, Canada
Tel: 1-(905)-525-9140 ext 24710
Fax: 1-(905)-521-2773
idbrown@mcmaster.ca
*****************************************************

AGENDA FOR CORE DICTIONARY REVIEW

SUMMARY LIST OF ITEMS POSTED FOR CONSIDERATION.
(Detailed descriptions follow)
-----------------------------------------------------------
*   Item that can probably be fast-tracked
(w) Further details appear on CIF-core revisions at agate.iucr.org/cif/cif_core/revisions23.html.

1.(w) A general problem.

2. ATOM_SITE and ATOM_TYPE categories
   2.1(w) Rigid groups
   2.2(w) Representation of disorder
   2.3 Scattering factors for multiple structure determinations
   2.4 _atom_type_scat_versus_stol_list
   2.5 _atom_site_fract_*
   2.6 _atom_sites_special_details
   2.7 Anharmonic atomic displacement parameters
   2.8 _atom_site_refinement_flags

4. CELL category
   4.1*(w) Reciprocal cell
   4.2*(w) Z'

5. CHEMICAL categories
   5.1(w) Origins and properties of the sample.
   5.2(w) Inclusion of peptide sequences (CCDC is preparing a report)
   5.3 Crystal properties.

6. CITATION categories
   6.1*(w) REFCODES for citations

8. DATABASE categories
   8.1*(w) Deposit numbers for CCDC (already approved)
   8.2*(w) Database history

9. DIFFRN categories
   9.1(w) Twins (a report is being prepared)
   9.2*(w) Replacing theta_max by resolution
   9.3*(w) _diffrn_source_takeoff_angle
   9.4(w) _diffrn_orient_matrix
   9.5 Flag for systematic absences in diffrn_refln category
   9.6 _diffrn_source_target
   9.7 Rethinking _diffrn_standards_*

10. EXPTL categories
   10.1(w) Sample history
   10.2 Provision for describing the shape of more than one crystal

11. GEOM categories
   11.1(w) _geom_bond_multiplicity

13. PUBL categories
   13.1*(w) Links to the World Directory

14. REFINE categories
   14.1(w) _refine_ls_F_calc
   14.2(w) _refine_ls_restrained_wR-factor_all
   14.3(w) Twins (a report is being prepared)
   14.4 The default value of _refine_ls_extinction_method
   14.5(w) _refine_ls_extinction_coef

17. SYMMETRY categories.
   17.1* Replacement with the SPACE GROUP categories

------------------------------------------------------------

DETAILS OF THE ABOVE ITEMS
--------------------------

1(w). A general problem.

Surveys of the Acta Cryst. and the CCDC archives show that it is often not possible to supply the exact numerical value required by CIF for, e.g., temperatures and other parameters. Sometimes there is only an upper or lower limit, a range of values, or even just an approximate value.
The way to handle this problem is probably to define new data items, e.g., _*_lt and _*_gt (for less than and greater than) for the items in question. This would cover 'less than', 'greater than' and 'ranges', but does not address the problem of how one gives approximate numbers when a standard uncertainty is not supplied. It also does not address the problem of how one includes qualitative information such as 'high' and 'low' which are found in the CSD. Perhaps the text field _*_details could be used for these cases since it is unlikely that any precise quantitative use could be made of such approximate information.

The items for which _lt and _gt (or one of them) could be defined occur in a number of different categories and are summarized here. Some of the suggested items do not yet exist in the dictionary.

Item                              Current name
--------                          ---------------------
melting temperature               _chemical_melting_point
decomposition temperature
sublimation temperature
temperature of experiment         _exptl_crystal_density_meas_temp
                                  _diffrn_ambient_temperature
pressure of experiment            _diffrn_ambient_pressure
phase transition temperature
phase transition pressure
measured density                  _exptl_crystal_density_meas
shift/su                          _refine_ls_shift/esd_max
                                  _refine_ls_shift/esd_mean
decay of diffraction standards    _diffrn_standards_decay

2. ATOM_SITE and ATOM_TYPE categories

Several additions have been proposed here, and since these two categories are closely related they are listed together. The solutions to the problems raised by several of these items may be related.

2.1(w). A proposal for a method of defining rigid groups was first presented in the modulated structure dictionary (msCIF), but was thought to be of general interest and so was transferred to the core DMG for our consideration. There is a fairly detailed discussion on the web but no consensus has yet been reached on the best way to handle this.

2.2(w). Disorder

At present disorder is handled somewhat inelegantly in the core, which makes any automatic manipulation awkward. The problem arises because the present version of CIF does not make the distinction between a site in the crystal and the atoms that occupy that site. Ideally these should be given in two different lists, though this would be cumbersome for the majority of structures in which there is no disorder.

2.2a Occupational disorder.

At present this is handled by giving all the elements that occupy a given site identical coordinates and occupation numbers that sum to 1.0 or less. The more elegant solution is to define the properties of the site in the atom_site loop and the properties of the atoms that occupy the site in the atom_type loop. An example of such a loop would be

    loop_
    _atom_type_id
    _atom_type_symbol
    _atom_type_element
    _atom_type_occupancy
    1 T1 Al 0.34(4)
    2 T1 Si 0.66(4)
    3 T2 Al 0.21(3)
    4 T2 Si 0.70(3)

_atom_type_id is the category key (unique identifier for each line). It is not currently defined for this category but category keys should as a matter of principle be added in preparation for the advanced applications that are being developed.

_atom_type_symbol is the child of _atom_site_label and links to the atom_site category. This item is already in the dictionary. A more logical name that follows current naming conventions would be _atom_type_site_label.

_atom_type_element is not currently defined but probably should be a recognized element symbol.

2.2b Displacive disorder

Displacive disorder is currently handled by two items:

    _atom_site_disorder_group
    _atom_site_disorder_assembly

The intent of these two items (whose definition in the current dictionary could probably be improved) is to link together groups of atoms whose disorder is correlated, e.g., a phenyl group that may occur in two different disordered orientations.
The atoms belonging to one orientation are assigned, say, to _*_group 1 and those belonging to the other orientation are assigned to _*_group 2. All sites in the same group are simultaneously occupied and have the same occupation number. Adjacent sites in different groups cannot be simultaneously occupied. _*_assembly is used if, for example, two different phenyl groups are disordered. Each would be assigned a different value of _*_assembly, so that the restrictions on simultaneous occupation only apply to groups in the same assembly. We could use a good example.

What happens if these groups are rigid groups, as presumably frequently happens? Can we combine a rigid group description and disordered group description into a single category? See also item 2.1.

2.3 Brian Toby has raised a problem with the ATOM_TYPE category, suggesting that ATOM_TYPE_SCAT should be defined as a separate category. The items in this new category would give details of the scattering factors, which depend on the radiation and wavelength used. In multi-radiation or multi-wavelength experiments the scattering factors may need to be looped and keyed to structure factors measured with different radiations. It then makes no sense to include these in the same loop that describes structure-related properties such as _*_number_in_cell and _*_oxidation_state. (See 2.4).

2.4 _atom_type_scat_versus_stol_list was introduced to allow a list of scattering factors as a function of sin(theta)/lambda to be included as a loop. However, since nested loops are not allowed, this loop was given in the form of a text field that needed to be parsed in order to be accessible to the computer. This is inelegant, but could be rectified by introducing a new category keyed back to the ATOM_TYPE category, e.g.
    loop_
    _atom_scat_factor_type_symbol   #matches _atom_type_symbol
    _atom_scat_factor_stol
    _atom_scat_factor_scat_factor
    S 0     16.0
    S 0.01  15.3
    S 0.03  14.8
    # data values omitted for brevity
    V 0     23.0
    V 0.015 22.1
    V 0.03  21.5
    # list truncated for brevity

By including _atom_scat_factor_radiation_code, it might also be possible to meet the requirements of Brian Toby described in 2.3.

2.5 Is there a need for _atom_sites_special_details for including a general discussion of, say, disorder, or the effects of twinning on the coordinates given?

2.6 The default value of _atom_site_fract_* is 0.0, which does not make much sense. If the coordinates are not given they are, presumably, unknown. As the definitions now stand, coordinates must be given the explicit value of '?' or '.' if they are not known, otherwise all atoms are assumed to lie at the origin! Should we remove the default?

2.7 Do we need to make provision for including anharmonic atomic displacement parameters?

2.8 _atom_site_refinement_flags has seven enumerated values, all single letters, but unlike all other enumerated flags, these values can be concatenated. A normal dictionary-driven check of this field would fail if it encounters more than one letter in the string. We need either to include all reasonable combinations of the 7 characters in the enumeration list or treat this item as free text. Alternatively, we could replace this item by three new items, one referring to the refinement of positions (four allowed letters), another to occupation (one allowed letter) and the third to atomic displacement parameters (two allowed letters). The enumeration lists could then include all reasonable combinations of flags for each of the three items.

3. AUDIT categories

No changes proposed.

4. CELL category

4.1*(w) CCDC request that we define the reciprocal cell.
We could add 'reciprocal' after 'cell' in all the relevant names, e.g., _cell_reciprocal_angle_alpha, or use a name such as _cell_angle_alpha* in line with current usage (except that * by convention is used as a wildcard character). In any case the definitions should be straightforward. I assume we would also need to include the reciprocal cell volume. What would be the best form of the dataname?

4.2*(w) CCDC has also requested an item for Z', the number of formula units in the asymmetric unit. This item is used to identify the number of molecules in the structure that are not related by symmetry. _cell_formula_units_Z' could be defined as (_cell_formula_units_Z)/(multiplicity of the general position). Would this definition fail under some circumstances, such as when two independent centrosymmetric molecules are found in P-1? Would the prime after Z be too inconspicuous a character when the CIF is printed? Would it be best to name this _cell_formula_units_Z_prime?

5. CHEMICAL categories

This category is used to describe the chemical properties of the sample, but some of these properties are included in the EXPTL category as they are regarded as part of the structure determination or specific to the sample being studied.

5.1(w) The CCDC has identified a need for including more information about the origins and properties of the sample. They propose additional names:

    _chemical_compound_source_recrystallization
        (or _*_recrystallisation - do we have conventions on spelling?). This might be better included under _exptl since it refers to the preparation of the sample studied, see 10.1 below.

    _chemical_properties_physical
    _chemical_properties_biological
        These would be text fields that would allow for descriptive comments.

5.2(w) CCDC have also identified a need to include peptide sequences for some of their polypeptides. They are currently working on a scheme compatible with mmCIF for including this information.
Discussion on this topic should be deferred until we receive their report.

5.3 What we do not seem to have is a category that describes specifically the properties of the crystal such as the phase (though there is a _chemical_name_structure_type which could include terms like perovskite, NaCl, etc.). Some properties that are related to the crystal rather than the compound, such as refractive index and optical activity, perhaps need their own category. The density is strictly a crystal property (different phases can have different densities) but is given in the exptl category, i.e. it is not treated as a chemical or crystal property but as part of the structure determination, even though few people routinely measure it (see 10).

6. CITATION categories

This category is used for including references in the CIF to other work, principally references to journal articles.

6.1*(w) A proposal has been made for an item _citation_database_id_CSD which would contain the REFCODE of a CSD entry that was being cited (not the REFCODE of the crystal described in the CIF which is given in _database_id_CIF). The precedent is _citation_database_id_medline. Should we add additional database_ids at the same time, e.g. to PDF, PDB, ICSD etc.?

7. COMPUTING categories

No changes proposed.

8. DATABASE categories

This category is used for giving codes for database entries and the codens used by the databases for journal names, but it could easily be extended to include other database-related items.

8.1*(w) Recently the CCDC has received approval for a couple of additional codes to give deposit numbers to entries in some of their archival files that are not part of the main database, e.g. _database_code_depnum_ccdc_journal for entries passed to CCDC by journals before they are processed into the main database. Since this change has been approved, no further discussion is needed.
8.2*(w) This category would also seem to be the best place for including information on changes that have been made in an entry by a database prior to the production of an output CIF. One possibility is to introduce items such as _database_CSD_audit or _database_CSD_history to record these changes.

9. DIFFRN categories

Several changes are proposed for this category.

9.1(w) There is a proposal to introduce items for describing twins, partly in this category and partly in REFINE (see 14.3). This suggestion is currently being pursued by a special subcommittee. We should await their report.

9.2*(w) The suggestion is made that the items:

    _diffrn_measured_fraction_theta_full
    _diffrn_measured_fraction_theta_max
    _diffrn_reflns_theta_full
    _diffrn_reflns_theta_max

should be replaced by:

    _diffrn_reflns_measured_fraction_resolution_full
    _diffrn_reflns_measured_fraction_resolution_max
    _diffrn_reflns_resolution_full
    _diffrn_reflns_resolution_max

This would place all four items in the same category (which is logical) and would replace theta (which depends on the wavelength) with the resolution in Angstroms (which does not). This is in line with other definitions in CIF.

9.3*(w) The powder diffraction community has suggested that the following item might be useful:

    _diffrn_source_takeoff_angle

9.4(w) The item:

    _diffrn_orient_matrix

depends on the particular diffraction geometry used and is defined by the user in _diffrn_orient_matrix_type. This runs counter to our philosophy that no item should depend on a second item to determine its meaning (i.e. all items are insensitive to their context). In some of the recent dictionaries considerable care has been taken to define an axis system that does not depend on a particular diffractometer or its orientation. For example, coordinate axes can be chosen along the incident beam and the normal to the plane of diffraction, but even this basic definition runs into problems in the case of area detectors.
Can we improve the definition used in the core dictionary?

9.5 Is a flag needed for systematic absences in the DIFFRN_REFLN category? Since _diffrn_reflns_number excludes the systematic absences, a computer can only check that the sum is correct if it can identify the systematically absent reflections. There is such a flag in the REFLN category but the original definitions were based on the notion that systematic absences are a property of the structural model being refined (hence of the calculated structure factors) and not a property of the diffraction measurements (observed structure factors).

9.6 Details of the radiation used are given in the category DIFFRN_RADIATION, except for _diffrn_source_target. Is this arrangement appropriate? The problem arises if a crystal is studied using different types of radiation.

9.7 With the growing use of area detectors do we need to rethink the _diffrn_standards_* items by adding items suitable for area detector measurements?

10. EXPTL categories

This is where various properties of the studied sample are presented. There is some overlap with the CHEMICAL categories (see 5 above).

10.1(w) Do we need to give details of the sample history, especially for inorganic materials where heat treatment, annealing in different atmospheres, etc. is important? This might also be the best place to describe the growth of the single crystal specimen used in the study (see 5.1).

10.2 Although the CIF dictionaries envision the possibility of reporting measurements on more than one crystal, there is only provision for describing the faces of one of them. Should we define an item _exptl_crystal_face_crystal_id which would be a child of _exptl_crystal_id, allowing the faces of more than one crystal to be reported?

11. GEOM categories

11.1 The current geom definitions allow all the bonds around a given atom to be listed with their symmetry operations, but Acta Cryst. only prints those in the asymmetric set, i.e., bonds related to others by symmetry are not printed. Thus for NaCl only one of the six Na-Cl bonds is printed, but it is useful to know in these cases how many bonds of this type there are. The proposal is for an item:

    _geom_bond_multiplicity

This could be given as an alternative to listing all the bonds, or it could be defined only for the bond that is flagged for printing, with the choice left up to the user. This problem occurs frequently in inorganic compounds. Would a similar item be useful for angles? I don't know what kind of fix Acta Cryst. currently uses when setting up for print.

12. JOURNAL categories

No changes proposed

13. PUBL categories

13.1*(w) Acta Cryst. is proposing two items which allow them to connect the authors of papers with their entry in the World Directory via an author id number:

    _publ_contact_author_id_iucr
    _publ_author_id_iucr

14. REFINE categories

14.1(w) The following items have also been transferred from the msCIF dictionary as being of more general interest:

    _refine_ls_F_calc_accuracy
    _refine_ls_F_calc_details
    _refine_ls_F_calc_formula

The last item gives the analytical expression used to calculate the structure factors (presumably when this is not standard) with further details given in _*_details leading to an estimate of the accuracy of the calculated Fs. Presumably these items would only be used in special circumstances. Full dictionary descriptions are given on the web.

14.2(w) Also from the msCIF dictionary, with full dictionary descriptions given on the web, are:

    _refine_ls_restrained_wR-factor_all
    _refine_ls_restraints_weighting_scheme

The intent is to define an R factor for restrained refinements. There is some discussion of the proposed expression on the web.

14.3(w) There is a proposal that CIF should be able to describe twins with new items in DIFFRN (see 9.1) and REFINE. This item is currently being reviewed by a special committee. We should wait for their report.
14.4 The default value of _refine_ls_extinction_method is 'Zachariasen'. This default not only makes presumptions about the standard type of extinction correction (which is inappropriate), but it also assumes that an extinction correction was made if no extinction is reported! Strictly, the absence of an extinction correction can only be signalled by setting this value explicitly to '.'. Omitting this item from a CIF implies that a Zachariasen correction was made. Should we remove this default?

14.5(w) At present _refine_ls_extinction_coef is a catch-all for the parameter refined for any type of extinction correction. Its value only has meaning in the context of the value of _refine_ls_extinction_method. Since CIFs are supposed to be context independent (i.e. the meaning of an item does not depend on the value given to any other item), we should define different names for each of the coefficients determined by different methods.

15. REFLN categories

No changes proposed

16. REFLNS categories

No changes proposed

17. SYMMETRY categories

17.1* The symmetry CIF dictionary approved by COMCIFS last December provides a much more carefully thought out set of symmetry definitions than is available in the current symmetry categories. This dictionary can be found on the IUCr web site. It contains three categories:

    SPACE_GROUP
    SPACE_GROUP_SYMOP
    SPACE_GROUP_WYCKOFF

The recommendation is that the current symmetry categories, which have some conceptual weaknesses, should be replaced by the new SPACE_GROUP categories. The question is: do we wish to include the whole of the symmetry CIF dictionary in the core or only a subset of items, and if so, which?
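
The concern about dictionary defaults raised in items 2.6 and 14.4 can be illustrated with a minimal sketch. It assumes a CIF data block held as a plain Python dict of strings; the helper resolve() and the DEFAULTS table are hypothetical names introduced only for illustration and are not part of any CIF software. The two default values are the ones quoted in items 2.6 and 14.4.

```python
# Hypothetical sketch: how a dictionary default changes the meaning of an
# omitted item. DEFAULTS holds only the two defaults quoted in items 2.6
# and 14.4; resolve() is an illustrative helper, not a real CIF API.

DEFAULTS = {
    "_refine_ls_extinction_method": "Zachariasen",
    "_atom_site_fract_x": "0.0",
}

MISSING = object()  # item is unknown: absent with no default, or given as '?'


def resolve(block, name, defaults=DEFAULTS):
    """Return the effective value of `name` in a data block (a dict)."""
    if name in block:
        value = block[name]
        if value == ".":
            return None       # explicitly inapplicable
        if value == "?":
            return MISSING    # explicitly unknown
        return value
    # Item omitted: a dictionary-driven reader falls back to the default.
    return defaults.get(name, MISSING)


omitted = {}                                         # author gave nothing
explicit = {"_refine_ls_extinction_method": "."}     # author said "none"

print(resolve(omitted, "_refine_ls_extinction_method"))   # Zachariasen
print(resolve(explicit, "_refine_ls_extinction_method"))  # None
print(resolve(omitted, "_atom_site_fract_x"))             # 0.0
```

Under these assumptions an author who simply leaves out the extinction method is read as having applied a Zachariasen correction, and an atom with no coordinates is read as lying at the origin. Removing the two entries from DEFAULTS makes both cases resolve to "unknown" instead.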