Background to our discussion
- To: coreCIFchem <corecifchem@iucr.org>
- Subject: Background to our discussion
- From: "I. David Brown" <idbrown@mcmail.cis.mcmaster.ca>
- Date: Wed, 15 Oct 2003 15:55:43 -0400 (EDT)
Dear Colleagues,

Thank you for agreeing to be part of the coreCIFchem project. This email gives an outline of the work we have to do, some information on the way CIF currently handles chemical information, and suggestions on what we need to do to get started. Please send any comments to the 'reply to' address before November 15.

Contents
--------
1. Purpose of the discussion
2. Background
3. Some important considerations
4. Chemical properties that need to be defined
5. The current situation in CIF
6. Getting started

1. Purpose

The task of this group is to propose a series of coreCIF data items that describe chemical (as opposed to crystallographic) properties. These will be submitted to the coreCIF Dictionary Maintenance Group (DMG) for inclusion in the next version of the cif_core dictionary. In particular we need to propose appropriate categories for these items and identify the relationships between them. We should also consider whether such a chemical description can simplify the way in which rigid groups and disorder are described in coreCIF.

2. Background

Most of the items defined in coreCIF describe the properties of the crystal. The atoms listed in the _atom_site loops are assumed to define some sort of chemical unit, but strictly speaking the coordinates listed in the CIF only define points in the unit cell. The properties of these points are intrinsically crystallographic (site symmetry, multiplicity, etc.), but we have chosen to associate each of these points with an atom even when no atom occupies the site! Atomless sites are needed, for example, to identify the center of mass of a molecule or to define a local coordinate system. We currently get around this problem by the artificial and curious expedient of occupying the site with a dummy atom.

CoreCIF currently makes no provision for associating atoms with particular molecules, complexes or other chemical entities, because there is currently no way to specify these (the moiety formula lists the moieties present and the entities are implicit in the connectivity tables, but these are not linked). Recent requests for the inclusion of molecular information, such as Z', in CIF mean that it is time to consider how to identify molecules and other chemical entities, and how to associate them with the positions they occupy in the unit cell of the crystal.

3. Some important considerations

3.1 Chemical

Our discussion will focus on the identification of one or more molecules and their properties as listed in section 4 below. While such an identification is important for small molecule chemists, the needs of inorganic chemists must be addressed as well. Atoms, bonds and molecules are all chemical rather than crystallographic entities.

3.2 File structures

It is essential that CIF be kept on a convergent, rather than a divergent, track. We therefore need to know how mmCIF includes chemical information (John Westbrook, who is a member of this discussion list, can help us here). We also need to work towards the time when all CIF dictionaries will use the same advanced Dictionary Definition Language (DDL). This will include methods, which are machine-readable algorithms that a program can use to calculate the value of an item missing from the CIF. By carefully defining the items we can simplify the eventual conversion from DDL1 to an advanced DDL (Hall and Westbrook will keep us on track here).

We must also keep in touch with the chemical community represented by an IUPAC group which is currently defining items for a universal XML chemistry schema (the XML equivalent of a CIF dictionary). There is much to be gained from maintaining transparency between the different electronic forms in which chemical information is transmitted and archived.
This requires that CIF dictionaries and XML schema adopt compatible definitions. (I will be attending an IUPAC workshop on this topic in November and will liaise with this group. I hope to find a member of the IUPAC group willing to help us in our discussions.)

4. Chemical entities and properties we may wish to define

4.1 Properties related solely to chemistry

4.1.1 Atoms: elemental constitution, atomic mass, oxidation state (formal charge), electronegativity, elemental coordinates in the periodic table (i.e., period and group), positional coordinates in a molecular (rather than crystallographic) coordinate system, atomic radii of various kinds.

4.1.2 Bonds: terminal atoms, bond order (bond valence, bond number etc.), length, maximum bonding length (calculated, e.g., from the atomic radii), bond angles, torsion angles.

4.1.3 Molecules: composition, chirality, optical rotation, formal charge, pKa, symmetry, dipole moment.

4.2 Properties related to both the chemistry and the crystallography

4.2.1 Atoms: scattering factors, disorder, occupation number.

4.2.2 Bonds: angles defined by crystallographic symmetry.

4.2.3 Rigid groups: scattering factors, crystallographic symmetry, disorder.

4.2.4 Molecules: crystallographic symmetry, crystallographic multiplicity, Z'.

4.3 Should we keep the intrinsic chemical properties logically separated from properties, such as colour and atom size, that are required for a molecular display? Or should the display parameters all be set in the application?

4.4 Should we include items that might be used in modelling, or should these all be set in the application? We need to keep in mind that refinement against the diffraction pattern is a form of modelling which is increasingly being combined with refinement against various chemical criteria, so we may wish to consider refinement in the context of modelling.

In all the above cases the atoms used in defining the chemical properties must be linked to the crystallographic positions they occupy.

5. Current CIF practice

5.1 Interatomic distances are currently reported in the geom_* categories using _atom_site_labels defined in atom_site. These report distances found in the crystal, usually occurring between atoms and therefore (inappropriately) divided into bonds and contacts. Since 'bond' and 'contact' are chemical concepts, a given distance must arbitrarily be assigned to one or the other category even when one of the atoms is a dummy, or when the atoms are well separated and form neither bonds nor contacts. While this may not present a serious problem in current CIF usage, it might cause problems as we sharpen the definitions and rely more heavily on computer manipulation of CIF.

5.2 Chemical information about atoms is given in the atom_type category, but the information in this category is diverse and not logically organized, e.g.:

5.2.1 File management:
      _*_symbol is the link with the atom_site category.

5.2.2 Chemical:
      _*_description
      _*_oxidation_number has an open definition which can be interpreted in many different ways.
      _*_radius is defined as the 'effective intra- and intermolecular bonding radius'. We should be able to come up with a better definition and maybe define other kinds of radii suitable for other purposes.

5.2.3 Chemical and crystallographic:
      _*_analytical_meas_% is closely related to chemical_formula_analytical. It, and the next item, refer to the whole cell, while other items in this category refer to individual atoms.
      _*_number_in_cell
      _*_scat_* consists of a variety of items that give information on the scattering factors assumed during the refinement. Do these belong more logically in the refine categories, since they are part of the refined model rather than intrinsic properties of an atom?

5.3 The chemical_conn_atom and chemical_conn_bond categories can be used to describe the chemical connectivity. There is no provision for defining more than one moiety except by defining disconnected units.
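
To make the point about disconnected units concrete, here is a minimal Python sketch of how a program might recover moieties from a connectivity table. It assumes, for illustration only, that the atoms are given as the integer numbers used in chemical_conn_atom and the bonds as pairs of those numbers, as in chemical_conn_bond. The function name find_moieties and the example data are hypothetical and are not part of any CIF dictionary or library.

```python
from collections import defaultdict


def find_moieties(atom_numbers, bonds):
    """Group atoms into moieties, taken here to be the connected
    components of the bond graph (an assumption of this sketch)."""
    # Build an undirected adjacency map from the bond list.
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    moieties = []
    for start in atom_numbers:
        if start in seen:
            continue
        # Depth-first walk collecting every atom reachable from 'start'.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nxt in neighbours[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        moieties.append(sorted(component))
    return moieties


# Hypothetical data: atoms 1-3 form one bonded unit, atom 4 is unbonded.
atoms = [1, 2, 3, 4]
bonds = [(1, 2), (1, 3)]
print(find_moieties(atoms, bonds))  # [[1, 2, 3], [4]]
```

In this sketch the moieties are only implicit in the bond list; nothing in the input names them or ties them to the moiety formula.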
The atoms in these categories are children of the parent _atom_site_chemical_conn_number in the atom_site category. Is this the logically correct hierarchy?

5.4 _chemical_optical_rotation is an elaborate parsable string that refers explicitly to 'the optical rotation in solution of the compound', the implication being that the crystal contains only a single molecular species. It is the only item in the chemical category that is not related to the crystal.

6. Getting started

John Westbrook may be able to give us a quick introduction to the way this kind of information is handled in mmCIF. The nature of the molecules is different, but we might benefit from mmCIF experience, and we need to maintain coherence with mmCIF. I will find out about the IUPAC project and report back. Once we are supplied with these items of information we should be in a better state to move forward.

Best wishes

David

*****************************************************
Dr. I. David Brown, Professor Emeritus
Brockhouse Institute for Materials Research,
McMaster University, Hamilton, Ontario, Canada
Tel: 1-(905)-525-9140 ext 24710
Fax: 1-(905)-521-2773
idbrown@mcmaster.ca
*****************************************************
_______________________________________________
coreCIFchem mailing list
coreCIFchem@iucr.org
http://scripts.iucr.org/mailman/listinfo/corecifchem