coreCIFchem #5
- To: Chemical information in core CIF <corecifchem@iucr.org>
- Subject: coreCIFchem #5
- From: David Brown <idbrown@mcmaster.ca>
- Date: Fri, 25 Jun 2004 10:13:24 -0400
Dear Colleagues,

Even with an extended deadline I did not get many responses to the last mailing to coreCIFchem (Discussion Paper #4), though the ones I did receive had perceptive comments that have been incorporated into the present revised proposal. We are making considerable progress. The framework for reporting the chemical structure is becoming clear and we now need to turn our attention to some of the details.

Reading through the previous proposal (#4) I can understand why the response was low. This email contains a simplified and more flexible proposal which I hope you will find easier to understand, even though it occupies 21 printed pages. Much of this, however, consists of explanatory notes accompanying the two sample CIFs.

For the sake of keeping the project moving I am setting AUGUST 31, 2004 as the deadline for responses, but this deadline can be extended if you need more time.

PLEASE SEND YOUR COMMENTS TO coreCIFchem@iucr.org

David


COMMENTS RECEIVED ON DISCUSSION PAPER #4

I received substantive comments from two people:

GREG SHIELDS wrote:
-------------------
Sorry to bring up the issue of disorder again, but I have been thinking about the latest proposals, in particular about how they relate to more complex examples of disordered molecules. Whilst I think I can understand how they would be applied to relatively simple cases of disorder, I am not sure how this would be extended to larger molecules with a number of independently disordered groups. In such cases, there could be a large number of possible combinations of the submappings for each of the disordered groups in one molecule, and hence a large total number of mappings for the whole molecule.
I am concerned that the description could become unwieldy in such cases, and perhaps we may need to consider storing the mapping of the atoms common to all configurations only once, along with one configuration of each of the disordered groups, using sub-mappings to describe the other configurations of the disordered groups. I expect you have already considered these problems, and I think it would be useful to know how you were considering dealing with more complex cases in these proposals.

I have attached a portion of a CIF describing a molecule with twofold disorder in two propylene bridges (with occupancies 0.62/0.38 and 0.55/0.45 respectively) as an example of slightly more complex cases (this is taken from a CIF deposited at the CCDC - Journal: Dalton Trans. (0222), P: 2872, Y: 2003; Authors: L.Salmon, P.Thuery, E.Riviere, J.-J.Girerd, M.Ephritikhine). I would be interested to hear how this would be described in the proposed framework.

IDB Reply
---------
You are right to raise this problem and it was one that had occurred to me. I think you will find the latest proposal more satisfactory since it does not require the author to specify how the disordered atom sites are combined in the individual molecules, something that cannot be determined from the diffraction experiment. If desired, these combinations can be specified using the existing items _atom_site_disorder_assembly and _atom_site_disorder_group or using the proposed new molecular_geom loops.

HOWARD FLACK wrote
------------------
I see things a little differently from David concerning

>>GREG:
>> - is it possible that a molecule may have a lower symmetry
>> order than the site symmetry which it occupies in such cases ?

For me the answer is definitely yes. La coupe du Roi (a way of cutting an apple into two homochiral halves) is just such a case. A structure containing molecules which are enantiopure but disordered may well show up an average disordered 'molecule' which is achiral.

IDB response
------------
While symmetry is an interesting subject and is important in any analysis of a crystal structure, the CIF proposals as they are developing do not currently include a hierarchical specification of symmetry. I suggest that we do not distract ourselves by spending time on the nature of the relationship between crystal and molecular symmetry.

HOWARD FLACK continued
----------------------
Somehow or other I was worried that the proposed scheme did not allow for alternate descriptions of the topology of the same structure and that for the disordered TNT case we ended up almost duplicating information in the FORMULA_UNIT_ATOM and FORMULA_UNIT_BOND loops for the two molecules and then being forced to map them onto each other in the FU2FU loop.

My suggestion is to remove all topology information from the FORMULA group and incorporate it into the MOLECULAR group as the highest (first) level of the topology description. The FORMULA group would be reduced to just the FORMULA_UNIT_ATOM loop and would be an atomic description only of the formula unit. The MOLECULAR_UNIT group remains essentially the same but contains the topology information of the highest level, which in David's text is incorporated in the FORMULA unit. In the MAPPING section FU2FU is not needed, but FU2MU needs an index to identify each mapping (i.e., each set of atomic mappings) so that a loop containing the molecular information which David put in the FORMULA_UNIT loop can be implemented. It seems to me that these changes allow all of the information that David correctly wishes to capture to be recorded, but in a simplified form.

In the disordered TNT example there would be just one MOLECULAR_UNIT of index 1 being one complete TNT molecule. There would be two mappings (each with its own index), mapping this one TNT molecule onto two different sets of atoms in the FORMULA unit, although these two sets need not be disjoint.
These FU2MU mappings each need an index because, although the topologies of the two molecules are identical, they may have different conformations or chirality. I suspect that these changes also make it easier to work this system in conjunction with a database of molecules and molecular fragments, which is the only way that it would be used in practice. It also allows distinct descriptions of the topology of one structure to be captured.

In a private email Howard added:

In #4 you write "The second question was whether the geometry should be combined with the description of the graph of the molecular units and this too has been adopted." I managed to convince myself that combining the topology and the geometry information was NOT the best way to go. It seems to me that combining the two sets of information is perfectly adequate for the 'one-structure-at-a-time' boys. However, when one is engaged in a study of several crystal structures of several related compounds (molecules), I can see great merit in keeping the topology and the geometry information separate. This is because to one topological map there may correspond several distinct types of geometry (e.g., side chain rotation, or rotation about a crucial C - C bond so that one geometry is eclipsed, another skew, gauche or staggered, etc.).

IDB Response
------------
I have tried to incorporate these ideas into the new proposal. FORMULA_UNIT has now disappeared entirely. The topologies are given in the molecular_unit categories and the geometries in the molecular_geom categories. Either of these can be mapped onto the crystal.


WHAT IS NEW IN VERSION #5?

Others besides Greg have also pointed out that version #4 was unwieldy and would likely be rarely used if we approved it in that form. Version #5, given below, is much simpler and more flexible. The formula_unit categories that were included in version #4 have been deleted, making the overall structure simpler and easier to follow.
Of course the crystallographic description of the structure will continue to be given in the _atom_site and _geom categories. The chemical topology is given in the molecular_unit categories in the form of a graph with atoms connected by bonds. These categories are very flexible: it is possible to define one or more molecules, or submolecular units such as functional groups or complex ions, and the graphs may include part or all of the contents of the crystal. It is also possible to describe infinitely connected structures by specifying the graph of a finite formula unit. A separate set of categories, _molecular_geom, is used to give the geometries of the molecular units, as recommended by Howard.

Mapping the molecular units onto the crystallographic description, however, presents some complications. The obvious way is to include pointers to the corresponding atom in the molecular_unit (or molecular_geom) categories in the atom_site loop (or vice versa), but this is not always possible. For example, the asymmetric unit in the crystallographic description may contain only a part of the molecule, disorder may require that the chemical graph be mapped onto more than one set of atoms in the crystal, and an infinite graph contains bonds that link the (finite) molecular units together as well as bonds that link atoms within the molecular unit. This means that it is not possible in general to include the molecular mapping directly in the atom_site category, nor is it possible to point to the atoms of the crystal from the molecular_unit categories. Therefore a separate set of mapping categories, mol2xtl_map, is needed to provide the required flexibility.


--------------
PREAMBLE TO THE REPORT TO THE CORE DICTIONARY MAINTENANCE GROUP

The crystallographic information given in a CIF consists of the atomic coordinates of the asymmetric unit, the symmetry operations needed to generate all the atoms in the crystal, the lattice parameters and the interatomic distances.
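The symmetry operations mentioned above are what allow one asymmetric unit to stand for the whole crystal; they are also why the same site label can appear more than once, with different symop ids, in the mol2xtl_map loops of the first sample CIF. A minimal Python sketch, assuming each operation is stored as an integer rotation matrix W and a translation w acting on fractional coordinates (x' = Wx + w). The function name apply_symop, the mirror written 'x,-y,z' and the C3 coordinates are hypothetical, chosen only to show how a mirror could generate the site onto which C5 is mapped.

```python
# Type aliases for fractional coordinates and 3x3 rotation matrices.
Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


def apply_symop(rot: Mat3, trans: Vec3, xyz: Vec3) -> Vec3:
    """Apply x' = W x + w to fractional coordinates.

    The result is not reduced back into the unit cell; lattice
    translations would be added separately through `trans`.
    """
    return tuple(
        sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i] for i in range(3)
    )


# Hypothetical mirror perpendicular to b, i.e. the operation 'x,-y,z'.
MIRROR_B: Mat3 = ((1, 0, 0), (0, -1, 0), (0, 0, 1))
NO_SHIFT: Vec3 = (0.0, 0.0, 0.0)

c3 = (0.3120, 0.1405, 0.2270)  # hypothetical fractional coordinates of C3
c5 = apply_symop(MIRROR_B, NO_SHIFT, c3)
print(c5)  # (0.312, -0.1405, 0.227)
```

Keeping W and w separate, rather than parsing symop strings, keeps the sketch short; a real program would read the operations from the CIF symmetry loop.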

The present proposal provides a means of giving a chemical description of the structure in the form of the bonding topology (given in the proposed molecular_unit categories) and the ideal geometry (given in the proposed molecular_geom categories). The molecular and crystallographic descriptions are mapped onto each other using items in the mol2xtl_map categories.

A chemical description of the contents of the crystal, distinct from the crystallographic description, serves a variety of uses. It describes the contents of the crystal in the language of chemistry rather than the language of crystallography, thus allowing the structure determinations of crystals containing particular molecules or fragments to be searched out and the atomic coordinates of the relevant atoms to be retrieved. Further, the crystal structure determination does not itself identify which atoms are bonded. A topological description of the bonding network supplements the information given in the _geom_bond loop which, in spite of its name, gives interatomic distances without necessarily requiring that they correspond to chemical bonds. The topological description is useful to identify different molecules in a crystal, or crystallographically distinct copies of the same molecule. Finally, the proposed chemical description allows the ideal geometry and conformation of the molecular units to be specified: information which can be used during the refinement of the crystal structure or for validating the experimental bond distances and angles.

The set of molecular_unit categories gives a description of one or more ideal chemical structures (molecular units) in the form of a topological graph, i.e., a list of atoms and the connections that the bonds form between them:

  MOLECULAR_UNIT         Lists the different molecular units described
  MOLECULAR_UNIT_ATOM    Lists the atoms in each molecular unit
  MOLECULAR_UNIT_BOND    Lists the bonds between the atoms

This allows a flexible description for defining one or more molecular units (e.g., molecules, formula units, charged or neutral complexes). Chemical properties can be assigned to each atom, and bonds can be assigned as links between two atoms. However, the topological description given here does not include the geometry, which is given in the molecular_geom categories.

None of the information given in the molecular_unit categories is derived directly from the crystal structure: it is supplied by the author by way of a chemical interpretation. It is not necessary that the molecular units account for all the atoms found in the crystal structure, nor that the crystal structure contain all the atoms specified in the molecular units. The contents of the crystal may be described in terms of more than one molecular unit, and a hierarchy of molecular units may be defined, with, for example, one molecular unit describing a functional group such as a carboxyl group while another molecular unit specifies a complex, such as an acetate ion, that contains the functional group.

The decision as to what constitutes a MOLECULAR UNIT is left entirely to the author, but in the case of the infinitely bonded solids typically found in inorganic compounds the molecular unit would normally be the formula unit. This is the smallest group of atoms that contains all the chemical elements in the same proportions as they are found in the crystal. The chemical formula of this unit will normally contain only integer multipliers, but in cases where this is not possible, e.g., in non-stoichiometric crystals such as minerals, the size of the formula unit is necessarily arbitrary. It must be at least as large as the asymmetric unit of the crystal and normally no larger than the primitive unit cell.
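For a stoichiometric crystal, the formula unit defined above can be found by dividing the integer element counts of a larger group of atoms (for example, the cell contents) by their greatest common divisor. A minimal Python sketch, assuming the counts are already known as integers; the function names formula_unit and cif_formula and the Ca4 Cr4 F20 cell contents are hypothetical, used only for illustration. It does not apply to the non-stoichiometric case, where the size of the formula unit is arbitrary.

```python
from functools import reduce
from math import gcd


def formula_unit(cell_contents: dict[str, int]) -> dict[str, int]:
    """Reduce integer element counts to the smallest integer ratio.

    Only meaningful for stoichiometric crystals. An empty mapping raises
    TypeError because reduce() is given no initial value.
    """
    divisor = reduce(gcd, cell_contents.values())
    return {element: n // divisor for element, n in cell_contents.items()}


def cif_formula(counts: dict[str, int]) -> str:
    """Write counts in the space-separated style used in the sample CIFs
    (e.g. 'C7 H5 N3 O6'), omitting a count of 1 and keeping input order."""
    return " ".join(el if n == 1 else f"{el}{n}" for el, n in counts.items())


# Hypothetical cell contents, chosen only for illustration.
cell = {"Ca": 4, "Cr": 4, "F": 20}
print(cif_formula(formula_unit(cell)))  # Ca Cr F5
```

The element order is simply the input order; no ordering convention for chemical formulae is implied.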

The molecular_unit categories give only the topology of the molecules. The conformation and geometry are given in the molecular_geom categories because it is possible for a given topology to correspond to more than one conformer, e.g., cis and trans isomers:

  MOLECULAR_GEOM
  MOLECULAR_GEOM_ATOM
  MOLECULAR_GEOM_BOND
  MOLECULAR_GEOM_ANGLE
  MOLECULAR_GEOM_TORSION

The geometry may be given by specifying atomic coordinates in a rectangular Cartesian coordinate system of arbitrary orientation, or by giving bond lengths, angles and torsion angles. The atoms and bonds of the molecular units are mapped directly onto the descriptions of the geometry.

The MOL2MOL_MAP_ATOM category allows the atoms of different molecular units to be mapped onto each other. This feature will likely not be used often.

Mapping the graph of a molecular unit onto the crystal structure is less straightforward because of problems that result from disorder, crystallographic symmetry and infinitely bonded graphs. For this reason a special set of MOL2XTL_MAP categories is defined to allow some or all of the atoms in the molecular_unit or molecular_geom categories to be mapped onto the atom_site categories:

  MOL2XTL_MAP
  MOL2XTL_MAP_ATOM       Maps the atoms of the molecular unit to the crystal
  MOL2XTL_MAP_BOND       Maps the bonds of the molecular unit to the crystal

Details of the definitions of the molecular units and their mappings are illustrated by two sample CIFs, one of an organic molecule, the other of an infinitely connected inorganic solid.


SAMPLE CIFS
-----------
The first CIF describes the structure of the molecule trinitrotoluene, TNT. It shows how a finite molecular graph is handled when the molecule lies on a Wyckoff special position and two of the nitro groups are disordered. By way of illustration, several subunits of the molecule are also defined and are mapped onto the molecule itself.
The second example describes the structure of CaCrF5, which has an infinite bond graph and a formula unit that includes more than one asymmetric unit.

[Editorial comment: The sample CIFs are intended only to show the organization of the information. Data names may be changed in the final report and dictionary definitions will eventually be needed, but it is simpler to discuss the CIF structure in terms of annotated examples. Suggestions for better names are welcome. Items marked as 'list reference' are required for the management of the relational file structure and must be unique for each line in the list. The list reference item in one loop is frequently parent to similarly named items in other loops.]


FIRST EXAMPLE
-------------

TRINITROTOLUENE

                 O        CH3        O
                 |         |         |
           O --- N2        C1        N6 --- O
                   \     /   \     /
                     C2         C6
                     |          |
                H -- C3         C5 -- H
                       \       /
                         \   /
                           C4
                           |
                           N4
                          /  \
                         O    O

In the fictitious crystal structure I have invented for the purposes of this illustration, there is a crystallographic mirror plane perpendicular to the benzene ring that includes the methyl group and the N4 nitro group. The N2 and N6 nitro groups are related by the mirror plane and are disordered, with the two components having occupation numbers of 0.5. Because of the disorder, the crystallographic result does not define the point group of an individual molecule. By choosing one combination of the disordered nitro groups the molecule would have Cs symmetry, but by choosing the opposite combination the individual molecules would have C1 symmetry. Both or either combination may of course be present in the real crystal, but X-ray diffraction cannot distinguish between them.

############# Beginning of first CIF #############
#
#
data_disordered_TNT
#
# The first set of loops defines the topology of the TNT molecule
# (molecular_unit 1) and two fragments of the molecule (molecular_units 2 and
# 3). The fragment definitions likely would not often be used but are
# included here by way of illustration.
#
# If a crystal contained molecules of more than one compound, each would be
# described as a separate molecular unit. If the crystal contained more than
# one copy of the same molecule in the asymmetric unit (Z'>1) the topology of
# the molecular unit would be given only once.
#
# The items in each loop belong to the same category whose name forms the
# first part of the datanames of all items in the loop.
#
# The list reference items in each loop are unique for each line and are here
# given sequential numbers, which is satisfactory for computer analysis but
# makes a visual inspection of the mappings more difficult. However, the list
# reference items could be constructed from, e.g., the _molecular_unit_id and
# the _molecular_unit_atom_label, since the contents of the _*_id character
# string may have any value so long as it is unique within the list.
#
# The part of the CIF describing the crystallographic structure is omitted,
# but its description should be self-evident in the mapping loops.
#
############################################################
# DEFINING THE MOLECULAR UNITS
#
# The first loop lists the different molecular units that are being defined
# together with their properties. We may wish to define other properties
# besides those shown here.
#
loop_
_molecular_unit_id                # List reference
_molecular_unit_name
_molecular_unit_formula
_molecular_unit_point_group
_molecular_unit_Zprime
_molecular_unit_details
 1 'trinitrotoluene' 'C7 H5 N3 O6' m   1 'This is the whole molecule'
 2 'benzene ring'    'C6 H2'       mm2 1 'A portion of the TNT molecule'
 3 'nitro group'     'N O2'        mm2 3 'A group that appears three times in the TNT molecule'
#
# The atoms that form the molecular units are listed in the
# MOLECULAR_UNIT_ATOM category. Atomic properties relevant to the topology
# are listed here, but properties related to the geometry are listed in the
# molecular_geom_atom category.
#
# What I have called atom_valence represents the number of valence electrons
# used in bonding. ******* What is the best name for this? Formal oxidation
# state? **********
#
# The dictionary already contains instructions for drawing a 2-D molecular
# diagram in the set of chemical_conn categories. Although the chemical_conn
# categories also describe the topology of a molecule they are not a
# substitute for the molecular_unit categories because they are restricted to
# organic molecules, they are designed only to display a molecular diagram and
# the atoms are not mapped onto the atom sites in the crystal. It would,
# however, be possible to include an item
# _molecular_unit_atom_conn_atom_number as a child of
# _chemical_conn_atom_number in the following loop to allow the molecular unit
# to be mapped to chemical_conn and hence plotted as a 2-D diagram.
#
loop_
_molecular_unit_atom_id                # List reference
_molecular_unit_atom_mu_id             # Child of _molecular_unit_id
_molecular_unit_atom_label
_molecular_unit_atom_atom_type_symbol  # Child of _atom_type_symbol
_molecular_unit_atom_valence
_molecular_unit_atom_coord_number
_molecular_unit_atom_details
  1  1  C1   C  4  3  ?
  2  1  C2   C  4  3  ?
  3  1  C3   C  4  3  ?
  4  1  C4   C  4  3  ?
  5  1  C5   C  4  3  ?
  6  1  C6   C  4  3  ?
  7  1  C7   C  4  4  ?
  8  1  H71  H  1  1  ?
  9  1  H72  H  1  1  ?
 10  1  H73  H  1  1  ?
 11  1  N1   N  3  3  ?
 12  1  O1   O  2  1  ?
 13  1  O2   O  2  1  ?
 14  1  N1   N  3  3  ?
 15  1  O1   O  2  1  ?
 16  1  O2   O  2  1  ?
 17  1  N1   N  3  3  ?
 18  1  O1   O  2  1  ?
 19  1  O2   O  2  1  ?
#
# The above items define all the atoms in the molecule. The remaining items
# in this list show how parts of the molecule might be described as separate
# molecular units.
#
 20  2  C1   C  4  3  Benzene
 21  2  C2   C  4  3  Benzene
 22  2  C3   C  4  3  Benzene
 23  2  C4   C  4  3  Benzene
 24  2  C5   C  4  3  Benzene
 25  2  C6   C  4  3  Benzene
 26  3  N1   N  3  3  'Nitro group'
 27  3  O1   O  2  1  'Nitro group'
 28  3  O2   O  2  1  'Nitro group'
#
# The next loop defines the bonds in each of the molecular units, again giving
# just the topological properties of the bond, not the geometry.
# In some cases, e.g., in polar compounds, the order of the atoms may be
# important (see the next example). This problem has not been addressed in the
# current proposal. ******** How would one indicate that the direction was or
# was not important? ******
#
# In the MOLECULAR_UNIT_BOND categories the atoms are referred to by their
# _*_atom_id, which in this example are sequential numbers.
# However, _molecular_unit_bond_atom_id can be
# composed of any characters and the user could choose to construct
# _molecular_unit_atom_id out of the _molecular_unit_id and the atom
# label to make the lists easier for humans to understand (the computer does
# not care which system is used). In this case the first three rows of the
# previous table might look like:
#    1C1  1  C1  C  4  3  ?
#    1C2  1  C2  C  4  3  ?
#    1C3  1  C3  C  4  3  ?
# and the first three rows of the following table might look like:
#    1C1C2  1C1  1C2  1.5  delocalized   # TNT Benzene ring
#    1C2C3  1C2  1C3  1.5  delocalized
#    1C3C4  1C3  1C4  1.5  delocalized
#    etc.
# The definition of bond order is left to the user, but we may wish to
# define other items corresponding to particular definitions of bond order
# based on the method by which the bond order is determined. For
# example, bond orders derived using Kirchhoff-like network equations can be
# derived directly from the topology and would therefore be appropriate to
# include here. Other definitions are based on quantum mechanical
# calculations, which normally require a knowledge of the geometry and are
# therefore less appropriate for inclusion here but still useful.
#
loop_
_molecular_unit_bond_id          # List reference
_molecular_unit_bond_atom_id_1   # Child of _molecular_unit_atom_id
_molecular_unit_bond_atom_id_2   # Child of _molecular_unit_atom_id
_molecular_unit_bond_order
_molecular_unit_bond_type
  1   1   2  1.5  delocalized   # TNT Benzene ring
  2   2   3  1.5  delocalized
  3   3   4  1.5  delocalized
  4   4   5  1.5  delocalized
  5   5   6  1.5  delocalized
  6   6   1  1.5  delocalized
  7   1   7  1.0  single        # TNT Methyl group
  8   7   8  1.0  single
  9   7   9  1.0  single
 10   7  10  1.0  single
 11   2  11  1.5  delocalized   # TNT N2 nitro group
 12  11  12  2.0  double
 13  11  13  2.0  double
 14   4  14  1.5  delocalized   # TNT N4 nitro group
 15  14  15  2.0  double
 16  14  16  2.0  double
 17   6  17  1.5  delocalized   # TNT N6 nitro group
 18  17  18  2.0  double
 19  17  19  2.0  double
#
# The rest of this loop lists the bonds in the benzene ring (20-25) and nitro
# group (26-27) molecular units.
#
 20  20  21  1.5  delocalized   # Benzene ring
 21  21  22  1.5  delocalized
 22  22  23  1.5  delocalized
 23  23  24  1.5  delocalized
 24  24  25  1.5  delocalized
 25  25  20  1.5  delocalized
 26  26  27  2.0  double        # Nitro group
 27  26  28  2.0  double
#
###########################################################
# DEFINING THE MOLECULAR CONFORMERS AND GEOMETRY
#
# The disordered nitro groups can be combined in four different ways, a-a and
# b-b (both with Cs symmetry), and a-b and b-a (both having C1 symmetry, one
# being the enantiomer of the other). For illustrative purposes only the a-b
# and a-a conformers are described here. It is, of course, not necessary to
# identify which conformers are present if this is not known. The crystal can
# be mapped directly to molecular_unit rather than to the conformer given
# in molecular_geom.
#
# The ideal geometries of the conformers differ only in the torsion angles,
# but the complete ideal geometry for each conformer is defined here.
#
# The geometries of molecular units 2 and 3 are not given.
#
# The source of the bond lengths and angles could be given in the
# _molecular_geom_details field.
#
loop_
_molecular_geom_id             # List reference
_molecular_geom_point_group    # Point group of molecule
_molecular_geom_mu_id          # Child of _molecular_unit_id
_molecular_geom_details
 1  C1  1  'Expected geometry of TNT C1'
 2  Cs  1  'Expected geometry of TNT Cs'
#
# In this example the geometry of the benzene ring is defined by atomic
# coordinates; the remaining geometries are defined by their bonds and
# angles.
#
# The orthogonal coordinates are given in Angstroms, but their orientation is
# arbitrary. It is up to the programmer to decide the best way to use this
# information.
#
loop_
_molecular_geom_atom_id                # List reference
_molecular_geom_atom_geom_id           # Child of _molecular_geom_id
_molecular_geom_atom_mu_atom_id        # Child of _molecular_unit_atom_id
_molecular_geom_atom_mu_atom_coord_x   # Coordinates of atom in Angstroms
_molecular_geom_atom_mu_atom_coord_y
_molecular_geom_atom_mu_atom_coord_z
_molecular_geom_atom_mu_atom_details
  1  1   1  0.037  0.146  -0.124  Benzene
  2  1   2  1.378  0.562   0.134  Benzene
  3  1   3  1.846  1.421   0.204  Benzene
  4  1   4  2.567  1.834   0.304  Benzene
  5  1   5  1.745  1.563   0.245  Benzene
  6  1   6  0.962  0.498   0.103  Benzene
  7  1   7  ?      ?       ?      methyl
  8  1   8  ?      ?       ?      methyl
  9  1   9  ?      ?       ?      methyl
 10  1  10  ?      ?       ?      methyl
 11  1  11  ?      ?       ?      N2_nitro
 12  1  12  ?      ?       ?      N2_nitro
 13  1  13  ?      ?       ?      N2_nitro
 14  1  14  ?      ?       ?      N4_nitro
 15  1  15  ?      ?       ?      N4_nitro
 16  1  16  ?      ?       ?      N4_nitro
 17  1  17  ?      ?       ?      N6_nitro
 18  1  18  ?      ?       ?      N6_nitro
 19  1  19  ?      ?       ?      N6_nitro
#
# The geometry of the second conformer follows
#
 20  2   1  0.037  0.146  -0.124  Benzene
 21  2   2  1.378  0.562   0.134  Benzene
 22  2   3  1.846  1.421   0.204  Benzene
 23  2   4  2.567  1.834   0.304  Benzene
 24  2   5  1.745  1.563   0.245  Benzene
 25  2   6  0.962  0.498   0.103  Benzene
 26  2   7  ?      ?       ?      methyl
 27  2   8  ?      ?       ?      methyl
 28  2   9  ?      ?       ?      methyl
 29  2  10  ?      ?       ?      methyl
 30  2  11  ?      ?       ?      N2_nitro
 31  2  12  ?      ?       ?      N2_nitro
 32  2  13  ?      ?       ?      N2_nitro
 33  2  14  ?      ?       ?      N4_nitro
 34  2  15  ?      ?       ?      N4_nitro
 35  2  16  ?      ?       ?      N4_nitro
 36  2  17  ?      ?       ?      N6_nitro
 37  2  18  ?      ?       ?      N6_nitro
 38  2  19  ?      ?       ?      N6_nitro
#
# Ideal bond lengths are given for each of the conformers defined above.
# Since the benzene rings are defined by their coordinates, their bond lengths
# are not given here. The distances here are not those derived from the
# crystal structure determination but are those expected by the author. As
# above, the definition of the bond orders would be left to the author's
# discretion or could be omitted if given in the description of the topology
# (they are, of course, optional in both places).
#
loop_
_molecular_geom_bond_id          # List reference
_molecular_geom_bond_atom1_id    # Child of _molecular_geom_atom_id
_molecular_geom_bond_atom2_id    # Child of _molecular_geom_atom_id
_molecular_geom_bond_distance    # Bond distance in Angstroms
_molecular_geom_bond_order
_molecular_geom_bond_details
  1   1   2  ?     1.5  delocalized   # TNT1 Benzene ring
  2   2   3  ?     1.5  delocalized
  3   3   4  ?     1.5  delocalized
  4   4   5  ?     1.5  delocalized
  5   5   6  ?     1.5  delocalized
  6   6   1  ?     1.5  delocalized
  7   1   7  1.54  1.0  single        # TNT1 Methyl group
  8   7   8  1.05  1.0  single
  9   7   9  1.05  1.0  single
 10   7  10  1.05  1.0  single
 11   2  11  1.43  1.5  delocalized   # TNT1 N2 nitro group
 12  11  12  1.18  2.0  double
 13  11  13  1.18  2.0  double
 14   4  14  1.43  1.5  delocalized   # TNT1 N4 nitro group
 15  14  15  1.18  2.0  double
 16  14  16  1.18  2.0  double
 17   6  17  1.43  1.5  delocalized   # TNT1 N6 nitro group
 18  17  18  1.18  2.0  double
 19  17  19  1.18  2.0  double
#
# The bonds in the second conformer follow. The bond lengths are the same as
# in the first conformer.
#
 20  20  21  ?     1.5  delocalized   # TNT2 Benzene ring
 21  21  22  ?     1.5  delocalized
 22  22  23  ?     1.5  delocalized
 23  23  24  ?     1.5  delocalized
 24  24  25  ?     1.5  delocalized
 25  25  20  ?     1.5  delocalized
 26  20  26  1.54  1.0  single        # TNT2 Methyl group
 27  26  27  1.05  1.0  single
 28  26  28  1.05  1.0  single
 29  26  29  1.05  1.0  single
 30  21  30  1.43  1.5  delocalized   # TNT2 N2 nitro group
 31  30  31  1.18  2.0  double
 32  30  32  1.18  2.0  double
 33  23  33  1.43  1.5  delocalized   # TNT2 N4 nitro group
 34  33  34  1.18  2.0  double
 35  33  35  1.18  2.0  double
 36  25  36  1.43  1.5  delocalized   # TNT2 N6 nitro group
 37  36  37  1.18  2.0  double
 38  36  38  1.18  2.0  double
#
# The bond angles follow. Again these are the same for both conformers.
# Sufficient angles should be given to define the geometry uniquely. Probably
# not enough angles are given in this example.
#
loop_
_molecular_geom_angle_id          # List reference
_molecular_geom_angle_bond1_id    # Child of _molecular_geom_bond_id
_molecular_geom_angle_bond2_id    # Child of _molecular_geom_bond_id
_molecular_geom_angle_angle       # Bond angle in degrees
  1   8   9  109   # TNT1 Methyl group
  2   8  10  109
  3   9  10  109
  4   7   8  109
  5   7   9  109
  6   7  10  109
  7  11  12  117   # TNT1 N2 nitro group
  8  11  13  117
  9  12  13  125
 10  14  15  117   # TNT1 N4 nitro group
 11  14  16  117
 12  15  16  125
 13  17  18  117   # TNT1 N6 nitro group
 14  17  19  117
 15  18  19  125
#
# The second conformer follows
#
 16  27  28  109   # TNT2 Methyl group
 17  27  29  109
 18  28  29  109
 19  26  27  109
 20  26  28  109
 21  26  29  109
 22  30  31  117   # TNT2 N2 nitro group
 23  30  32  117
 24  31  32  125
 25  33  34  117   # TNT2 N4 nitro group
 26  33  35  117
 27  34  35  125
 28  36  37  117   # TNT2 N6 nitro group
 29  36  38  117
 30  37  38  125
#
# The two conformers, formed by taking different combinations of the two
# disordered nitro groups, differ in their torsion angles. I have defined the
# torsion angles in terms of the three bonds. ******* Would it be better to
# define them in terms of the four atoms? Is there an ambiguity about the
# direction of the bonds? *******
#
loop_
_molecular_geom_torsion_id          # List reference
_molecular_geom_torsion_bond1_id    # Child of _molecular_geom_bond_id
_molecular_geom_torsion_bond2_id    # Child of _molecular_geom_bond_id
_molecular_geom_torsion_bond3_id    # Child of _molecular_geom_bond_id
_molecular_geom_torsion_angle       # Torsion angle in degrees
  1  12  11   1   10.5   # TNT1 N2-N6 C1 conformer
  2  11   1   2    0
  3   1   2  17    0
  4   2  17  18   10.5
  5  31  30  20   10.5   # TNT2 N2-N6 Cs conformer
  6  30  20  21    0
  7  20  21  36    0
  8  21  36  37  -10.5
#
############################################################
# MAPPING THE TOPOLOGIES ON TO EACH OTHER
#
# The next loop maps the atoms of molecular units 2 and 3 onto molecular unit
# 1. This will not often be needed but is included to show that it is
# possible.
# It is only necessary to map the atoms, since there is no ambiguity
# about where the bonds occur as long as the molecular_units are finite.
# See the second example for an infinitely connected crystal.
#
# Since this mapping essentially equivalences two atoms, the order of the
# _*_ids is not important.
#
loop_
_mol2mol_map_id_1          # List reference
_mol2mol_map_atom_id_1     # Child of _molecular_unit_atom_id
_mol2mol_map_atom_id_2     # Child of _molecular_unit_atom_id
#
  1   1  20   # mapping TNT onto the benzene ring
  2   2  21
  3   3  22
  4   4  23
  5   5  24
  6   6  25
  7  11  26   # mapping the TNT N2 group onto the nitro group
  8  12  27
  9  13  28
 10  14  26   # mapping the TNT N4 group onto the nitro group
 11  15  27
 12  16  28
 13  17  26   # mapping the TNT N6 group onto the nitro group
 14  18  27
 15  19  28
#
##############################################################
# MAPPING THE CRYSTAL TO THE MOLECULAR UNITS
#
# The next loop maps the crystal to the two conformers. If the actual
# conformer were not known, the crystal structure could be mapped
# directly to the molecular_unit using the item _mol2xtl_map_atom_mu_atom_id
# in place of _mol2xtl_map_atom_mg_atom_id (see example 2).
#
# The crystallographic mirror operation that relates the two halves of the TNT
# molecule is assumed to have the _space_group_symop_id of 2. Lattice
# translations of the symmetry operations are not needed and are therefore not
# included here (but see example 2). The letters a and b distinguish the two
# disordered nitro groups in the crystal, each having an occupancy of 0.5. The
# mapping of two conformers onto the crystal is allowed provided their
# occupation numbers do not exceed 1.0. However, in this version of the
# proposal the occupation number is not given. ****** Should it go in the
# molecular_geom loop? ******
#
loop_
_mol2xtl_map_atom_id            # List reference
_mol2xtl_map_atom_mg_atom_id    # Child of _molecular_geom_atom_id
_mol2xtl_map_atom_site_label    # Child of _atom_site_label
_mol2xtl_map_atom_symop_id      # Child of _space_group_symop_id
 1  1 C1   1
 2  2 C2   1
 3  3 C3   1
 4  4 C4   1
 5  5 C3   2
 6  6 C2   2
 7  7 C7   1
 8  8 H71  1
 9  9 H72  1
10 10 H71  2
11 14 N4   1
12 15 O41  1
13 16 O42  1
#
# In the next six lines the N2 and N6 nitro groups of conformer 1 are mapped
# onto the two disordered crystallographic nitro groups. Conformer 1 (C1) is
# obtained by combining 'N2a 1' with 'N2b 2'.
#
14 11 N2a  1
15 12 O21a 1
16 13 O22a 1
17 17 N2b  2
18 18 O21b 2
19 19 O22b 2
#
# The second conformer (Cs) follows
#
20 20 C1   1
21 21 C2   1
22 22 C3   1
23 23 C4   1
24 24 C3   2
25 25 C2   2
26 26 C7   1
27 27 H71  1
28 28 H72  1
29 29 H71  2
30 33 N4   1
31 34 O41  1
32 35 O42  1
#
# In the next six lines the N2 and N6 nitro groups of conformer 2 are mapped
# onto the crystal by combining 'N2a 1' with 'N2a 2'.
#
33 30 N2a  1
34 31 O21a 1
35 32 O22a 1
36 36 N2a  2
37 37 O21a 2
38 38 O22a 2
#
############ End of first CIF ################

SECOND EXAMPLE
--------------

CaCrF5 consists of chains of corner-linked CrF6 octahedra running along the
c axis of a crystal belonging to space group C2/c. The Cr atoms lie on
centres of symmetry and the linking F atom (F3) lies on 2-fold axes that are
perpendicular to c.
The Ca atoms lie between the chains, also on 2-fold axes.

######### Beginning of second CIF #############
#
#
# EXAMPLE OF A STRUCTURE WITH AN INFINITE BOND GRAPH
#
# CaCrF5 is chosen to illustrate how infinite graphs are treated.
#
# The crystal structure of CaCrF5 is represented by an array of atoms linked
# by bonds into an infinitely connected network with translational symmetry.
# A finite graph, which retains all the local properties of the atoms, can be
# extracted from the infinite graph as follows:
# one first extracts one formula unit (in this case the seven atoms in the
# chemical formula). This requires that fourteen bonds linking the formula
# unit to the rest of the infinite network be broken, but such broken bonds
# always occur in pairs since they are necessarily related in pairs by one of
# the translational symmetry operations of the space group (translations,
# glides or screws). Each pair of broken bonds is then connected together,
# adding (in this case) seven further bonds to the finite bond graph.
# Therefore in some cases a pair of atoms in the graph will be linked by more
# than one bond, indicated in the graph by a double or triple line, etc. In
# CaCrF5 three such pairs of atoms are linked by two bonds, as shown in the
# bond graph below. The inclusion of two lines between a pair of atoms in the
# graph does NOT indicate a double bond (a bond of order 2), but rather two
# different bonds whose bond order is not specified. Where two (or more)
# bonds are shown as linking the same two atoms in the finite graph, they
# connect two different pairs of atoms in the infinite graph and the crystal
# structure.
#
# Information on the long-range order is lost when the infinite graph is
# reduced to a finite graph, but the short-range order, i.e., the nearest
# neighbour environment that contains the chemical bonds, is preserved.
# A crude representation of the finite graph showing the bonds between Cr and
# F, and between Ca and F, is given below.
# In the crystal F1 and F4 are related by crystallographic symmetry, as are
# F2 and F5.
#
#       |------------ F2 -------------|
#       |                             |
#       |------------ F1 =============|
#       |                             |
#  Cr1 -|============ F3 -------------|- Ca
#       |                             |
#       |------------ F4 =============|
#       |                             |
#       |------------ F5 -------------|
#
#
data_Ca_Cr_F5
#
# In this example a complete CIF is given, including the symmetry operations
# and the atomic coordinates. The description of the molecular unit is
# followed by the mapping between the molecular unit and the crystal
# structure. As there is only one conformer, the crystal structure is mapped
# directly to the topology in molecular_unit.
#
###############################################################
# DEFINITION OF THE CRYSTALLOGRAPHIC STRUCTURE
#
# Based on Wu and Brown (1973) Mat. Res. Bull. 8, 593-8.
#
_chemical_formula_sum        'Ca Cr F5'
_cell_length_a               9.0050
_cell_length_b               6.4720
_cell_length_c               7.5330
_cell_angle_alpha            90.00
_cell_angle_beta             115.85
_cell_angle_gamma            90.00
_cell_formula_units_Z        4
_space_group_name_H-M_alt    C2/c
_space_group_name_Hall       '-C 2yc'
loop_
_space_group_symop_id
_space_group_symop_operation_xyz
1 ' X, Y, Z'
2 '-X, Y,-Z+1/2'
3 '-X,-Y,-Z'
4 ' X,-Y, Z+1/2'
5 ' X+1/2, Y+1/2, Z'
6 '-X+1/2, Y+1/2,-Z+1/2'
7 '-X+1/2,-Y+1/2,-Z'
8 ' X+1/2,-Y+1/2, Z+1/2'
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_U_iso_or_equiv
_atom_site_adp_type
Ca1  0.50000  0.04260  0.25000  0.10000  Uiso
Cr1  0.00000  0.00000  0.00000  0.10000  Uiso
F1   0.00970 -0.29340 -0.02910  0.10000  Uiso
F2  -0.22730 -0.02300 -0.11740  0.10000  Uiso
F3   0.00000 -0.07210  0.25000  0.10000  Uiso
#
loop_
_geom_bond_atom_site_label_1
_geom_bond_atom_site_label_2
_geom_bond_distance
_geom_bond_site_symmetry_1
_geom_bond_site_symmetry_2
Ca1 F1 2.391 1_555 5_555
Ca1 F1 2.391 1_555 6_555
Ca1 F1 2.292 1_555 7_545
Ca1 F1 2.292 1_555 8_545
Ca1 F2 2.215 1_555 3_555
Ca1 F2 2.215 1_555 4_655
Ca1 F3 2.494 1_555 5_555
Cr1 F1 1.918 1_555 1_555
Cr1 F1 1.918 1_555 3_555
Cr1 F2 1.848 1_555 1_555
Cr1 F2 1.848 1_555 3_555
Cr1 F3 1.940 1_555 1_555
Cr1 F3 1.940 1_555 3_555
#
#############################################################
# DEFINITION OF THE FORMULA UNIT (IN MOLECULAR_UNIT)
#
# The next loop lists the molecular units; in this case the formula unit is
# the only molecular unit defined.
#
loop_
_molecular_unit_id          # List reference
_molecular_unit_formula
_molecular_unit_details
1 'Ca Cr F5' 'The formula unit'
#
# The next loop lists the seven atoms that compose the molecular unit and
# gives their chemical properties. Note that the atom_site_ list in the
# crystallographic items given above only contains five atoms because the
# molecular unit occupies two asymmetric units.
#
loop_
_molecular_unit_atom_id                 # List reference
_molecular_unit_atom_mu_id              # Child of _molecular_unit_id
_molecular_unit_atom_label              # Optional
_molecular_unit_atom_atom_type_symbol   # Child of _atom_type_symbol
_molecular_unit_atom_valence
_molecular_unit_atom_coord_number
_molecular_unit_atom_details
1 1 Ca1 Ca  2 7 ?
2 1 Cr1 Cr  3 6 ?
3 1 F1  F  -1 3 ?
4 1 F2  F  -1 2 ?
5 1 F3  F  -1 3 ?
6 1 F4  F  -1 3 'Related to F1 by crystallographic symmetry'
7 1 F5  F  -1 2 'Related to F2 by crystallographic symmetry'
#
# The next loop lists the bonds in the molecular unit. Some bonds appear
# twice (e.g. bonds numbered 5 and 6). The atoms of the molecular unit
# specified in these cases (e.g., atoms 2 and 5 for bonds 5 and 6) map onto
# different atom pairs in the crystal (see the bond mapping loop below).
#
# The bond order (more strictly the bond valence) given here is calculated
# from the topology and is used to calculate the ideal bond lengths in
# molecular_geom.
#
loop_
_molecular_unit_bond_id          # List reference
_molecular_unit_bond_atom_id_1   # Child of _molecular_unit_atom_id
_molecular_unit_bond_atom_id_2   # Child of _molecular_unit_atom_id
_molecular_unit_bond_order       # Predicted bond valence
_molecular_unit_bond_type
1 2 3 0.48 ?
2 2 6 0.48 ?
3 2 4 0.61 ?
4 2 7 0.61 ?
 5 2 5 0.41 ?
 6 2 5 0.41 ?
 7 1 3 0.26 ?
 8 1 3 0.26 ?
 9 1 6 0.26 ?
10 1 6 0.26 ?
11 1 4 0.39 ?
12 1 7 0.39 ?
13 1 5 0.18 ?
#
############################################################
# DEFINITION OF THE GEOMETRY OF THE MOLECULAR UNIT
#
# The atoms in the molecular_geom section are now defined. In this case the
# definition is trivial, and atomic coordinates are omitted because they are
# not used to define the geometry of the molecular unit.
#
loop_
_molecular_geom_atom_id                # List reference
_molecular_geom_atom_mu_atom_id        # Child of _molecular_unit_atom_id
_molecular_geom_atom_mu_atom_details   # Optional
1 1 Ca
2 2 Cr
3 3 F1
4 4 F2
5 5 F3
6 6 F4
7 7 F5
#
# The bond distances predicted from the bond orders are given here.
# They can be compared with the distances given in the crystallographic
# _geom_bond list above.
#
loop_
_molecular_geom_bond_id          # List reference
_molecular_geom_bond_atom1_id    # Child of _molecular_geom_atom_id
_molecular_geom_bond_atom2_id    # Child of _molecular_geom_atom_id
_molecular_geom_bond_distance    # Ideal bond distance in Angstroms
_molecular_geom_bond_order
_molecular_geom_bond_details
 1 2 3 1.93 0.48 ?
 2 2 6 1.93 0.48 ?
 3 2 4 1.84 0.61 ?
 4 2 7 1.84 0.61 ?
 5 2 5 1.99 0.41 ?
 6 2 5 1.99 0.41 ?
 7 1 3 2.34 0.26 ?
 8 1 3 2.34 0.26 ?
 9 1 6 2.34 0.26 ?
10 1 6 2.34 0.26 ?
11 1 4 2.19 0.39 ?
12 1 7 2.19 0.39 ?
13 1 5 2.48 0.18 ?
#
# Similar angle and torsion loops could also be given but are omitted here for
# brevity.
#
############################################################
# MAPPING THE ATOMS ONTO THE CRYSTAL
#
# The next loop maps the atoms of the molecular unit onto the atoms of the
# crystal. Note that atoms 6 and 7 (F4 and F5) in the molecular unit map onto
# symmetry-generated copies of F1 and F2 in the crystal.
# Alternatively the atoms of the molecular_geom, rather than the
# molecular_unit, could have been mapped onto the crystal.
#
loop_
_mol2xtl_map_atom_id             # List reference
_mol2xtl_map_atom_mu_atom_id     # Child of _molecular_unit_atom_id
_mol2xtl_map_atom_site_label     # Child of _atom_site_label
_mol2xtl_map_atom_symop_id       # Child of _space_group_symop_id
_mol2xtl_map_atom_trans_x
_mol2xtl_map_atom_trans_y
_mol2xtl_map_atom_trans_z
1 1 Ca1 1 0 0 0
2 2 Cr1 1 0 0 0
3 3 F1  1 0 0 0
4 4 F2  1 0 0 0
5 5 F3  1 0 0 0
6 6 F1  3 0 0 0
7 7 F2  3 0 0 0
#
# The next loop maps the bonds from the molecular unit onto the crystal. This
# is only needed for infinitely connected structures because these are the
# only structures in which there can be more than one bond between the same
# pair of atoms in the molecular unit.
#
# The bond in the molecular unit is identified here by its
# _molecular_unit_bond_id, but the bond in the crystal must be defined fully
# in terms of atom_site_labels and symmetry operations. The listing of bonds
# in geom_bond cannot be used to identify the crystal bonds because the
# molecular unit assumed in geom_bond is not necessarily the same as the
# molecular unit assumed in molecular_geom_bond.
#
# For the same reason, even though the observed bond distances are given in
# geom_bond, they should be repeated here. They can be recalculated using the
# _atom_site_labels and symmetry operations given in this loop.
#
# Note that the bonds numbered 5 and 6 map onto different pairs of atoms in
# the crystal (see the bond list of the molecular_unit above). The bonds
# labelled 'link' are those that link the chosen molecular unit (formula unit)
# to other molecular units in the infinite graph; the remaining bonds are
# formed between the atoms belonging to the molecular unit.
#
loop_
_mol2xtl_map_bond_id                  # List reference
_mol2xtl_map_bond_mu_bond_id_1        # Child of _molecular_unit_bond_id
_mol2xtl_map_bond_atom_site_label_1   # Child of _atom_site_label
_mol2xtl_map_bond_symop_1             # Child of _space_group_symop_id
_mol2xtl_map_bond_trans_x_1
_mol2xtl_map_bond_trans_y_1
_mol2xtl_map_bond_trans_z_1
_mol2xtl_map_bond_atom_site_label_2   # Child of _atom_site_label
_mol2xtl_map_bond_symop_2             # Child of _space_group_symop_id
_mol2xtl_map_bond_trans_x_2
_mol2xtl_map_bond_trans_y_2
_mol2xtl_map_bond_trans_z_2
_mol2xtl_map_bond_distance            # Experimental distance in the crystal
_mol2xtl_map_bond_details
 1  1 Cr1 1 0 0 0 F1 1 0  0 0 1.918 ?
 2  2 Cr1 1 0 0 0 F4 1 0  0 0 1.918 ?
 3  3 Cr1 1 0 0 0 F2 1 0  0 0 1.848 ?
 4  4 Cr1 1 0 0 0 F5 1 0  0 0 1.848 ?
 5  5 Cr1 1 0 0 0 F3 1 0  0 0 1.940 ?
 6  6 Cr1 1 0 0 0 F3 3 0  0 0 1.940 link
 7  7 Ca1 1 0 0 0 F1 5 0  0 0 2.391 link
 8  8 Ca1 1 0 0 0 F1 6 0  0 0 2.292 link
 9  9 Ca1 1 0 0 0 F4 5 0 -1 0 2.391 link
10 10 Ca1 1 0 0 0 F4 6 0 -1 0 2.292 link
11 11 Ca1 1 0 0 0 F5 1 0  0 0 2.215 ?
12 12 Ca1 1 0 0 0 F2 4 1  0 0 2.215 link
13 13 Ca1 1 0 0 0 F3 5 0  0 0 2.494 link
#
################# End of second CIF ####################

COMPARISON OF THE ABOVE PROPOSAL WITH mmCIF:

mmCIF has a chemical description which is designed for biological molecules. The contents of the crystal are divided into a small number of ENTITIES, which are classified as polymers (e.g. a protein molecule), non-polymers, or water. A category called struct_asym describes which entities are found in the asymmetric unit.

Polymeric entities are typically composed of monomers, or COMPONENTS, which are described in the category CHEM_COMP. The definitions in this set of categories are very similar to our definitions in the molecular_unit and molecular_geom categories. Chem_comp is designed to give the contents and geometries of the individual monomers that compose the macromolecules.
It describes the ideal geometry of the monomers either in terms of Cartesian coordinates or in terms of bond lengths and angles. Unlike our proposal, which uses mol2xtl_map to map the molecular units onto the crystal structure, mmCIF puts pointers to the corresponding atom in chem_comp into the atom_site loop itself. This arrangement does not work for small molecules, where a given atom in the crystal may map onto more than one atom in the molecular unit, e.g., if the molecular unit contains crystallographic symmetry.

We should make the definitions of items in the molecular_unit and molecular_geom categories correspond exactly to those used in chem_comp, to allow direct translation between the corresponding categories. chem_comp defines a very large number of additional properties, such as the chirality of individual atoms and planes of atoms, as well as properties that are of interest only in biological structures. We may wish to add some of these to our lists.

PLEASE SEND YOUR COMMENTS TO coreCIFchem@iucr.org before AUGUST 31, 2004

################# END OF PROPOSAL #5 ###############################

--
Dr. I.D.Brown, Professor Emeritus,
Department of Physics and Astronomy
McMaster University, Hamilton Ontario, Canada

_______________________________________________
coreCIFchem mailing list
coreCIFchem@iucr.org
http://scripts.iucr.org/mailman/listinfo/corecifchem
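The first example asks whether torsion angles defined by three bonds, rather than four atoms, are ambiguous. The Python sketch below is not part of the proposal; it shows one way a reading program might recover explicit atoms from the bond-based definitions. The vertex of an angle is the atom shared by its two bonds, and the four atoms of a torsion follow from the atoms shared by bonds 1 and 2 and by bonds 2 and 3. The function names are hypothetical, and the bond data are small subsets copied from the TNT and CaCrF5 examples.

```python
# Hypothetical helpers (not part of the proposal) that turn bond-based angle
# and torsion definitions into explicit atom sequences.

# Bond id -> (atom_id_1, atom_id_2), copied from the TNT2 bond list above.
TNT_BONDS = {27: (26, 27), 28: (26, 28), 30: (21, 30), 31: (30, 31), 32: (30, 32)}
# Bonds 1, 5, 6 and 13 of the CaCrF5 molecular unit (5 and 6 join the same atoms).
CACRF5_BONDS = {1: (2, 3), 5: (2, 5), 6: (2, 5), 13: (1, 5)}


def shared_atom(bond_a, bond_b):
    """Return the one atom common to two bonds; fail unless there is exactly one."""
    common = set(bond_a) & set(bond_b)
    if len(common) != 1:
        raise ValueError(f"bonds {bond_a} and {bond_b} share {len(common)} atoms")
    return common.pop()


def other_atom(bond, atom):
    """Return the atom at the far end of a bond from the given atom."""
    a1, a2 = bond
    return a2 if atom == a1 else a1


def angle_atoms(bonds, b1, b2):
    """(end, vertex, end) for an angle defined by two bond ids."""
    vertex = shared_atom(bonds[b1], bonds[b2])
    return other_atom(bonds[b1], vertex), vertex, other_atom(bonds[b2], vertex)


def torsion_atoms(bonds, b1, b2, b3):
    """Atoms A-B-C-D for a torsion defined by three consecutive bond ids."""
    atom_b = shared_atom(bonds[b1], bonds[b2])
    atom_c = shared_atom(bonds[b2], bonds[b3])
    if atom_b == atom_c:
        raise ValueError("the three bonds do not form a chain")
    return (other_atom(bonds[b1], atom_b), atom_b, atom_c,
            other_atom(bonds[b3], atom_c))


if __name__ == "__main__":
    print(angle_atoms(TNT_BONDS, 30, 31))          # angle 22: C-N2-O
    print(torsion_atoms(CACRF5_BONDS, 1, 5, 13))   # F1-Cr1-F3-Ca1
    try:
        angle_atoms(CACRF5_BONDS, 5, 6)            # two Cr1-F3 bonds
    except ValueError as err:
        print("ambiguous:", err)
```

Listing the three bonds in reverse order only reverses the atom chain (D-C-B-A), which describes the same torsion angle, so the direction of the bonds does not matter in this scheme. The lookup fails only when two bonds join the same pair of atoms, which according to the proposal can happen only in the finite graphs of infinitely connected structures.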
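In the CaCrF5 example the bond orders in molecular_unit are bond valences, and the ideal distances in molecular_geom are derived from them. The Python sketch below, which is not part of the proposal, recomputes those distances with the bond valence relation R = R0 - b ln(s). It also checks that the valences at each atom sum to _molecular_unit_atom_valence and that the bond count matches _molecular_unit_atom_coord_number. The parameters R0 = 1.657 Å (Cr(III)-F) and 1.842 Å (Ca-F), with b = 0.37 Å, are assumed values from standard bond valence tables, chosen because they reproduce the two-decimal distances listed in molecular_geom. The function names are hypothetical.

```python
import math
from collections import defaultdict

# Atoms of the CaCrF5 molecular unit (_molecular_unit_atom loop):
# atom_id -> (label, element, valence, coordination number)
ATOMS = {
    1: ("Ca1", "Ca", 2, 7), 2: ("Cr1", "Cr", 3, 6), 3: ("F1", "F", -1, 3),
    4: ("F2", "F", -1, 2), 5: ("F3", "F", -1, 3), 6: ("F4", "F", -1, 3),
    7: ("F5", "F", -1, 2),
}
# Bonds (_molecular_unit_bond loop): (bond_id, atom_id_1, atom_id_2, valence s)
BONDS = [
    (1, 2, 3, 0.48), (2, 2, 6, 0.48), (3, 2, 4, 0.61), (4, 2, 7, 0.61),
    (5, 2, 5, 0.41), (6, 2, 5, 0.41), (7, 1, 3, 0.26), (8, 1, 3, 0.26),
    (9, 1, 6, 0.26), (10, 1, 6, 0.26), (11, 1, 4, 0.39), (12, 1, 7, 0.39),
    (13, 1, 5, 0.18),
]
# ASSUMED bond valence parameters (not part of the proposal):
# R0 in Angstroms for each cation-anion pair, and a common b in Angstroms.
R0 = {("Cr", "F"): 1.657, ("Ca", "F"): 1.842}
B = 0.37


def ideal_distance(cation, anion, s):
    """Bond length predicted from a bond valence s: R = R0 - b ln(s)."""
    return R0[(cation, anion)] - B * math.log(s)


def check_atoms(bonds):
    """Per atom: (sum of bond valences, number of bonds). Repeated bonds
    between the same pair of atoms are counted separately."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, a1, a2, s in bonds:
        for atom_id in (a1, a2):
            totals[atom_id][0] += s
            totals[atom_id][1] += 1
    return {atom_id: tuple(v) for atom_id, v in totals.items()}


if __name__ == "__main__":
    for bond_id, a1, a2, s in BONDS:
        d = ideal_distance(ATOMS[a1][1], ATOMS[a2][1], s)
        print(f"bond {bond_id:2d} {ATOMS[a1][0]}-{ATOMS[a2][0]}: {d:.2f} A")
    for atom_id, (vsum, n) in sorted(check_atoms(BONDS).items()):
        label, _, valence, cn = ATOMS[atom_id]
        print(f"{label}: sum {vsum:.2f} (expect {abs(valence)}), "
              f"bonds {n} (expect {cn})")
```

Counting bonds 5 and 6 (and 7 to 10) separately is what makes the sums come out right: each repeated bond in the finite graph stands for a distinct bond in the crystal.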
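The mapping loops in the CaCrF5 example assume that crystal distances can be recalculated from atom_site labels, symmetry operations and lattice translations. The Python sketch below, again not part of the proposal, does this for the CaCrF5 data. It parses the _space_group_symop_operation_xyz strings and decodes site-symmetry codes of the n_klm form used in _geom_bond_site_symmetry (symmetry operation n, translation k-5, l-5, m-5). It then evaluates the distance with the general cell metric. The parser handles only the simple 'X+1/2' style of operation used here, and the helper names are hypothetical.

```python
import math
import re
from fractions import Fraction

# Cell, symmetry operations and sites copied from the CaCrF5 CIF above.
CELL = (9.0050, 6.4720, 7.5330, 90.00, 115.85, 90.00)
SYMOPS = {1: " X, Y, Z", 2: "-X, Y,-Z+1/2", 3: "-X,-Y,-Z", 4: " X,-Y, Z+1/2",
          5: " X+1/2, Y+1/2, Z", 6: "-X+1/2, Y+1/2,-Z+1/2",
          7: "-X+1/2,-Y+1/2,-Z", 8: " X+1/2,-Y+1/2, Z+1/2"}
SITES = {"Ca1": (0.50000, 0.04260, 0.25000), "Cr1": (0.0, 0.0, 0.0),
         "F1": (0.00970, -0.29340, -0.02910),
         "F2": (-0.22730, -0.02300, -0.11740),
         "F3": (0.00000, -0.07210, 0.25000)}
# Rows of the _geom_bond loop: (label_1, label_2, distance, symm_1, symm_2)
GEOM_BONDS = [
    ("Ca1", "F1", 2.391, "1_555", "5_555"), ("Ca1", "F1", 2.391, "1_555", "6_555"),
    ("Ca1", "F1", 2.292, "1_555", "7_545"), ("Ca1", "F1", 2.292, "1_555", "8_545"),
    ("Ca1", "F2", 2.215, "1_555", "3_555"), ("Ca1", "F2", 2.215, "1_555", "4_655"),
    ("Ca1", "F3", 2.494, "1_555", "5_555"),
    ("Cr1", "F1", 1.918, "1_555", "1_555"), ("Cr1", "F1", 1.918, "1_555", "3_555"),
    ("Cr1", "F2", 1.848, "1_555", "1_555"), ("Cr1", "F2", 1.848, "1_555", "3_555"),
    ("Cr1", "F3", 1.940, "1_555", "1_555"), ("Cr1", "F3", 1.940, "1_555", "3_555"),
]

TERM = re.compile(r"([+-]?)(\d+/\d+|\d+|[XYZ])")


def parse_symop(text):
    """Turn e.g. '-X+1/2, Y,...' into three (coefficients, constant) rows."""
    rows = []
    for part in text.replace(" ", "").upper().split(","):
        coeffs, const = [0, 0, 0], Fraction(0)
        for sign, token in TERM.findall(part):
            factor = -1 if sign == "-" else 1
            if token in ("X", "Y", "Z"):
                coeffs["XYZ".index(token)] += factor
            else:
                const += factor * Fraction(token)
        rows.append((coeffs, const))
    return rows


def apply_symmetry(xyz, symop_id, trans=(0, 0, 0)):
    """Apply symmetry operation symop_id, then a lattice translation."""
    return tuple(sum(c * v for c, v in zip(coeffs, xyz)) + float(const) + t
                 for (coeffs, const), t in zip(parse_symop(SYMOPS[symop_id]), trans))


def decode_symmetry_code(code):
    """'7_545' -> (7, (0, -1, 0)): operation number and lattice translation."""
    op, klm = code.split("_")
    return int(op), tuple(int(ch) - 5 for ch in klm)


def distance(f1, f2, cell=CELL):
    """Distance in Angstroms between two fractional positions."""
    a, b, c, al, be, ga = cell
    al, be, ga = (math.radians(v) for v in (al, be, ga))
    dx, dy, dz = (u - v for u, v in zip(f1, f2))
    d2 = ((a * dx) ** 2 + (b * dy) ** 2 + (c * dz) ** 2
          + 2 * a * b * dx * dy * math.cos(ga)
          + 2 * a * c * dx * dz * math.cos(be)
          + 2 * b * c * dy * dz * math.cos(al))
    return math.sqrt(d2)


def bond_length(label1, code1, label2, code2):
    """Recalculate a bond distance from site labels and symmetry codes."""
    p1 = apply_symmetry(SITES[label1], *decode_symmetry_code(code1))
    p2 = apply_symmetry(SITES[label2], *decode_symmetry_code(code2))
    return distance(p1, p2)


if __name__ == "__main__":
    for l1, l2, d, c1, c2 in GEOM_BONDS:
        print(f"{l1}-{l2} {c2}: {bond_length(l1, c1, l2, c2):.3f} (listed {d})")
```

Run on the _geom_bond list above, it reproduces each tabulated distance to within about 0.001 Å.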