Discussion Paper #4
- To: Chemical information in core CIF <corecifchem@iucr.org>
- Subject: Discussion Paper #4
- From: David Brown <idbrown@mcmaster.ca>
- Date: Thu, 04 Mar 2004 15:23:46 -0500
Dear Colleagues,

There were only a few comments on Discussion Paper #3 posted on the coreCIFchem group. I take this to indicate general approval of the direction in which we are going in defining chemical information for inclusion in CIF. If you have reservations about this direction that you have not yet expressed, it is not too late to voice (or email) them.

The present Discussion Paper (#4) builds on the proposals made in Discussion Paper #3. First I comment on the questions raised by Greg Shields. Then, following an explanation of the changes between the previous proposal and the present one, I include two sample CIFs, the first to illustrate the treatment of disorder, the second the treatment of infinite graphs. Finally I compare our current approach with the way chemical concepts are treated in mmCIF so that we can decide to what extent we can make use of, or ensure compatibility with, the mmCIF dictionary.

Can you please send your comments to the Discussion Group (by replying to this email) as soon as possible. I intend to start work on the next round after

**** March 31, 2004 ****

but if you need more time, let me know and I will wait for your response.

If you want to print out this document it will require around 17 pages.

COMMENTS RECEIVED ON DISCUSSION PAPER #3

GREG SHIELDS wrote: I have a question about how you are proposing to handle disordered structures in this framework? In disordered structures, an atom in the topological description may be mapped to one or more sites in the crystal description, so I am not sure that the graphs are isomorphous and a unique mapping exists, but perhaps I am misunderstanding something.

IDB response: In the first round I wanted to make sure that we agreed on principles before tackling problems such as disorder, but I am glad that Greg raised this point because this is the right time to consider it. The answer is surprisingly simple: A disordered crystal contains two images of the molecule.
They may be geometrically identical if the molecules are disordered around, say, a two-fold axis, or they may represent different conformers if, say, a methyl group is disordered over two geometrically different sites. Although the two images will normally have the same graph, we can treat them as two different components each having an occupation number less than 1.0; one of the components maps onto one set of disordered atoms and the other maps onto the second. Many of the atoms in the two graphs will map onto the same atom in the crystal, but this is not a problem provided the sum of their occupation numbers does not exceed 1.0. If the sum of the occupation numbers of the two independent molecules in the formula unit exceeds 1.0, they have to map into different parts of the unit cell (the case of Z' > 1). This approach also works for occupational disorder where the components contain different elements that map onto the same site in the crystal. The two partially-occupied component graphs in the formula unit may or may not be isomorphous. The implications of this are worked out in the proposal below.

GREG: In disordered structures, do the symmetry theorems which you proposed necessarily hold in disordered structures - is it possible that a molecule may have a lower symmetry order than the site symmetry which it occupies in such cases?

IDB: There is an important distinction to be made between the crystallographic site symmetry of an individual molecule and the site symmetry implied by the space group of the crystal as a whole, that is, between the microscopic and the macroscopic symmetry, which are not necessarily the same. At low temperatures a structure is usually fully ordered, but as it is heated the symmetry may increase as a result of thermal motion. These additional symmetry elements can be thought of as entropic symmetry because they result from a loss of long range order, leading to a crystal in which the atoms are disordered.
For example, at low temperatures, the Ti atoms in BaTiO3 are displaced from the centers of their coordination octahedra in a cooperative way having long range order. As the crystals are heated their orthorhombic symmetry eventually changes to a cubic symmetry which requires that the Ti atoms lie (on average) at the centers of their octahedra. However, careful examination shows that the Ti atoms are still displaced but the long-range static order present at low temperature has been lost. The local symmetry is still orthorhombic, but heating results in the Ti atoms being uniformly disordered over a sphere surrounding the center of the octahedron. Entropic symmetry is a macroscopic, but not a microscopic, property of the crystal. The crystallographic site symmetry that is relevant to a discussion of the graph does not contain elements of the entropic symmetry. Any disorder related to entropic symmetry should be discounted in discussing the site symmetry of the molecule.

WHAT IS NEW IN VERSION #4

In this proposal I have developed the philosophy outlined in Discussion Paper #3 by introducing new items and changing some names to reflect a tighter organization of the material. The new proposal is illustrated with two sample CIFs, one showing how disorder is handled, the other how infinite graphs are treated.

In the previous discussion I raised two questions. The first was whether the graph of the formula unit should be in a separate loop from the graphs of the remaining molecular units. This separation of the topology into a formula_unit loop and a molecular_unit loop is adopted in the sample CIFs given below. The second question was whether the geometry should be combined with the description of the graph of the molecular_units and this too has been adopted.
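The occupancy rule stated in the response to Greg Shields above (atoms from different components may map onto the same atom in the crystal provided the sum of their occupation numbers does not exceed 1.0) can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the data are hypothetical, the function names are invented for illustration, and a crystal atom is identified simply by its site label plus symmetry operation id, which ignores the fact that an atom on a special position is reproduced by more than one operation.

```python
from collections import defaultdict

# Hypothetical data (not taken from any refined structure).
# Occupation number of each component of the formula unit.
component_occupancy = {"1": 0.5, "2": 0.5}

# (component id, atom label in component, crystal site label, symop id)
fu2xtl = [
    ("1", "C1", "C1", 1),
    ("2", "C1", "C1", 1),    # both components share this ordered atom
    ("1", "N2", "N2A", 1),   # disordered atom, first image
    ("2", "N2", "N2A", 2),   # disordered atom, second image
]


def site_occupancies(mapping, occupancy):
    """Sum the occupation numbers of all components mapped onto each crystal atom."""
    totals = defaultdict(float)
    for component_id, _atom_label, site_label, symop_id in mapping:
        totals[(site_label, symop_id)] += occupancy[component_id]
    return dict(totals)


def overfull_sites(mapping, occupancy, tolerance=1e-6):
    """Return the crystal atoms whose summed occupation number exceeds 1.0."""
    totals = site_occupancies(mapping, occupancy)
    return sorted(key for key, total in totals.items() if total > 1.0 + tolerance)


totals = site_occupancies(fu2xtl, component_occupancy)
print(totals)
print(overfull_sites(fu2xtl, component_occupancy))
```

A validation tool could run a check of this kind on the mapping between the formula unit and the crystal before accepting a description of a disordered structure.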
The chemical information is now organized into loops arranged into three groups:

The first group describes the FORMULA_UNIT, defined as a small group of atoms that contains all the chemical elements in the same proportions as they are found in the crystal. Normally it will be the smallest group that can be described with integer multipliers, but in cases where this is not possible, e.g., in non-stoichiometric crystals, the size of the formula unit is arbitrary. The formula_unit will necessarily be at least as large as the asymmetric unit of the crystal and normally no larger than the primitive unit cell. The graph of the formula_unit is chosen to be isomorphous with the finite graph (defined below) of the crystal. The formula_unit may contain two or more COMPONENTS whose graphs are disjoint (unconnected). The components will normally correspond to identifiable chemical units such as molecules, complex ions or radicals. They may have partial occupancy to allow for the description of disorder, and they may overlap in that two different atoms in the formula_unit map onto the same atom in the crystal provided that their total occupation number does not exceed 1.0. Three loops are defined in this group, the first identifying the different components, the second listing the atoms in each component and the third listing the bonds.

    FORMULA_UNIT
    FORMULA_UNIT_ATOM
    FORMULA_UNIT_BOND

The second group describes the MOLECULAR_UNITS. These are normally chemical units that represent portions of the components of the formula_unit. There can be as few or as many of these as desired and they may be nested, e.g., dimethylsulfone may be one molecular unit, a methyl group may be another, and the C and H atoms from which the methyl group is built may be two more. A notional geometry can be defined for each molecular_unit, either by giving atomic coordinates in an arbitrary Cartesian coordinate system, or by giving bond lengths and angles.
This notional geometry is not the geometry found in the crystal, but an idealized geometry which could be used for comparison with the observed geometry or as a target during refinement. The loops in this group are similar to those in the formula_unit except that the loop defining the atoms may include coordinates, the loop defining the bonds may include bond lengths and an additional _*_angle loop may be given.

    MOLECULAR_UNIT
    MOLECULAR_UNIT_ATOM
    MOLECULAR_UNIT_BOND
    MOLECULAR_UNIT_ANGLE

The third group of loops is the one that describes the mappings between the components of the formula_unit, the molecular_units and the crystal. There are four loops in this group. The first allows mappings between different components in the formula unit when, e.g., there are two or more crystallographically distinct but chemically identical molecules in the unit cell (Z' > 1, or disordered molecules). The second loop maps between different molecular units and is used when the molecular units are nested. The third loop maps between the components of the formula unit and the molecular units and allows identification of significant chemical sub-units (molecular_units) in the different components. Finally there is a mapping between the atoms in the formula_unit and the atoms in the crystal. A possible alternative approach to giving the information in the last two loops is described at the very end of this Discussion Paper and should be given serious thought.

    MAPPING_FU2FU
    MAPPING_MU2MU
    MAPPING_FU2MU
    MAPPING_FU2XTL

The first CIF given below describes two (disordered) components, TNT1 and TNT2, of the TNT molecule. In the fictitious crystal structure I have invented for the purposes of this illustration, there is a mirror plane perpendicular to the benzene ring that includes the methyl group and the N4 nitro group. The N2 and N6 nitro groups are disordered with the two components having occupation numbers of 0.5.
I have also assumed that each component has crystallographic symmetry 1, so that the two components are related by the crystallographic mirror plane. This is, of course, not the only possible chemical interpretation allowed by the crystal structure. Another possibility is that the individual molecules might have m site symmetry but different conformations; it all depends on how one chooses to correlate the two disordered groups.

The second CIF describes NaCl as an example of a crystal with an infinite bond graph.

The sample CIFs are intended only to show the organization of the information. Data names may change and dictionary definitions will eventually be needed, but it is simpler to discuss the CIF structure in terms of concrete examples.

############# Beginning of first CIF #############
#
#
data_disordered_TNT
#
#  FORMULA_UNIT ITEMS
#
#  The description of the components of the formula unit could
#  include other properties such as the molecular mass, formal
#  charge, chirality etc.
#
loop_
_formula_unit_id                   # Unique component identifier
_formula_unit_name                 # Optional
_formula_unit_formula
_formula_unit_point_group
_formula_unit_occupation_number
_formula_unit_details
1 TNT1 'C7 H5 N3 O6' 1 0.5
;
first component composed of a benzene ring, methyl and
three nitro groups.
;
2 TNT2 'C7 H5 N3 O6' 1 0.5
;
second component composed of a benzene ring, methyl and
three nitro groups related to the first by a
crystallographic mirror plane.
;
#
loop_
_formula_unit_atom_fu_id           # Child of _formula_unit_id
_formula_unit_atom_label           # Parent of _formula_unit_bond_atom_label
_formula_unit_atom_element         # Child of _atom_type_symbol
_formula_unit_atom_valence
_formula_unit_atom_coord_number
_formula_unit_atom_details
#
#  The first two items together constitute the unique list
#  reference.
#
#  Note that the atoms in the two partial formula units can have
#  the same labels because they are differentiated by their
#  _*_fu_ids.
#
1 C1  C 4 3 ?
1 C2  C 4 3 ?
1 C3  C 4 3 ?
1 C4  C 4 3 ?
1 C5  C 4 3 ?
1 C6  C 4 3 ?
1 C7  C 4 4 ?
1 H3  H 1 1 ?
1 H5  H 1 1 ?
1 H71 H 1 1 ?
1 H72 H 1 1 ?
1 H73 H 1 1 ?
1 N2  N 3 3 ?
1 O21 O 2 1 ?
1 O22 O 2 1 ?
1 N4  N 3 3 ?
1 O41 O 2 1 ?
1 O42 O 2 1 ?
1 N6  N 3 3 ?
1 O61 O 2 1 ?
1 O62 O 2 1 ?
2 C1  C 4 3 ?
2 C2  C 4 3 ?
2 C3  C 4 3 ?
2 C4  C 4 3 ?
2 C5  C 4 3 ?
2 C6  C 4 3 ?
2 C7  C 4 4 ?
2 H3  H 1 1 ?
2 H5  H 1 1 ?
2 H71 H 1 1 ?
2 H72 H 1 1 ?
2 H73 H 1 1 ?
2 N2  N 3 3 ?
2 O21 O 2 1 ?
2 O22 O 2 1 ?
2 N4  N 3 3 ?
2 O41 O 2 1 ?
2 O42 O 2 1 ?
2 N6  N 3 3 ?
2 O61 O 2 1 ?
2 O62 O 2 1 ?
#
loop_
_formula_unit_bond_id              # List reference
_formula_unit_bond_fu_id           # Child of _formula_unit_id
_formula_unit_bond_atom_label_1    # Child of _formula_unit_atom_label
_formula_unit_bond_atom_label_2    # Ditto
_formula_unit_bond_type
 1 1 C1 C2  delocalized
 2 1 C2 C3  delocalized
 3 1 C3 C4  delocalized
 4 1 C4 C5  delocalized
 5 1 C5 C6  delocalized
 6 1 C6 C1  delocalized
 7 1 C1 C7  single
 8 1 C3 H3  single
 9 1 C5 H5  single
10 1 C7 H71 single
11 1 C7 H72 single
12 1 C7 H73 single
13 1 C2 N2  single
14 1 N2 O21 double
15 1 N2 O22 double
16 1 C4 N4  single
17 1 N4 O41 double
18 1 N4 O42 double
19 1 C6 N6  single
20 1 N6 O61 double
21 1 N6 O62 double
22 2 C1 C2  delocalized
23 2 C2 C3  delocalized
24 2 C3 C4  delocalized
25 2 C4 C5  delocalized
26 2 C5 C6  delocalized
27 2 C6 C1  delocalized
28 2 C1 C7  single
29 2 C3 H3  single
30 2 C5 H5  single
31 2 C7 H71 single
32 2 C7 H72 single
33 2 C7 H73 single
34 2 C2 N2  single
35 2 N2 O21 double
36 2 N2 O22 double
37 2 C4 N4  single
38 2 N4 O41 double
39 2 N4 O42 double
40 2 C6 N6  single
41 2 N6 O61 double
42 2 N6 O62 double
#
#  MOLECULAR_UNIT ITEMS
#
#  The following loops may be omitted if there is no interest in
#  defining molecular_units.  We may wish to define other
#  properties of the molecular units besides those given below.
#
loop_
_molecular_unit_id                 # Parent to _mu_id items, list ref.
_molecular_unit_name
_molecular_unit_formula
_molecular_unit_point_group
_molecular_unit_details
1 'benzene ring' 'C6 H2' 2mm ?
2 'methyl group' 'C H3'  3m  ?
3 'nitro group'  'N O2'  2mm ?
4 carbon         C       ?   'single atom'
5 hydrogen       H       ?   'single atom'
#
#  The geometry specified for the molecular units is a notional
#  ideal geometry, not the one observed in the crystal.  The
#  geometry may be given using either atomic coordinates or bond
#  lengths and angles.  It may be used as a constraint (or
#  restraint) in the refinement of the crystal structure, in which
#  case there must be pointers in the atom_site loop to the
#  appropriate molecular_unit.  See also the note at the very end
#  of this Discussion Paper.
#
#  The basis set for the coordinates is given in Angstroms
#  but the orientation is arbitrary.  How do we convey information
#  about the transformation between this basis and the crystal
#  axes given that a molecular unit might map onto several groups
#  in the crystal?
#
loop_
_molecular_unit_atom_mu_id         # Child of _molecular_unit_id
_molecular_unit_atom_label         # Parent of _molecular_unit_bond_atom_label etc.
_molecular_unit_atom_element       # Child of _atom_type_symbol
_molecular_unit_atom_valence
_molecular_unit_atom_coord_number
_molecular_unit_atom_coord_x
_molecular_unit_atom_coord_y
_molecular_unit_atom_coord_z
_molecular_unit_atom_details
#
#  The first two items constitute the list reference.
#
#  The benzene ring is defined by atomic coordinates, the other
#  molecular_units are defined by their bonds and angles.
#
1 C1 C 4 3 0.037 0.146 -0.124 Benzene
1 C2 C 4 3 1.378 0.562  0.134 Benzene
1 C3 C 4 3 1.846 1.421  0.204 Benzene
1 C4 C 4 3 2.567 1.834  0.304 Benzene
1 C5 C 4 3 1.745 1.563  0.245 Benzene
1 C6 C 4 3 0.962 0.498  0.103 Benzene
2 C1 C 4 4 ? ? ? 'Methyl group'
2 H1 H 1 1 ? ? ? 'Methyl group'
2 H2 H 1 1 ? ? ? 'Methyl group'
2 H3 H 1 1 ? ? ? 'Methyl group'
3 N1 N 3 3 ? ? ? 'Nitro group'
3 O1 O 2 1 ? ? ? 'Nitro group'
3 O2 O 2 1 ? ? ? 'Nitro group'
#
#  Single atoms are assumed to be at the origin
#
4 C  C 4 4 0 0 0 'single atom'
5 H  H 1 1 0 0 0 'single atom'
#
loop_
_molecular_unit_bond_mu_id
_molecular_unit_bond_atom_label_1
_molecular_unit_bond_atom_label_2
_molecular_unit_bond_length
_molecular_unit_bond_order
_molecular_unit_bond_type
1 C1 C2 ?    1.5 delocalized
1 C2 C3 ?    1.5 delocalized
1 C3 C4 ?    1.5 delocalized
1 C4 C5 ?    1.5 delocalized
1 C5 C6 ?    1.5 delocalized
1 C6 C1 ?    1.5 delocalized
2 C1 H1 1.05 1.0 single
2 C1 H2 1.05 1.0 single
2 C1 H3 1.05 1.0 single
3 N1 O1 1.18 2.0 double
3 N1 O2 1.18 2.0 double
#
loop_
_molecular_unit_angle_mu_id
_molecular_unit_angle_atom_label_1
_molecular_unit_angle_atom_label_2
_molecular_unit_angle_atom_label_3
_molecular_unit_angle_angle
2 H1 C1 H2 109
2 H1 C1 H3 109
2 H2 C1 H3 109
3 O1 N1 O2 125
#
#  THE NEXT SET OF LOOPS DESCRIBES THE MAPPINGS
#
#  It is only necessary to give the minimum number of
#  mappings needed to indicate the isomorphisms present in the
#  various graphs.  Additional isomorphisms can be generated by
#  combining two or more mappings.
#
loop_
_mapping_fu2fu_id_1
_mapping_fu2fu_atom_label_1
_mapping_fu2fu_id_2
_mapping_fu2fu_atom_label_2
#
#  Mapping the two components onto each other.
#
1 C1  2 C1
1 C2  2 C2
1 C3  2 C3
1 C4  2 C4
1 C5  2 C5
1 C6  2 C6
1 H3  2 H3
1 H5  2 H5
1 C7  2 C7
1 H71 2 H71
1 H72 2 H72
1 H73 2 H73
1 N2  2 N2
1 O21 2 O21
1 O22 2 O22
1 N4  2 N4
1 O41 2 O41
1 O42 2 O42
1 N6  2 N6
1 O61 2 O61
1 O62 2 O62
#
loop_
_mapping_mu2mu_1
_mapping_mu2mu_atom_label_1
_mapping_mu2mu_2
_mapping_mu2mu_atom_label_2
#
#  The one C and three H atoms are mapped onto the methyl group
#
2 C1 4 C
2 H1 5 H
2 H2 5 H
2 H3 5 H
#
loop_
_mapping_fu2mu_fu_id
_mapping_fu2mu_fu_atom_label
_mapping_fu2mu_mu_id
_mapping_fu2mu_mu_atom_label
#
#  The molecular units are mapped onto one component of the
#  formula unit.  Mapping to the other component follows from the
#  isomorphism of the two components and is not explicitly shown.
#
1 C1  1 C1      # Benzene ring of component 1
1 C2  1 C2
1 C3  1 C3
1 C4  1 C4
1 C5  1 C5
1 C6  1 C6
1 H3  5 H
1 H5  5 H
1 C7  2 C1      # Methyl group of component 1
1 H71 2 H1
1 H72 2 H2
1 H73 2 H3
1 N2  3 N1      # Three nitro groups of component 1
1 O21 3 O1
1 O22 3 O2
1 N4  3 N1
1 O41 3 O1
1 O42 3 O2
1 N6  3 N1
1 O61 3 O1
1 O62 3 O2
#
#  The crystallographic mirror operation perpendicular to the
#  benzene ring is denoted by the _*_symop_id value of 2.
#  In this description it is assumed that each
#  component (partial molecule) contains no mirror plane so the
#  crystallographic mirror plane perpendicular to the benzene ring
#  relates the two components.
#
#  Note that the molecular_units are not explicitly mapped on to
#  the crystal but their mapping can be deduced by combining other
#  mappings that are given.
#
loop_
_mapping_fu2xtl_fu_component_id
_mapping_fu2xtl_fu_atom_label
_mapping_fu2xtl_crystal_atom_site_label
_mapping_fu2xtl_crystal_symop_id
#
#  The two components are mapped onto the atom_sites of the
#  crystal.  Note that the disordered atoms in the crystal carry
#  suffixes A and B.  C1, C7, H71 and N4 lie on the
#  crystallographic mirror plane.
#
1 C1  C1   1
1 C2  C2   1
1 C3  C3   1
1 C4  C4   1
1 C5  C3   2
1 C6  C2   2
1 H3  H3   1
1 H5  H3   2
1 C7  C7   1
1 H71 H71  1
1 H72 H72  1
1 H73 H72  2
1 N2  N2A  1
1 O21 O21A 1
1 O22 O22A 1
1 N4  N4   1
1 O41 O41  1
1 O42 O41  2
1 N6  N2B  2
1 O61 O21B 2
1 O62 O22B 2
2 C1  C1   1
2 C2  C2   2
2 C3  C3   2
2 C4  C4   1
2 C5  C3   1
2 C6  C2   1
2 H3  H3   2
2 H5  H3   1
2 C7  C7   1
2 H71 H71  1
2 H72 H72  2
2 H73 H72  1
2 N2  N2A  2
2 O21 O21A 2
2 O22 O22A 2
2 N4  N4   1
2 O41 O41  2
2 O42 O41  1
2 N6  N2B  1
2 O61 O21B 1
2 O62 O22B 1
#
#
############ End of first CIF ################

######### Beginning of second CIF #############
#
#
#  EXAMPLE OF A STRUCTURE WITH AN INFINITE BOND GRAPH
#
#  NaCl is chosen to illustrate how infinite graphs are treated.
#  The infinite graph of NaCl can be represented by the cubic
#  lattice of the NaCl crystal with every Na atom forming a bond
#  to its six Cl nearest neighbours and vice versa.  To reduce
#  this graph to a finite graph, first extract one formula unit
#  (the two atoms Na and Cl).  To extract these atoms, ten bonds
#  must be broken, five bonds from Na and five bonds from Cl (one
#  bond between Na and Cl remains intact).  The
#  broken bonds occur in pairs (one from Na and one from
#  Cl) related by the translational symmetry of the graph.  To
#  complete the finite graph the two broken bonds of
#  each pair are joined together, resulting in a finite graph
#  which, for NaCl, consists of two atoms, Na and Cl, linked by
#  six different bonds.  This is the graph described in the
#  following sections.  The long-range order of the infinite graph
#  is lost, but the short-range order, i.e., the nearest neighbour
#  environment that represents the chemical interactions, is
#  preserved.
#
data_nacl
#
loop_
_formula_unit_id
_formula_unit_name
_formula_unit_formula
_formula_unit_details
#
#  There is only one component in this graph.
#
nacl 'sodium chloride' 'Na Cl' 'Only one component'

loop_
_formula_unit_atom_id              # List reference
_formula_unit_atom_fu_id
_formula_unit_atom_label
_formula_unit_atom_element
_formula_unit_atom_valence
_formula_unit_atom_coord_number
_formula_unit_atom_details
1 nacl Na1 Na +1 6 ?
2 nacl Cl1 Cl -1 6 ?

loop_
_formula_unit_bond_fu_id           # Child of _formula_unit_id
_formula_unit_bond_id              # List reference
_formula_unit_bond_fu_atom_label_1
_formula_unit_bond_fu_atom_label_2
_formula_unit_bond_fu_bond_valence # Equivalent of bond strength
nacl 1 Na1 Cl1 0.167
nacl 2 Na1 Cl1 0.167
nacl 3 Na1 Cl1 0.167
nacl 4 Na1 Cl1 0.167
nacl 5 Na1 Cl1 0.167
nacl 6 Na1 Cl1 0.167
#
#  In this example molecular_units are not used.
#
loop_
_mapping_fu2xtl_fu_atom_id
_mapping_fu2xtl_fu_atom_label
_mapping_fu2xtl_crystal_atom_site_label
_mapping_fu2xtl_crystal_symop_id
#
nacl Na1 Na 1    # Both of these atoms are chosen from the
nacl Cl1 Cl 1    # same asymmetric unit of the crystal
#
#  Because there is, in general, more than one bond between the
#  same two atoms in the finite graph, it is necessary to indicate
#  the mapping of the bonds as well.  This requires that the six
#  bonds linking the two atoms in the graph of the formula_unit
#  be mapped onto different bonds in the crystal, requiring
#  that the symmetry operation be given for at least one of the two
#  atoms defining the bond.  This example shows the mappings of
#  the bonds of the formula_unit onto the bonds formed by the Na
#  atom in the asymmetric unit.
#
loop_
_mapping_fu2xtl_bond_fu_id                       # Child of _formula_unit_id
_mapping_fu2xtl_bond_fu_bond_id                  # Child of _formula_unit_bond_id
_mapping_fu2xtl_bond_fu_atom_label_1             # Redundant information
_mapping_fu2xtl_bond_fu_atom_label_2             # Redundant information
_mapping_fu2xtl_bond_crystal_atom_site_label_1
_mapping_fu2xtl_bond_crystal_symop_1
_mapping_fu2xtl_bond_crystal_trans_x_1
_mapping_fu2xtl_bond_crystal_trans_y_1
_mapping_fu2xtl_bond_crystal_trans_z_1
_mapping_fu2xtl_bond_crystal_atom_site_label_2
_mapping_fu2xtl_bond_crystal_symop_2
_mapping_fu2xtl_bond_crystal_trans_x_2
_mapping_fu2xtl_bond_crystal_trans_y_2
_mapping_fu2xtl_bond_crystal_trans_z_2
#
#  The second item is the list reference which makes the listing
#  of the labels 'Na1' and 'Cl1' redundant but visually helpful.
#
nacl 1 Na1 Cl1 Na 1 0 0 0 Cl  1 0 0  0
nacl 2 Na1 Cl1 Na 1 0 0 0 Cl  3 0 0  0
nacl 3 Na1 Cl1 Na 1 0 0 0 Cl  5 0 0  1
nacl 4 Na1 Cl1 Na 1 0 0 0 Cl  7 0 0 -1
nacl 5 Na1 Cl1 Na 1 0 0 0 Cl  9 0 0  0
nacl 6 Na1 Cl1 Na 1 0 0 0 Cl 11 0 1  0
#
############# End of second CIF ####################

COMPARISON OF THE ABOVE PROPOSAL WITH mmCIF:

mmCIF has a chemical description which is similar to the one we are proposing for coreCIF, though there are important differences. The rough equivalence is as follows:

    Our proposal        mmCIF
    --------------      --------------
    formula_unit        struct_asym
    component           entity
    molecular_unit      chem_comp

The formula_unit and struct_asym both describe a number of components (or entities) that together represent the total contents of the crystal. They differ in the following respects:

1. The formula_unit and the asymmetric unit are, in general, not the same. The formula unit normally contains several asymmetric units if, e.g., it contains crystallographic symmetry and thus spans two or more asymmetric units. The chemical formula of the asymmetric unit may have non-integral multiplicities for the elements if an atom lies on a special position.

2. struct_asym lists the entities (equivalent to the components) which form the crystal, but mmCIF does not include their graphs. Instead the atoms in the crystal that map onto the entity are flagged with the entity_id in the atom_site category.

3. While the sum of all the components in coreCIF corresponds to the complete formula_unit, in mmCIF identical molecules in the asymmetric unit are represented by a single entity which is repeated in struct_asym as many times as it appears in the asymmetric unit. In the proposal for the formula_unit above every component, whether identical or not, is separately listed. For example, if the TNT example above were described in mmCIF, the two isomorphous components would be represented by a single entity which would be listed twice in struct_asym.
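The difference described in point 3 can be illustrated with a short Python sketch that collapses separately listed components into mmCIF-style entities. It is a sketch only: the function names and the abbreviated component data are hypothetical, the "water" component is invented purely to give a second entity, and components are compared by a simple signature (counts of element and coordination number), which is a necessary but not a sufficient test for the isomorphism of two graphs.

```python
from collections import Counter, defaultdict

# Hypothetical, abbreviated components: one (element, coordination number)
# pair per atom.  The TNT counts follow the atom list of the first sample CIF.
tnt_atoms = (
    [("C", 3)] * 6 + [("C", 4)] + [("H", 1)] * 5 + [("N", 3)] * 3 + [("O", 1)] * 6
)
components = {
    "TNT1": list(tnt_atoms),
    "TNT2": list(tnt_atoms),
    "water": [("O", 2), ("H", 1), ("H", 1)],  # invented second species
}


def signature(atoms):
    """Order-independent summary of a component (not a full isomorphism test)."""
    return tuple(sorted(Counter(atoms).items()))


def group_into_entities(comps):
    """Group component names that share a signature, keeping input order."""
    groups = defaultdict(list)
    for name, atoms in comps.items():
        groups[signature(atoms)].append(name)
    return list(groups.values())


def struct_asym_like(entities):
    """List every component with the number of the entity that represents it."""
    return [
        (name, entity_number)
        for entity_number, members in enumerate(entities, start=1)
        for name in members
    ]


entities = group_into_entities(components)
print(entities)
print(struct_asym_like(entities))
```

In the proposal above the three components would each be listed in the formula_unit loop; in the mmCIF style sketched here TNT1 and TNT2 share one entity that appears twice in the struct_asym-like listing.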
PROBLEMS: The link between the entity and the crystal is provided by labelling each atom in the atom_site loop with a pointer to the entity to which it belongs. This means that each atom can belong only to one entity, making it impossible to treat disorder in the way described above. Further, since the atoms and bonds in the entities are not separately enumerated, it may be more difficult to assign chemical properties to them.

In mmCIF ENTITIES are classified as one of three types: polymer, non-polymer and water. Polymers are considered to be composed of amino acids, nucleic acids or non-standard residues and are defined in terms of the sequence of the monomers described in the chem_comp loops.

CHEM_COMP (= chemical component) is close to our definition of a molecular_unit. It is designed to give the contents and geometries of the individual monomeric units that compose the macromolecules. It describes the ideal geometry of the monomer either in terms of Cartesian coordinates or in terms of bond lengths and angles. The individual atoms of the monomer are (necessarily) labelled, normally with some standard chemical (as opposed to a crystallographic) labelling scheme. The atom_site loop contains a pointer to the particular atom in the _chem_comp that the atom in the crystal corresponds to.

CONCLUSION: The mmCIF description of the chemistry is tailored to the needs of macromolecular crystallography, particularly to the descriptions of molecules that are composed of a sequence of standard monomeric units. It is not obvious that the categories defined for this purpose could easily be adapted to small-cell structures, particularly structures that are infinitely connected. We have three choices:

1. We can try to use as many of the definitions in mmCIF as possible, defining additional terms as needed and possibly extending the definitions given in mmCIF,

2. We can try to follow the philosophy of mmCIF in terms of the way in which the chemistry is described, i.e., not giving the graphs of the formula_unit and molecular_units and defining isomorphous components in the formula_unit only once, or

3. We can devise our own scheme with minimal reference to mmCIF (but, e.g., we might wish to ensure that the properties of molecular_units can be mapped onto the properties given in chem_comp). In practical terms this might mean making greater use of parent-child relations instead of the mapping_fu2mu and mapping_fu2xtl loops described above. For example, the mapping_fu2xtl loop would not be needed if the formula_unit_atom loop included pointers to _atom_site_label and _space_group_symop. We might also consider making the component of the formula_unit identical to the entity (while ignoring many of the macroscopic entity properties that are irrelevant and including instead the bond graph).

I would reject option #1 as inappropriate. #2 would restrict our ability to describe a chemical molecule independently of the crystal structure. #3 is a possibility we should seriously consider.

PLEASE SEND YOUR COMMENTS TO CORECIFCHEM@IUCR.ORG

_______________________________________________
coreCIFchem mailing list
coreCIFchem@iucr.org
http://scripts.iucr.org/mailman/listinfo/corecifchem