Discussion Paper #4

  • To: Chemical information in core CIF <corecifchem@iucr.org>
  • Subject: Discussion Paper #4
  • From: David Brown <idbrown@mcmaster.ca>
  • Date: Thu, 04 Mar 2004 15:23:46 -0500
Dear Colleagues,

    There were only a few comments on Discussion Paper #3 posted
on the coreCIFchem group.  I take this to indicate general
approval of the direction in which we are going in defining
chemical information for inclusion in CIF.  If you have
reservations about this direction that you have not yet
expressed, it is not too late to voice (or email) them. 

    The present Discussion Paper (#4) builds on the proposals
made in Discussion Paper #3.  First I comment on the questions
raised by Greg Shields.  Then, following an explanation of the
changes between the previous proposal and the present one, I
include two sample CIFs, the first to illustrate the treatment of
disorder, the second the treatment of infinite graphs.  Finally I
compare our current approach with the way chemical concepts are
treated in mmCIF so that we can decide to what extent we can make
use of, or ensure compatibility with, the mmCIF dictionary.

    Can you please send your comments to the Discussion Group
(by replying to this email) as soon as possible.  I intend to
start work on the next round after **** March 31, 2004 **** but if you need
more time, let me know and I will wait for your response.  If you
want to print out this document it will require around 17 pages.

        COMMENTS RECEIVED ON DISCUSSION PAPER #3

GREG SHIELDS wrote:
I have a question about how you are proposing to handle
disordered structures in this framework? In disordered
structures, an atom in the topological description may be mapped
to one or more sites in the crystal description, so I am not sure
that the graphs are isomorphous and a unique mapping exists, but
perhaps I am misunderstanding something.

IDB response:
In the first round I wanted to make sure that we agreed on
principles before tackling problems such as disorder, but I am
glad that Greg raised this point because this is the right time
to consider it.  The answer is surprisingly simple: A disordered
crystal contains two images of the molecule.  They may be
geometrically identical if the molecules are disordered around
say a two-fold axis, or they may represent different conformers
if say a methyl group is disordered over two geometrically
different sites.  Although the two images will normally have the
same graph, we can treat them as two different components each
having an occupation number less than 1.0; one of the components
maps onto one set of disordered atoms and the other maps onto the
second.  Many of the atoms in the two graphs will map onto the
same atom in the crystal, but this is not a problem provided the
sum of their occupation numbers does not exceed 1.0.  If the
sum of the occupation numbers of two independent molecules in the
formula unit exceeds 1.0 they have to map into different parts of
the unit cell (the case of Z' > 1).  This approach also works for
occupational disorder where the components contain different
elements that map onto the same site in the crystal. The two
partially-occupied component graphs in the formula unit may or
may not be isomorphous.  The implications of this are worked out
in the proposal below.
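
The occupancy rule above can be checked mechanically.  The
following is only a minimal sketch, not part of the proposal: the
function names, the component ids 'A' and 'B' and the site labels
are all hypothetical, and a crystal position is assumed to be
identified by an atom site label together with a symmetry
operation id.

```python
from collections import defaultdict

def site_occupancy_totals(component_occupancy, fu2xtl):
    """Sum the component occupation numbers landing on each crystal position.

    component_occupancy: {component_id: occupation number}
    fu2xtl: iterable of (component_id, fu_atom_label, site_label, symop_id)
    """
    totals = defaultdict(float)
    for comp_id, _fu_label, site_label, symop_id in fu2xtl:
        totals[(site_label, symop_id)] += component_occupancy[comp_id]
    return dict(totals)

def overfull_sites(component_occupancy, fu2xtl, tol=1e-6):
    """Return the crystal positions whose total occupation exceeds 1.0."""
    totals = site_occupancy_totals(component_occupancy, fu2xtl)
    return sorted(key for key, total in totals.items() if total > 1.0 + tol)

# Hypothetical two-component disorder: both components share the
# ordered atom C1, while their O atoms occupy alternative sites.
occupancy = {"A": 0.5, "B": 0.5}
mapping = [
    ("A", "C1", "C1", 1), ("B", "C1", "C1", 1),
    ("A", "O1", "O1A", 1), ("B", "O1", "O1B", 1),
]
totals = site_occupancy_totals(occupancy, mapping)
print(totals)
print(overfull_sites(occupancy, mapping))
```

Here the shared atom totals 1.0 and each alternative site 0.5, so
nothing is reported as overfull.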

GREG:
In disordered structures, do the symmetry theorems which you
proposed necessarily hold in disordered structures - is it
possible that a molecule may have a lower symmetry order than the
site symmetry which it occupies in such cases ?

IDB:
There is an important distinction to be made between the
crystallographic site symmetry of an individual molecule and the
site symmetry implied by the space group of the crystal as a
whole, that is between the microscopic and the macroscopic
symmetry which are not necessarily the same.  At low temperatures
a structure is usually fully ordered, but as it is heated the
symmetry may increase as a result of thermal motion.  These
additional symmetry elements can be thought of as entropic
symmetry because they result from a loss of long range order,
leading to a crystal in which the atoms are disordered.  For
example, at low temperatures, the Ti atoms in BaTiO3 are
displaced from the centers of their coordination octahedra in a
cooperative way having long range order.  As the crystals are
heated their orthorhombic symmetry eventually changes to a cubic
symmetry which requires that the Ti atoms lie (on average) at the
centers of their octahedra.  However, careful examination shows
that the Ti atoms are still displaced but the long-range static
order present at low temperature has been lost.  The local
symmetry is still orthorhombic, but heating results in the Ti
atoms being uniformly disordered over a sphere surrounding the
center of the octahedron.  Entropic symmetry is a macroscopic,
but not a microscopic, property of the crystal.  The
crystallographic site symmetry that is relevant to a discussion
of the graph does not contain elements of the entropic symmetry.
Any disorder related to entropic symmetry should be discounted in
discussing the site symmetry of the molecule.

                 WHAT IS NEW IN VERSION #4 

In this proposal I have developed the philosophy outlined in
Discussion Paper #3 by introducing new items and changing some
names to reflect a tighter organization of the material.  The new
proposal is illustrated with two sample CIFs, one showing how
disorder is handled, the other how infinite graphs are treated.

In the previous discussion I raised two questions.  The first was
whether the graph of the formula unit should be in a separate
loop from the graphs of the remaining molecular units.  This
separation of the topology into a formula_unit loop and a
molecular_unit loop is adopted in the sample CIFs given below.
The second question was whether the geometry should be combined
with the description of the graph of the molecular_units and this
too has been adopted.

The chemical information is now organized into loops arranged
into three groups:

The first group describes the FORMULA_UNIT, defined as a small
group of atoms that contains all the chemical elements in the
same proportions as they are found in the crystal.  Normally it
will be the smallest group that can be described with integer
multipliers, but in cases where this is not possible, e.g., in
non-stoichiometric crystals, the size of the formula unit is
arbitrary.  The formula_unit will necessarily be at least as
large as the asymmetric unit of the crystal and normally no
larger than the primitive unit cell.  The graph of the
formula_unit is chosen to be isomorphous with the finite graph
(defined below) of the crystal.  The formula_unit may contain two
or more COMPONENTS whose graphs are disjoint (unconnected).  The
components will normally correspond to identifiable chemical
units such as molecules, complex ions or radicals.  They may have
partial occupancy to allow for the description of disorder, and
they may overlap in that two different atoms in the formula_unit
map onto the same atom in the crystal provided that their total
occupation number does not exceed 1.0.

Three loops are defined in this group, the first identifying the
different components, the second listing the atoms in each
component and the third listing the bonds.

    FORMULA_UNIT
    FORMULA_UNIT_ATOM
    FORMULA_UNIT_BOND

The second group describes the MOLECULAR_UNITS.  These are
normally chemical units that represent portions of the components
of the formula_unit.  There can be as few or as many of these as
desired and they may be nested, e.g., dimethylsulfone may be one
molecular unit, a methyl group may be another, and the C and H
atoms from which the methyl group is built may be two more.  A
notional geometry can be defined for each molecular_unit, either
by giving atomic coordinates in an arbitrary Cartesian coordinate
system, or by giving bond lengths and angles.  This notional
geometry is not the geometry found in the crystal, but an
idealized geometry which could be used for comparison with the
observed geometry or as a target during refinement.  The loops in
this group are similar to those in the formula_unit except that
the loop defining the atoms may include coordinates, the loop
defining the bonds may include bond lengths and an additional
_*_angle loop may be given.

    MOLECULAR_UNIT
    MOLECULAR_UNIT_ATOM
    MOLECULAR_UNIT_BOND
    MOLECULAR_UNIT_ANGLE
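
The two ways of giving a notional geometry carry the same
information for a small unit.  As an illustration only, the
sketch below derives a bond length and an angle from Cartesian
coordinates.  The coordinates are an assumed ideal tetrahedral
methyl group with a C-H distance of 1.05 Angstroms; they are not
taken from any structure, and the function names are hypothetical.

```python
import math

def bond_length(p, q):
    """Distance between two points given in Cartesian Angstroms."""
    return math.dist(p, q)

def bond_angle(p, centre, q):
    """Angle p-centre-q in degrees."""
    u = [a - b for a, b in zip(p, centre)]
    v = [a - b for a, b in zip(q, centre)]
    cos_angle = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

# Assumed ideal methyl geometry: C at the origin, H atoms along
# three of the four tetrahedral directions.
d = 1.05 / math.sqrt(3)
atoms = {
    "C1": (0.0, 0.0, 0.0),
    "H1": (d, d, d),
    "H2": (d, -d, -d),
    "H3": (-d, d, -d),
}
length = bond_length(atoms["C1"], atoms["H1"])
angle = bond_angle(atoms["H1"], atoms["C1"], atoms["H2"])
print(round(length, 3), round(angle, 2))
```

A program comparing the notional geometry with the observed one
could apply the same two functions to the refined coordinates
after converting them to a Cartesian basis.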

The third group of loops is the one that describes the mappings
between the components of the formula_unit, the molecular_units
and the crystal.  There are four loops in this group.  The first
allows mappings between different components in the formula unit
when, e.g., there are two or more crystallographically distinct
but chemically identical molecules in the unit cell (Z' > 1, or
disordered molecules).  The second loop maps between different
molecular units and is used when the molecular units are nested.
The third loop maps between the components of the formula unit
and the molecular units and allows identification of significant
chemical sub-units (molecular_units) in the different components.
Finally there is a mapping between the atoms in the formula_unit
and the atoms in the crystal.  A possible alternative approach to
giving the information in the last two loops is described at the
very end of this Discussion Paper and should be given serious
thought.

    MAPPING_FU2FU
    MAPPING_MU2MU
    MAPPING_FU2MU
    MAPPING_FU2XTL

The first CIF given below describes two (disordered) components,
TNT1 and TNT2, of the TNT molecule.  In the fictitious crystal
structure I have invented for the purposes of this illustration,
there is a mirror plane perpendicular to the benzene ring that
includes the methyl group and the N4 nitro group.  The N2 and
N6 nitro groups are disordered with the two components having
occupation numbers of 0.5.  I have also assumed that each
component has crystallographic symmetry 1, so that the two
components are related by the crystallographic mirror plane.
This is, of course, not the only possible chemical interpretation
allowed by the crystal structure - another possibility is that
the individual molecules might have m site symmetry but different
conformations; it all depends on how one chooses to correlate the
two disordered groups.

The second CIF describes NaCl as an example of a crystal with an
infinite bond graph.

The sample CIFs are intended only to show the organization of the
information.  Data names may change and dictionary definitions
will eventually be needed but it is simpler to discuss the CIF
structure in terms of concrete examples.

############# Beginning of first CIF #############
#
#
data_disordered_TNT
#
#  FORMULA_UNIT ITEMS
#
# The description of the components of the formula unit could
# include other properties such as the molecular mass, formal
# charge, chirality etc.
#
loop_
_formula_unit_id                # Unique component identifier
_formula_unit_name              # Optional
_formula_unit_formula
_formula_unit_point_group
_formula_unit_occupation_number
_formula_unit_details
1      TNT1    'C7 H5 N3 O6'   1   0.5
; first component composed of a benzene ring, methyl and three
nitro groups.
;
2      TNT2    'C7 H5 N3 O6'   1   0.5
; second component composed of a benzene ring, methyl and three
nitro groups related to the first by a crystallographic mirror
plane.
;
#
loop_
_formula_unit_atom_fu_id   # Child of _formula_unit_id
_formula_unit_atom_label # Parent of _formula_unit_bond_atom_label
_formula_unit_atom_element # Child of _atom_type_symbol
_formula_unit_atom_valence
_formula_unit_atom_coord_number
_formula_unit_atom_details
#
# The first two items together constitute the unique list
# reference.
#
# Note that the atoms in the two partial formula units can have
# the same labels because they are differentiated by their
# _*_fu_ids.
#
1   C1    C    4   3   ?
1   C2    C    4   3   ?
1   C3    C    4   3   ? 
1   C4    C    4   3   ?
1   C5    C    4   3   ?
1   C6    C    4   3   ?
1   C7    C    4   4   ?
1   H3    H    1   1   ?
1   H5    H    1   1   ?
1   H71   H    1   1   ?
1   H72   H    1   1   ?
1   H73   H    1   1   ?
1   N2    N    3   3   ?
1   O21   O    2   1   ?
1   O22   O    2   1   ?
1   N4    N    3   3   ?
1   O41   O    2   1   ?
1   O42   O    2   1   ?
1   N6    N    3   3   ?
1   O61   O    2   1   ?
1   O62   O    2   1   ?
2   C1    C    4   3   ?
2   C2    C    4   3   ?
2   C3    C    4   3   ? 
2   C4    C    4   3   ?
2   C5    C    4   3   ?
2   C6    C    4   3   ?
2   C7    C    4   4   ?
2   H3    H    1   1   ?
2   H5    H    1   1   ?
2   H71   H    1   1   ?
2   H72   H    1   1   ?
2   H73   H    1   1   ?
2   N2    N    3   3   ?
2   O21   O    2   1   ?
2   O22   O    2   1   ?
2   N4    N    3   3   ?
2   O41   O    2   1   ?
2   O42   O    2   1   ?
2   N6    N    3   3   ?
2   O61   O    2   1   ?
2   O62   O    2   1   ?
#
loop_
_formula_unit_bond_id            # List reference
_formula_unit_bond_fu_id         # Child of _formula_unit_id
_formula_unit_bond_atom_label_1  # Child of
                                 #   _formula_unit_atom_label
_formula_unit_bond_atom_label_2  # Ditto
_formula_unit_bond_type

1   1   C1   C2   delocalized
2   1   C2   C3   delocalized
3   1   C3   C4   delocalized
4   1   C4   C5   delocalized
5   1   C5   C6   delocalized
6   1   C6   C1   delocalized
7   1   C1   C7   single
8   1   C3   H3   single
9   1   C5   H5   single
10  1   C7   H71  single
11  1   C7   H72  single
12  1   C7   H73  single
13  1   C2   N2   single
14  1   N2   O21  double
15  1   N2   O22  double
16  1   C4   N4   single
17  1   N4   O41  double
18  1   N4   O42  double
19  1   C6   N6   single
20  1   N6   O61  double
21  1   N6   O62  double
22  2   C1   C2   delocalized
23  2   C2   C3   delocalized
24  2   C3   C4   delocalized
25  2   C4   C5   delocalized
26  2   C5   C6   delocalized
27  2   C6   C1   delocalized
28  2   C1   C7   single
29  2   C3   H3   single
30  2   C5   H5   single
31  2   C7   H71  single
32  2   C7   H72  single
33  2   C7   H73  single
34  2   C2   N2   single
35  2   N2   O21  double
36  2   N2   O22  double
37  2   C4   N4   single
38  2   N4   O41  double
39  2   N4   O42  double
40  2   C6   N6   single
41  2   N6   O61  double
42  2   N6   O62  double
#
# MOLECULAR_UNIT ITEMS
#
# The following loops may be omitted if there is no interest in
# defining molecular_units.  We may wish to define other
# properties of the molecular units besides those given below.
#
loop_
_molecular_unit_id          # Parent to _mu_id items, list ref.
_molecular_unit_name
_molecular_unit_formula
_molecular_unit_point_group
_molecular_unit_details
1  'benzene ring' 'C6 H2' 2mm ?
2  'methyl group' 'C H3'  3m  ?
3  'nitro group'  'N O2'  2mm ?
4  carbon          C       ?  'single atom'
5  hydrogen        H       ?  'single atom'
#
# The geometry specified for the molecular units is a notional
# ideal geometry, not the one observed in the crystal.  The
# geometry may be given using either atomic coordinates or bond
# lengths and angles. It may be used as a constraint (or
# restraint) in the refinement of the crystal structure, in which
# case there must be pointers in the atom_site loop to the
# appropriate molecular_unit.  See also the note at the very end
# of this Discussion Paper.
#
# The basis set for the coordinates is given in Angstroms
# but the orientation is arbitrary.  How do we convey information
# about the transformation between this basis and the crystal
# axes given that a molecular unit might map onto several groups
# in the crystal?
#
loop_
_molecular_unit_atom_mu_id    # Child of _molecular_unit_id
_molecular_unit_atom_label
               # Parent of _molecular_unit_bond_atom_label etc.
_molecular_unit_atom_element  # Child of _atom_type_symbol
_molecular_unit_atom_valence
_molecular_unit_atom_coord_number
_molecular_unit_atom_coord_x  
_molecular_unit_atom_coord_y
_molecular_unit_atom_coord_z
_molecular_unit_atom_details
#
# The first two items constitute the list reference.
#
# The benzene ring is defined by atomic coordinates, the other
# molecular_units are defined by their bonds and angles.
#
1   C1    C    4   3   0.037  0.146  -0.124 Benzene
1   C2    C    4   3   1.378  0.562   0.134 Benzene
1   C3    C    4   3   1.846  1.421   0.204 Benzene
1   C4    C    4   3   2.567  1.834   0.304 Benzene
1   C5    C    4   3   1.745  1.563   0.245 Benzene
1   C6    C    4   3   0.962  0.498   0.103 Benzene

2   C1    C    4   4   ?  ?  ? 'Methyl group'
2   H1    H    1   1   ?  ?  ? 'Methyl group'
2   H2    H    1   1   ?  ?  ? 'Methyl group'
2   H3    H    1   1   ?  ?  ? 'Methyl group'

3   N1    N    3   3   ?  ?  ? 'Nitro group'
3   O1    O    2   1   ?  ?  ? 'Nitro group'
3   O2    O    2   1   ?  ?  ? 'Nitro group'
#
# Single atoms are assumed to be at the origin
#
4   C     C    4   4   0  0  0 'single atom'
5   H     H    1   1   0  0  0 'single atom'
#
loop_
_molecular_unit_bond_mu_id
_molecular_unit_bond_atom_label_1
_molecular_unit_bond_atom_label_2
_molecular_unit_bond_length
_molecular_unit_bond_order
_molecular_unit_bond_type
1   C1   C2   ?     1.5  delocalized
1   C2   C3   ?     1.5  delocalized
1   C3   C4   ?     1.5  delocalized
1   C4   C5   ?     1.5  delocalized
1   C5   C6   ?     1.5  delocalized
1   C6   C1   ?     1.5  delocalized
2   C1   H1   1.05  1.0  single
2   C1   H2   1.05  1.0  single
2   C1   H3   1.05  1.0  single
3   N1   O1   1.18  2.0  double
3   N1   O2   1.18  2.0  double
#
loop_
_molecular_unit_angle_mu_id
_molecular_unit_angle_atom_label_1
_molecular_unit_angle_atom_label_2
_molecular_unit_angle_atom_label_3
_molecular_unit_angle_angle
2  H1  C1  H2  109
2  H1  C1  H3  109
2  H2  C1  H3  109
3  O1  N1  O2  125
#
# THE NEXT SET OF LOOPS DESCRIBES THE MAPPINGS
#
# It is only necessary to give the minimum number of
# mappings needed to indicate the isomorphisms present in the
# various graphs.  Additional isomorphisms can be generated by
# combining two or more mappings.
#
loop_
_mapping_fu2fu_id_1
_mapping_fu2fu_atom_label_1
_mapping_fu2fu_id_2
_mapping_fu2fu_atom_label_2
#
# Mapping the two components onto each other.
#
1   C1   2   C1
1   C2   2   C2
1   C3   2   C3
1   C4   2   C4
1   C5   2   C5
1   C6   2   C6
1   H3   2   H3
1   H5   2   H5
1   C7   2   C7
1   H71  2   H71
1   H72  2   H72
1   H73  2   H73
1   N2   2   N2
1   O21  2   O21
1   O22  2   O22
1   N4   2   N4
1   O41  2   O41
1   O42  2   O42
1   N6   2   N6
1   O61  2   O61
1   O62  2   O62
#
loop_
_mapping_mu2mu_1
_mapping_mu2mu_atom_label_1
_mapping_mu2mu_2
_mapping_mu2mu_atom_label_2
#
# The one C and three H atoms are mapped onto the methyl group
#
2   C1   4   C
2   H1   5   H
2   H2   5   H
2   H3   5   H
#
loop_
_mapping_fu2mu_fu_id
_mapping_fu2mu_fu_atom_label
_mapping_fu2mu_mu_id
_mapping_fu2mu_mu_atom_label
#
# The molecular units are mapped onto one component of the
# formula unit.  Mapping to the other component follows from the
# isomorphism of the two components and is not explicitly shown.
#
1   C1   1   C1      # Benzene ring of component 1
1   C2   1   C2
1   C3   1   C3
1   C4   1   C4
1   C5   1   C5
1   C6   1   C6
1   H3   5   H
1   H5   5   H
1   C7   2   C1      # Methyl group of component 1
1   H71  2   H1
1   H72  2   H2
1   H73  2   H3
1   N2   3   N1      # Three nitro groups of component 1
1   O21  3   O1
1   O22  3   O2
1   N4   3   N1
1   O41  3   O1
1   O42  3   O2
1   N6   3   N1
1   O61  3   O1
1   O62  3   O2
#
# The crystallographic mirror operation perpendicular to the
# benzene ring is denoted by the _*_symop_id value of 2. 
# In this description it is assumed that each
# component (partial molecule) contains no mirror plane so the
# crystallographic mirror plane perpendicular to the benzene ring
# relates the two components.
#
# Note that the molecular_units are not explicitly mapped on to
# the crystal but their mapping can be deduced by combining other
# mappings that are given.
#
loop_
_mapping_fu2xtl_fu_component_id
_mapping_fu2xtl_fu_atom_label
_mapping_fu2xtl_crystal_atom_site_label
_mapping_fu2xtl_crystal_symop_id
#
#  The two components are mapped onto the atom_sites of the
# crystal.  Note that the disordered atoms in the crystal carry
# suffixes A and B.  C1, C4, C7, H71 and N4 lie on the
# crystallographic mirror plane.
#
1   C1   C1   1
1   C2   C2   1
1   C3   C3   1
1   C4   C4   1
1   C5   C3   2
1   C6   C2   2
1   H3   H3   1
1   H5   H3   2
1   C7   C7   1
1   H71  H71  1
1   H72  H72  1
1   H73  H72  2
1   N2   N2A  1
1   O21  O21A 1
1   O22  O22A 1
1   N4   N4   1
1   O41  O41  1
1   O42  O41  2
1   N6   N2B  2
1   O61  O21B 2
1   O62  O22B 2
2   C1   C1   1
2   C2   C2   2
2   C3   C3   2
2   C4   C4   1
2   C5   C3   1
2   C6   C2   1
2   H3   H3   2
2   H5   H3   1
2   C7   C7   1
2   H71  H71  1
2   H72  H72  2
2   H73  H72  1
2   N2   N2A  2
2   O21  O21A 2
2   O22  O22A 2
2   N4   N4   1
2   O41  O41  2
2   O42  O41  1
2   N6   N2B  1
2   O61  O21B 1
2   O62  O22B 1
#
#
############ End of first CIF ################
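
The comment introducing the mapping loops notes that additional
isomorphisms can be generated by combining two or more mappings.
A minimal sketch of that composition is given below, using only
the methyl group.  The dictionary layout and the function name
are hypothetical, and the molecular unit is referred to by the
name 'methyl' rather than by a numeric id.

```python
def compose(first, second):
    """Chain two atom mappings: (a -> b) followed by (b -> c) gives (a -> c)."""
    return {a: second[b] for a, b in first.items() if b in second}

# Component 2 atom -> component 1 atom (the fu2fu mapping read
# from its second pair of columns to its first).
fu2_to_fu1 = {
    (2, "C7"): (1, "C7"),
    (2, "H71"): (1, "H71"),
    (2, "H72"): (1, "H72"),
    (2, "H73"): (1, "H73"),
}

# Component 1 atom -> molecular unit atom (the fu2mu mapping).
fu1_to_mu = {
    (1, "C7"): ("methyl", "C1"),
    (1, "H71"): ("methyl", "H1"),
    (1, "H72"): ("methyl", "H2"),
    (1, "H73"): ("methyl", "H3"),
}

# Derived mapping that the CIF does not list explicitly.
fu2_to_mu = compose(fu2_to_fu1, fu1_to_mu)
for fu_atom, mu_atom in sorted(fu2_to_mu.items()):
    print(fu_atom, "->", mu_atom)
```

The same function would chain the fu2mu and fu2xtl mappings to
place a molecular unit in the crystal, provided the first mapping
is inverted beforehand.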

######### Beginning of second CIF #############
#
#
# EXAMPLE OF A STRUCTURE WITH AN INFINITE BOND GRAPH
#
# NaCl is chosen to illustrate how infinite graphs are treated.
# The infinite graph of NaCl can be represented by the cubic
# lattice of the NaCl crystal with every Na atom forming a bond
# to its six Cl nearest neighbours and vice versa.  To reduce
# this graph to a finite graph, first extract one formula unit
# (the two atoms Na and Cl).  To extract these atoms, ten bonds
# must be broken, five bonds from Na and five bonds from Cl (one
# bond between Na and Cl remains intact).  The
# broken bonds occur in pairs (one from Na and one from
# Cl) related by the translational symmetry of the graph.  To
# complete the finite graph the two broken bonds of
# each pair are joined together, resulting in a finite graph
# which, for NaCl, consists of two atoms, Na and Cl, linked by
# six different bonds.  This is the graph described in the
# following sections.  The long-range order of the infinite graph
# is lost, but the short-range order, i.e., the nearest neighbour
# environment that represents the chemical interactions, is
# preserved.
#
data_nacl
#
loop_
_formula_unit_id         
_formula_unit_name
_formula_unit_formula
_formula_unit_details
#
# There is only one component in this graph.
#
nacl  'sodium chloride'  'Na Cl'   'Only one component'

loop_
_formula_unit_atom_id         # List reference
_formula_unit_atom_fu_id 
_formula_unit_atom_label 
_formula_unit_atom_element
_formula_unit_atom_valence
_formula_unit_atom_coord_number
_formula_unit_atom_details
1 nacl  Na1  Na    +1    6    ?
2 nacl  Cl1  Cl    -1    6    ?

loop_     
_formula_unit_bond_fu_id          # Child of _formula_unit_id
_formula_unit_bond_id             # List reference
_formula_unit_bond_atom_label_1
_formula_unit_bond_atom_label_2
_formula_unit_bond_valence        # Equivalent of bond strength
nacl   1   Na1   Cl1   0.167  
nacl   2   Na1   Cl1   0.167
nacl   3   Na1   Cl1   0.167
nacl   4   Na1   Cl1   0.167
nacl   5   Na1   Cl1   0.167
nacl   6   Na1   Cl1   0.167
#
# In this example molecular_units are not used.
#
loop_   
_mapping_fu2xtl_fu_component_id
_mapping_fu2xtl_fu_atom_label
_mapping_fu2xtl_crystal_atom_site_label
_mapping_fu2xtl_crystal_symop_id  
#
nacl  Na1   Na   1  # Both of these atoms are chosen from the
nacl  Cl1   Cl   1  #    same asymmetric unit of the crystal
#
# Because there is, in general, more than one bond between the
# same two atoms in the finite graph, it is necessary to indicate
# the mapping of the bonds as well.  This requires that the six
# bonds linking the two atoms in the graph of the formula_unit
# be mapped onto different bonds in the crystal, requiring
# that the symmetry operation be given for at least one of the two
# atoms defining the bond.  This example shows the mappings of
# the bonds of the formula_unit onto the bonds formed by the Na
# atom in the asymmetric unit.
#
loop_
_mapping_fu2xtl_bond_fu_id      # Child of _formula_unit_id
_mapping_fu2xtl_bond_fu_bond_id # Child of _formula_unit_bond_id
_mapping_fu2xtl_bond_fu_atom_label_1    # Redundant information
_mapping_fu2xtl_bond_fu_atom_label_2    # Redundant information
_mapping_fu2xtl_bond_crystal_atom_site_label_1
_mapping_fu2xtl_bond_crystal_symop_1
_mapping_fu2xtl_bond_crystal_trans_x_1
_mapping_fu2xtl_bond_crystal_trans_y_1
_mapping_fu2xtl_bond_crystal_trans_z_1
_mapping_fu2xtl_bond_crystal_atom_site_label_2
_mapping_fu2xtl_bond_crystal_symop_2
_mapping_fu2xtl_bond_crystal_trans_x_2
_mapping_fu2xtl_bond_crystal_trans_y_2
_mapping_fu2xtl_bond_crystal_trans_z_2
#
# The second item is the list reference which makes the listing
# of the labels 'Na1' and 'Cl1' redundant but visually helpful.
#
nacl  1  Na1  Cl1  Na  1 0 0 0 Cl  1 0 0 0
nacl  2  Na1  Cl1  Na  1 0 0 0 Cl  3 0 0 0
nacl  3  Na1  Cl1  Na  1 0 0 0 Cl  5 0 0 1
nacl  4  Na1  Cl1  Na  1 0 0 0 Cl  7 0 0 -1
nacl  5  Na1  Cl1  Na  1 0 0 0 Cl  9 0 0 0
nacl  6  Na1  Cl1  Na  1 0 0 0 Cl 11 0 1 0
#
############# End of second CIF ####################
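
The reduction of the infinite NaCl graph to a finite graph can be
followed step by step in a short sketch.  It is an illustration
under stated assumptions rather than a definitive procedure:
positions are written in units of half the cubic cell edge, the
formula unit is taken as Na at (0,0,0) and Cl at (1,0,0), and the
Na valence of 1 is assumed to be shared equally among its bonds.
All names are hypothetical.

```python
NA_BASE = (0, 0, 0)   # Na of the chosen formula unit, in units of a/2
CL_BASE = (1, 0, 0)   # Cl of the chosen formula unit
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def bond_translations(atom, partner_base):
    """Lattice translation separating each of the six neighbours of
    `atom` from the partner atom kept in the formula unit."""
    return [sub(add(atom, step), partner_base) for step in STEPS]

na_bonds = bond_translations(NA_BASE, CL_BASE)
cl_bonds = bond_translations(CL_BASE, NA_BASE)

zero = (0, 0, 0)
intact = [t for t in na_bonds if t == zero]
broken_from_na = [t for t in na_bonds if t != zero]
broken_from_cl = [t for t in cl_bonds if t != zero]

# Each broken Na bond is rejoined to the broken Cl bond that carries
# the opposite translation.
rejoined = all(tuple(-c for c in t) in broken_from_cl for t in broken_from_na)

# Finite graph: two atoms linked by six distinct bonds.
finite_graph = [("Na1", "Cl1", bond_id) for bond_id in range(1, len(na_bonds) + 1)]
bond_valence = round(1 / len(finite_graph), 3)
print(len(intact), len(broken_from_na), len(broken_from_cl), rejoined, bond_valence)
```

One bond stays intact, five are broken on each atom, and joining
the pairs gives the six bonds of valence 0.167 listed in the
formula_unit_bond loop.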

COMPARISON OF THE ABOVE PROPOSAL WITH mmCIF:

mmCIF has a chemical description which is similar to the one we
are proposing for coreCIF, though there are important
differences.  The rough equivalence is as follows:

   Our proposal                    mmCIF
   ------------                    -----------
   formula_unit                    struct_asym
   component                       entity
   molecular_unit                  chem_comp

The formula_unit and struct_asym both describe a number of
components (or entities) that together represent the total
contents of the crystal.  They differ in the following respects:

1. The formula_unit and the asymmetric unit are, in general, not
the same.  The formula unit normally contains several asymmetric
units if, e.g., it contains crystallographic symmetry and
thus spans two or more asymmetric units.  The chemical formula of
the asymmetric unit may have non-integral multiplicities for the
elements if an atom lies on a special position.

2. struct_asym lists the entities (equivalent to the components)
which form the crystal, but mmCIF does not include their graphs.
Instead the atoms in the crystal that map onto the entity are
flagged with the entity_id in the atom_site category.

3. While the sum of all the components in coreCIF corresponds to
the complete formula_unit, in mmCIF identical molecules in the
asymmetric unit are represented by a single entity which is
repeated in struct_asym as many times as it appears in the
asymmetric unit.  In the proposal for the formula_unit above
every component, whether identical or not, is separately listed.
For example, if the TNT example above were described in mmCIF,
the two isomorphous components would be represented by a single
entity which would be listed twice in struct_asym.

PROBLEMS:
The link between the entity and the crystal is provided by
labelling each atom in the atom_site loop with a pointer to the
entity to which it belongs.  This means that each atom can belong
only to one entity, making it impossible to treat disorder in the
way described above.  Further since the atoms and bonds in the
entities are not separately enumerated, it may be more difficult
to assign chemical properties to them.

In mmCIF ENTITIES are classified as one of three types: polymer,
non-polymer and water.  Polymers are considered to be composed of
amino acids, nucleic acids or non-standard residues and are
defined in terms of the sequence of the monomers described in the
chem_comp loops. 

CHEM_COMP (= chemical component) is close to our definition of a
molecular_unit. It is designed to give the contents and
geometries of the individual monomeric units that compose the
macromolecules.  It describes the ideal geometry of the monomer
either in terms of Cartesian coordinates or in terms of bond
lengths and angles.  The individual atoms of the monomer are
(necessarily) labelled, normally with some standard chemical (as
opposed to a crystallographic) labelling scheme.  The atom_site
loop contains a pointer to the particular atom in the _chem_comp
that the atom in the crystal corresponds to.

CONCLUSION:
The mmCIF description of the chemistry is tailored to the needs
of macromolecular crystallography, particularly to the
descriptions of molecules that are composed of a sequence of
standard monomeric units.  It is not obvious that the categories
defined for this purpose could easily be adapted to small-cell
structures, particularly structures that are infinitely
connected.  We have three choices:
1. We can try to use as many of the definitions in mmCIF as
possible, defining additional terms as needed and possibly
extending the definitions given in mmCIF,
2. We can try to follow the philosophy of mmCIF in terms of the
way in which the chemistry is described, i.e., not giving the
graphs of the formula_unit and molecular_units and defining
isomorphous components in the formula_unit only once, or
3. We can devise our own scheme with minimal reference to mmCIF
(but, e.g., we might wish to ensure that the properties of
molecular_units can be mapped onto the properties given in
chem_comp).  In practical terms this might mean making greater
use of parent-child relations instead of the mapping_fu2mu and
mapping_fu2xtl loops described above.  For example the
mapping_fu2xtl loop would not be needed if the formula_unit_atom
loop included pointers to _atom_site_label and
_space_group_symop. We might also consider making the component
of the formula_unit identical to the entity (while ignoring many
of the macroscopic entity properties that are irrelevant and
including instead the bond graph).

I would reject option #1 as inappropriate.  #2 would restrict our
ability to describe a chemical molecule independently of the
crystal structure. #3 is a possibility we should seriously
consider.

PLEASE SEND YOUR COMMENTS TO CORECIFCHEM@IUCR.ORG


