Discussion List Archives

CoreCIFchem Discussion #6

  • To: coreCIFchem@iucr.org
  • Subject: CoreCIFchem Discussion #6
  • From: David Brown <idbrown@mcmaster.ca>
  • Date: Mon, 04 Oct 2004 14:51:16 -0500
Dear Colleagues,

    I have attached a text file containing the latest discussion paper 
(#6) on the creation of a chemical description of a structure in CIF.  I 
have taken note of Howard's suggestion and produced what should be a 
good version for fine tuning.  There is only one outstanding problem - 
the description of the geometry for infinitely connected structures, but 
this can probably be sorted out without too much difficulty.  Otherwise 
the present draft presents a simpler and more flexible format than the 
earlier drafts.  The attached file runs to 35 pages, for which I
apologize, but unfortunately there are no shortcuts.

    I would like to work on the next version (or start preparing 
dictionary copy) after Dec 31, so I would like to have all your comments 
by then.  If this does not give you enough time, let me know.  The 
deadline can be changed to suit your timetables.

Best wishes

David

Prof. David Brown
Brockhouse Institute for Materials Research
McMaster University, Hamilton,
Ontario, Canada L8S 4M1
Fax 905 521 2773

               DISCUSSION PAPER #6

     For the sake of keeping the project moving I am setting DECEMBER 31,
2004 as the deadline for responses, but this deadline can be extended if you
need more time.  PLEASE SEND YOUR COMMENTS TO coreCIFchem@iucr.org

                    David

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

          CONTENTS OF THIS DISCUSSION PAPER #6
          -----------------------------------
     1. COMMENTS RECEIVED ON DISCUSSION PAPER #5  
     2. PREAMBLE TO THE REPORT TO THE CORE DICTIONARY MAINTENANCE GROUP
     3. SAMPLE CIFS
          3.1 CIF FOR TNT
          3.2 CIF FOR CaCrF5
     4. COMPARISON OF THE ABOVE PROPOSAL WITH mmCIF
     5. SAMPLE CIFS WITH COMMENTS REMOVED

PLEASE SEND YOUR COMMENTS TO coreCIFchem@iucr.org before DECEMBER 31, 2004
                                                  
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     
          COMMENTS RECEIVED ON DISCUSSION PAPER #5

Each of these comments is followed by my response.  Other more specialized
comments are placed within the sample CIFs.

Comments by Howard D. Flack 
-----------------------
  I've only really considered David's comments and the TNT example. I did not
look at the CaCrF5 example in detail. I've made a new skeleton CIF of the TNT
example which to me is simpler and easier to read. I guess it breaks many CIF
rules of syntax.

IDB Response
------------
I have used Howard's skeleton as the basis for the present proposal, which is
more CIF-conformant than Howard's draft and, I hope, simpler.  I have reworked
the CaCrF5 CIF using the same scheme.

HDF on tecton v molecular unit
--------------------------------------
  I've used the word 'tecton' to mean a general building block instead of
molecular_unit. I heard it used in a talk by Guy Orpen but Guy has written to
me to say he did not invent it. He has sent me a few references which I have not
yet had time to read.

IDB response
------------
I have adopted this terminology to refer to a collection of bonded atoms whose
topology we describe.  These may not conform to the concept of a tecton used
elsewhere, so I hope this does not cause confusion.

HDF on unique atom identifiers
------------------------------
  One of the aspects of David's implementation, as seen in the TNT example,
which troubles me is the necessity for each atom in each molecule to have a
unique identifier as coded in '_molecular_unit_atom_mu_id'. As far as I
remember there are already several million molecules that are known and giving
every atom in each molecule a unique identifier is cumbersome to say the
least. I definitely think that we will see molecular libraries come into
existence either locally or globally. New molecules can be added to a library
by editing, cutting and pasting in bits from other molecules so again atom
identifiers containing the molecular name are very heavy in use. One perverse
aspect of using unique atom identifiers over a set of molecules is that it
does not per se ensure molecular integrity. In defining the bond topology it
is quite possible to do the stupid thing of defining a bond between two atoms
which are not in the same unit.

  Maybe this is yet another thing that I have not really understood about the
CIF syntax. David is suggesting the construction of a unique reference item by
the concatenation of two others. Why not just use the initial pair of
reference items together as a unique pointer in its own right. This is what I
have done in my 'improved' CIF.

IDB response
------------
The atom name only needs to be unique within the CIF so most of Howard's
concerns are not a problem.  However, we should distinguish between the atom
name that is used daily by the chemist and the 'list-reference' required by
CIF.  Every list (i.e., loop) in a CIF must have a list-reference which is a
string that serves as a unique address for a particular line in the list.  In
the _atom_site list, for example, the list reference is the _atom_site_label
which also serves as the common name of the atom used by the crystallographer,
e.g., C1, N21.  There are two views that one can take.  The first says that
since the crystallographer is not going to assign the same label to two
different atoms, the loop can be kept simple by also using the chemical name
(_*_atom_site_label) as the CIF list-reference.  The second view argues that
it is best to keep the list-reference, which is required for CIF file
management, separate from items that convey chemical or crystallographic
information.  The underlying philosophy of CIF is to separate the syntax
(grammar) of the file from the semantics (the information it contains) so that
a program can manipulate the file without knowing anything about
crystallography.  This philosophy favours the second view.

The implications of these two views are brought out clearly in the examples we
are discussing.  If there is more than one tecton in the list of atoms that
define the topologies, the same chemical name may be used more than once.  For
example, in the TNT case C1 appears in the description of the topology of the
TNT molecule as well as in the description of the topology of the benzene
ring.  In order to provide a unique address for each line, the list-reference
must distinguish between these, i.e., it must also include some identification
of the tecton.  This means that two items are required to give a unique
address, which leads to unnecessary complications in programming.  In effect
the syntax (the number of items required for the list reference) is dependent
on the type of information in the loop.  The extreme example of this approach
is seen in the torsion (dihedral) angles loop where the list-reference would
have to involve five separate items, the tecton_id and the labels of the four
atoms that define the angle!  This requires that programs that make use of the
relational structure of CIF must be able to handle list-references that
consist of many items.  If, on the other hand, we assign unique list-
references that carry no chemical information, such as assigning sequential
numbers to the different lines (as I did in the previous discussion paper),
each dihedral angle is identified by a number, and the four atoms that define
the dihedral angle are each identified by the number that represents their
address (their id) in the atom list.  While this makes programming easier
since each list-reference consists of a single item, it makes the CIF more
difficult to read since the reader must check back to the atoms list to find
out which atom wears the number 35 (say).
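To make the trade-off concrete, here is a minimal Python sketch, assuming an in-memory model in which each loop is a dictionary keyed by its list-reference. The atom labels, atom ids, torsion value and variable names are hypothetical and are not proposed CIF data names. Under the first view the torsion loop needs a five-item key; under the second each line has a single key, and the reader (or program) must look the atom ids up in the atom list.

```python
# Hypothetical illustration: two ways of addressing a line in a torsion loop.

# View 1: the list-reference is composite (tecton_id plus four atom labels).
torsions_composite = {
    ("TNT", "C1", "C2", "N2", "O21"): 30.0,  # illustrative angle in degrees
}

# View 2: every line has a single list-reference carrying no chemical meaning.
atom_list = {  # atom id -> (tecton_id, chemist's label)
    35: ("TNT", "C1"),
    36: ("TNT", "C2"),
    37: ("TNT", "N2"),
    38: ("TNT", "O21"),
}
torsions_single = {
    1: ((35, 36, 37, 38), 30.0),  # torsion id -> (four atom ids, angle)
}


def describe_torsion(torsion_id):
    """Resolve a single-item list-reference into readable atom labels."""
    atom_ids, angle = torsions_single[torsion_id]
    labels = [atom_list[i][1] for i in atom_ids]
    return "-".join(labels), angle


key = next(iter(torsions_composite))
print(len(key))             # 5 items in the composite list-reference
print(describe_torsion(1))  # ('C1-C2-N2-O21', 30.0)
```

The single-key version keeps the program simple, at the cost of the extra lookup in the atom list that a human reader has to make.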

The list reference does not have to be a number - it can be any string
provided that it is unique within the list, so one way to reconcile these two
approaches is to construct a composite list-reference from chemically
meaningful terms, e.g., for the atom list one might choose A.C1, A.C2, B.C1,
etc. where A and B distinguish the different tectons.  These tecton_atom_ids
would be parents to items that appear in other lists where they are used to
identify the atoms that form bonds, angles etc.  CIF treats the list-reference
strings as unparsable and assumes they contain no chemical information; the
author of a CIF can use any desired string.  However, to help us discuss these
sample CIFs I have used composite list-references in the places where they
will make our life easier.  Elsewhere I have used sequential numbers or
letters.  
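A minimal sketch of the composite scheme, assuming hypothetical tecton letters A and B and a plain '.' separator: the strings are built once from chemically meaningful parts and are afterwards only checked for uniqueness, never parsed. The function name is illustrative only.

```python
def composite_ids(tectons):
    """Build 'A.C1'-style list-references from {tecton letter: [atom labels]}.

    The resulting strings are treated as opaque keys: the only property
    checked here is that each one is unique within the list.
    """
    ids = [f"{tecton}.{label}" for tecton, labels in tectons.items() for label in labels]
    seen, duplicates = set(), set()
    for list_ref in ids:
        if list_ref in seen:
            duplicates.add(list_ref)
        seen.add(list_ref)
    if duplicates:
        raise ValueError(f"list-reference not unique: {sorted(duplicates)}")
    return ids


# C1 occurs in both the TNT molecule (A) and the benzene ring (B), so the
# bare label would not be unique, but the composite strings are.
print(composite_ids({"A": ["C1", "C2"], "B": ["C1"]}))  # ['A.C1', 'A.C2', 'B.C1']
```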

In summary, the list-reference is always a string that has no semantic content
but is used solely for file management (e.g., locating particular lines in a
list).  Such strings are never parsed by the computer.  The chemical
information always resides in other items on the line. 

HDF proposes that all bonds and angles be defined in terms of atoms
-------------------------------------------------------------------
In defining the geometry of a tecton, David uses two atoms to define a
geometric bond and two topological bonds to define an angle, and worries about what
should be the correct way of doing a dihedral angle either by way of atoms or
bonds. I maintain that the only correct way to define the geometry is in all
three cases to use a set of atoms: 2 for distances, 3 for angles and 4 for
dihedral angles. The reason is as follows: the geometry section allows
interatomic distances to be specified but nothing requires that the two atoms
concerned form a bond as defined by the topology; similarly the three atoms
used to specify an angle may or may not be forming bonds as specified by the
topology; etc. One fairly frequently specifies angles by specifying
interatomic distances between atoms which are not bonded as defined by the
topology.

IDB response
------------
I agree.  This simplifies the CIF since (almost) all bonds, distances, angles
and mappings use the atom_ids that are defined in the tecton_topology_atom
loop. 
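Because every distance, angle and torsion angle is now specified by a set of atoms (two, three or four), its value follows directly from atomic coordinates whether or not the atoms are bonded. The Python sketch below assumes Cartesian coordinates in ångströms (for example, ideal coordinates of the kind TECTON_GEOM_ATOM is meant to hold); the function names are hypothetical. The torsion sign follows the usual convention that a clockwise rotation, viewed along the bond from the second to the third atom, is positive.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def distance(p1, p2):
    """Distance between two atoms (they need not be bonded)."""
    return _norm(_sub(p2, p1))


def angle(p1, p2, p3):
    """Angle p1-p2-p3 at the central atom p2, in degrees."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cosine = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))  # clamp rounding error


def torsion(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the range -180 to +180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(distance((0, 0, 0), (3, 4, 0)))  # 5.0
print(angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))
print(torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```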
 
HDF on avoiding unnecessary repetition
--------------------------------------
  Another aspect which seemed rather heavy in David's TNT example was the
repetition of certain information. As I see it there are four conformers (aa,
bb, ab, ba in David's nomenclature) all of which correspond to the same
molecule, meaning to the same molecular topology. To improve this state of
affairs it seems natural to input a unit defining only its topological
features - the TNT molecule - and then reuse this unit several times adding in
either the minimum or complete geometric information necessary to distinguish
the 4 conformers. So I ended up with four units of identical topology but
differing geometry and one unit defining only the topology. This makes the
relationship between the conformers and the parent molecule easy to perceive
in the file. I felt that it was essential to be able to define all four
conformers. Although I well understand that often one cannot make an
unequivocal assignment of molecules or conformers in the case of a disordered
crystal structure, it is certainly also the case at present that models of
disorder in molecular crystal-structure determinations are being used which
have no possible interpretation in terms of the (assumed) constituent
molecules. 

IDB response
------------
This problem has been addressed in the present draft.  My solution adds an
extra layer but it is simpler than Howard's.  The topology and geometry are
kept separate and the tecton is defined at the topology level while the
conformers are only introduced at the geometry level.  Each of the distances
and angles is given only once and each is flagged so that it can be
identified with one or more of the conformers.  In this way all four
conformers are defined without the need to repeat any of the distances or
angles. 
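One way to picture this flagging, as a minimal Python sketch: each ideal torsion angle is stored once together with the set of conformers it belongs to, and the geometry of any one conformer is recovered by filtering. The conformer names follow the aa/ab/ba/bb scheme used for TNT (assumed here: first letter for the N2 nitro group, second for N6); the atom labels, angle values and the way the flags are represented are hypothetical and are not the draft's data names.

```python
ALL = frozenset({"aa", "ab", "ba", "bb"})

# (list-reference, four atom labels, ideal torsion in degrees, conformers using it)
GEOM_TORSION = [
    ("t1", ("C6", "C1", "C2", "N2"), 180.0, ALL),  # planar ring, shared by all
    ("t2", ("C1", "C2", "N2", "O21"), 30.0, frozenset({"aa", "ab"})),
    ("t3", ("C1", "C2", "N2", "O21"), -30.0, frozenset({"ba", "bb"})),
    ("t4", ("C1", "C6", "N6", "O61"), 30.0, frozenset({"aa", "ba"})),
    ("t5", ("C1", "C6", "N6", "O61"), -30.0, frozenset({"ab", "bb"})),
]


def geometry_of(conformer):
    """Return the list-references of the torsion lines flagged for a conformer."""
    return [ref for ref, _atoms, _value, flags in GEOM_TORSION if conformer in flags]


for name in sorted(ALL):
    print(name, geometry_of(name))
```

Each line is written once; the four conformers differ only in which of the flagged lines they pick up.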

Howard mentions crystallographic models of disorder that have no atomic
description.  A method is currently being developed by the CIF Core Dictionary
Maintenance Group whereby the number of electrons in a diffuse patch of
electron density will be indicated by including dummy atoms in the _atom_site
loop, representing atoms presumed to be present in the crystal.  A direct
mapping from the topology to the real and dummy atoms in the atom_site loop
should be possible in these cases, even though the positions of the dummy
atoms are not defined.

HDF comment on compatibility with INChI
---------------------------------------
I would like to have some reassurance that the molecular data structures we
are trying to define in CIF are as compatible as possible with those used in
the IUPAC project for producing unique chemical identifiers (I've forgotten
its name yet again).

IDB response
------------
The name is now INChI, the IUPAC-NIST CHemical Identifier.  I don't think
there is a problem at the topology level, but I don't know if there are
problems in dealing with conformers.  I will have to look into this.

HDF comment on dangling bonds
-----------------------------
In the chemical sub-groups like the 1,2,4,6 benzene and nitro groups, it makes
sense to me to include the dangling bonds. I'm also very much in favour of
including ALL the atoms especially the hydrogen atoms.

IDB response
------------
Fine.  I have retained this feature of Howard's proposal, but see the note
below about the difficulties of mapping dangling bonds.

HDF comment on mapping
----------------------
  I was disturbed by David's use of the word 'map'. In mathematics it has a
very precise meaning [If you map set A on to set B then you have to assign one
single element of B to every element in A. This means every element in A has
to have a unique son in B although several different elements in A can lead to
the same element in B. Also whereas every element in A must have a son in B,
not every element in B has to be the son of an element in A.] Especially in
the relation between the tectons and the crystal structure, these criteria
were not being obeyed.
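Howard's definition can be stated as a simple test: a relation between set A and set B is a map only if every element of A has exactly one image in B. The minimal Python sketch below applies that test to a hypothetical tecton-to-crystal relation; the crystal labels are invented for illustration. It shows one way the relation can fail to be a map in the strict sense: a disordered tecton atom may correspond to two alternative crystal sites.

```python
def is_map(pairs, domain, codomain):
    """True if the (a, b) pairs define a map from domain into codomain.

    Every a must have exactly one image b; several a may share the same b,
    and not every b needs to be used.
    """
    images = {}
    for a, b in pairs:
        if a not in domain or b not in codomain:
            return False
        if images.setdefault(a, b) != b:
            return False  # a has two different images
    return set(images) == set(domain)


tecton_atoms = {"C1", "N2", "N6"}
crystal_sites = {"C1", "N2", "O21A", "O21B"}  # hypothetical atom_site labels

# N2 and N6 both correspond to the one symmetry-independent site N2: still a map.
print(is_map([("C1", "C1"), ("N2", "N2"), ("N6", "N2")], tecton_atoms, crystal_sites))

# A disordered oxygen assigned to two alternative sites breaks the rule.
print(is_map([("O21", "O21A"), ("O21", "O21B")], {"O21"}, crystal_sites))
```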

IDB response
------------
While it would be good to adhere to the mathematical practice, I wonder how
well it would be observed by crystallographers most of whom are unaware of the
mathematical rules, particularly as the kind of 'mapping' we do in this file
does not, in general, follow the mathematical rules.  Perhaps we should use a
different word, but I have not been able to think of a good substitute.  I
have continued to use the word 'map' in the present draft in the absence of a
suitable alternative even though the mathematical rules of mapping are not
followed.


HDF on Disorder in molecular crystals:
--------------------------------------
  As Greg makes clear, it may well be that there are several interpretations
(mappings) which relate the topological definition of a molecule to the atomic
coordinates determined from crystal-structure analysis. We must be sure that
we provide a mechanism to encode these alternate molecular interpretations and
associated geometry. It seems to me that the needs for journal publishing
(i.e. checking) disordered structures and the way that they are subsequently
entered into a database are somewhat different. For the publishing/checking
side of the business one needs to provide a mechanism to evaluate the
structural sense of the molecules including the disordered part of the
structure.  I've seen too many papers where the disordered part of the
structure makes absolutely no molecular sense at all. (One of my colleagues in
inorganic chemistry recently received a paper to referee in which about 50% of
the electron density was modelled through Ton Spek's BYPASS procedure with no
attempt at any molecular interpretation of the disordered region. To our minds
in that case the structure analysis could have been improved so we recommended
reanalysis.) On the other hand, I think that for the databases, tentative
interpretations of disordered regions have much less use and probably what is
required is that although the topology of the complete molecule be defined,
the mapping (atoms and bonds) between the topology and the crystal structure
relate only to those parts of the molecule which are well ordered in the
crystal structure. 

> since it does not require the author to specify how the disordered atoms 
> sites are combined in the individual molecules,

  I'm very suspicious of that. One must provide a mechanism that allows
multiple mappings of the topological definition of the molecule onto the atoms
seen in the crystal-structure analysis. Of course it's not for coreCIFchem to
'require' such information but I certainly see that it could be put to very
good use for the purposes of checking a crystal-structure analysis.

IDB response
------------
The present proposal is remarkably flexible.  The topology can be mapped
to the crystal structure with or without reference to the conformation, but if
the conformers are specified, they can be combined in any desired way that
matches the known or supposed molecular structure of the crystal.  Similarly
different isomers may be mapped to the crystal in any combination.  (N.B.
isomers differ at the topological level, conformers have the same topology but
differ at the geometry level).

HDF on the need for all the atoms in the bond graph to be in the crystal
------------------------------------------------------------------------
> It is not necessary that the molecular units (tectons) account for all
> the atoms found in the crystal structure, nor that the crystal structure
> contain all the atoms specified in the molecular units.

 I have no trouble with the first part of the sentence but the second part
after 'nor' leaves me somewhat perplexed. I expected that all of the atoms
specified in the molecular units would be in the crystal structure even if one
could not see them clearly. Could you give examples of what you have in mind
here?

IDB response
------------
The inherent structure of the CIF does not require that every atom in the
crystal map onto an atom in the tecton and vice versa.  To require such a
restriction seems unnecessary and difficult to enforce in any automatic way. 
Someone may wish to define a tecton for which no crystal structure has been
reported, or for which only the unit cell is known.  They may wish to define a
monomer and a dimer as two tectons, where only the monomer appears in the
crystal.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

2. PREAMBLE TO THE REPORT TO THE CORE DICTIONARY MAINTENANCE GROUP

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The crystallographic information given in a CIF consists of the atomic
coordinates in the asymmetric unit, the symmetry operations needed to generate
all the atoms in the crystal, the lattice parameters and some interatomic
distances.  The present proposal provides a means of adding a chemical
description of the structure in the form of a bond graph (2-D structure) that
identifies how the atoms are arranged in molecules and complexes (tectons),
and which interatomic distances correspond to chemical bonds.  It also allows
the ideal molecular geometry to be specified for comparison with the observed
geometry in the crystal.  The atoms in the bond graph are identified with the
corresponding atoms reported in the crystal so that a database search on the
topology of the molecule graph can retrieve its crystallographic structure, or
alternatively, the distance between two atoms in the crystal can be identified
with a chemical bond in a particular molecule. 

With this addition, the CIF may include a description of the bonding topology
of one or more tectons, a tecton being defined as a group of atoms linked by
bonds, usually representing a molecule, complex or functional group.  There is
no limit to the number of tectons that may be described in a given CIF and
there is provision for mapping the atoms of one tecton onto the atoms of
another, as well as identifying the atoms in a tecton with the atoms in the
crystal. 

A chemical description of the contents of a crystal, distinct from the
crystallographic description, serves a variety of uses.  It describes the
contents of the crystal in the language of chemistry rather than the language
of crystallography.  This allows the structure determinations of crystals
containing particular molecules (tectons) to be located by searching on their
bond topology, permitting the coordinates of the atoms that form the tecton to
be retrieved.  Further, the crystal structure determination does not itself
identify which atoms are bonded.  A topological description of the bonding
network supplements the information given in the crystal structure report by
identifying which interatomic distances correspond to chemical bonds.  The
chemical description can be used to identify the different molecules that
compose a crystal, or the crystallographically distinct copies of the same
molecule.  Finally, the proposed chemical description allows the ideal geometry
and conformation of the tectons to be specified - information which can be
used during the refinement of a crystal structure (e.g., in defining rigid
groups) or for validating the experimental bond distances and angles.  

The proposed chemical description of a tecton is given in the CIF in a number
of groups of related categories or loops.  The first group identifies the
different tectons described and provides a description of their bond
topologies, i.e., the list of atoms and the list of bonds that link these
atoms.  The second group gives the ideal geometries of the tectons and
identifies the different possible conformers.  The final group allows the
tectons to be mapped onto each other and identifies the atoms and bonds in the
tectons with the atoms and interatomic distances in the crystal.

The topologies of the tectons are described in the tecton_topology categories
in the form of a list of atoms and the bonds between them.

  TECTON_TOPOLOGY         Lists the different tectons described
  TECTON_TOPOLOGY_ATOM    Lists the atoms in each tecton
  TECTON_TOPOLOGY_BOND    Lists the bonds between the atoms

The topological description does not include any information on the geometry
of the tecton but it does distinguish between isomers. 
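As an illustration of this relational structure (and of Howard's point that unique atom identifiers alone do not guarantee molecular integrity), here is a minimal Python sketch of the topology loops with two checks: every atom list-reference is unique, and both atoms of a bond belong to the same tecton. The field names only loosely follow the category names above and are assumptions, not proposed data names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyAtom:
    atom_id: str    # list-reference, unique within the whole atom loop
    tecton_id: str  # child of the TECTON_TOPOLOGY list-reference
    element: str


@dataclass(frozen=True)
class TopologyBond:
    bond_id: str    # list-reference of the bond loop
    atom_id_1: str  # children of TopologyAtom.atom_id
    atom_id_2: str


def check_topology(tecton_ids, atoms, bonds):
    """Return a list of problems found in hypothetical TECTON_TOPOLOGY_* loops."""
    problems = []
    by_id = {}
    for atom in atoms:
        if atom.atom_id in by_id:
            problems.append(f"duplicate atom list-reference {atom.atom_id}")
        if atom.tecton_id not in tecton_ids:
            problems.append(f"atom {atom.atom_id}: unknown tecton {atom.tecton_id}")
        by_id[atom.atom_id] = atom
    for bond in bonds:
        first, second = by_id.get(bond.atom_id_1), by_id.get(bond.atom_id_2)
        if first is None or second is None:
            problems.append(f"bond {bond.bond_id}: refers to an unknown atom")
        elif first.tecton_id != second.tecton_id:
            problems.append(f"bond {bond.bond_id}: atoms belong to different tectons")
    return problems


atoms = [TopologyAtom("A.C1", "TNT", "C"), TopologyAtom("A.C2", "TNT", "C"),
         TopologyAtom("C.N1", "nitro", "N")]
bonds = [TopologyBond("b1", "A.C1", "A.C2"), TopologyBond("b2", "A.C2", "C.N1")]
print(check_topology({"TNT", "nitro"}, atoms, bonds))
# ['bond b2: atoms belong to different tectons']
```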

The decision as to what constitutes a tecton is left to the author but a
tecton would normally correspond to a molecule or, in the case of the
infinitely bonded solids typically found in inorganic compounds, the tecton
would normally be chosen as the formula unit, the smallest group of atoms that
contains all the chemical elements in the same proportions as they are found
in the crystal. 
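For a simple composition, the element proportions of the formula unit can be found by dividing the atom counts in the cell by their greatest common divisor. The minimal sketch below uses hypothetical cell contents; which particular atoms make up the tecton, and how they are connected, remains the author's choice and is not decided by this arithmetic.

```python
from functools import reduce
from math import gcd


def formula_unit(cell_contents):
    """Reduce element counts to the smallest whole-number proportions."""
    divisor = reduce(gcd, cell_contents.values())
    return {element: count // divisor for element, count in cell_contents.items()}


# Hypothetical cell containing four CaCrF5 units.
print(formula_unit({"Ca": 4, "Cr": 4, "F": 20}))  # {'Ca': 1, 'Cr': 1, 'F': 5}
```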

The conformation and geometry of the tectons are given in the tecton_conformer
and tecton_geom categories, the former identifying the different conformers
that may be present, the latter defining their geometry.

  TECTON_CONFORMER         Lists different conformers and their properties
  TECTON_CONFORMER_EQUIV   Defines the geometry labels of each conformer
  TECTON_GEOM_ATOM         Gives coordinates of ideal geometry
  TECTON_GEOM_DIST         Gives ideal interatomic distances
  TECTON_GEOM_ANGLE        Gives ideal bond angles
  TECTON_GEOM_TORSION      Gives ideal torsion angles

The geometry may be given either by specifying atomic coordinates or by
supplying bond lengths, angles and torsion angles.  

The tectons are mapped onto each other in the category:

  MAP_TECTON_ATOM          Maps atoms of one tecton onto those of another

and the atoms of the tecton are identified with atoms in the crystal in the
categories:

  MAP_TECTON2CRYSTAL_ATOM  Identifies tecton atom with crystal atom
  MAP_TECTON2CRYSTAL_BOND  Identifies tecton bond with crystal distance
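A minimal Python sketch of how a program might follow these links: a tecton bond is resolved through MAP_TECTON2CRYSTAL_ATOM to two crystal atoms, each given here as an atom_site label plus a symmetry code, and the corresponding crystal distance is looked up. The labels, symmetry codes, distance value and dictionary layout are hypothetical; only the '1_555' style of symmetry code is borrowed from the existing core CIF geom categories.

```python
# Hypothetical MAP_TECTON2CRYSTAL_ATOM: tecton atom -> (atom_site label, symmetry code)
atom_map = {
    "A.C1": ("C1", "1_555"),
    "A.C2": ("C2", "1_555"),
    "A.C6": ("C2", "2_555"),  # mirror image of C2; symmetry code is illustrative
}

# Hypothetical crystal distances, keyed by the unordered pair of crystal atoms.
crystal_distances = {
    frozenset({("C1", "1_555"), ("C2", "1_555")}): 1.392,
    frozenset({("C1", "1_555"), ("C2", "2_555")}): 1.392,
}


def crystal_length(tecton_atom_1, tecton_atom_2):
    """Return the crystal distance matching a tecton bond, or None if none is listed."""
    pair = frozenset({atom_map[tecton_atom_1], atom_map[tecton_atom_2]})
    return crystal_distances.get(pair)


print(crystal_length("A.C1", "A.C6"))  # 1.392
print(crystal_length("A.C2", "A.C6"))  # None: no distance listed for this pair
```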

The way in which CIF describes the tectons and their mappings is illustrated
by two sample CIFs, one of a disordered organic molecule, the other of an
infinitely connected inorganic solid.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

                           3. SAMPLE CIFS
                        
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

At this stage in the development of the tecton description we are not
attempting to write dictionary definitions, but only to create two sample CIFs to
ensure that the file structure is organized in a way that can handle both
organic and inorganic crystals in the simplest possible way.

The first CIF describes the structure of the molecule trinitrotoluene, TNT. 
It shows how a molecule with a finite bond graph is handled when the molecule
lies on a Wyckoff special position and two of the nitro groups are disordered. 
By way of illustration, tectons corresponding to several subunits of the
molecule are also defined and are mapped onto the molecule itself.  

The second CIF describes the structure of CaCrF5 which has an infinite bond
graph and a formula unit that spans more than one asymmetric unit.

[Editorial comment: Data names may be changed in the final report and
dictionary definitions will eventually be needed.  Suggestions for better
names are welcome.  Items marked as 'list-reference' are required for the
management of the CIF's relational file structure and must be unique for each
line in a list.  The list-reference item in one loop is frequently parent to
similarly named items in other loops.  There is at least one serious
unresolved problem (in the geom categories of the second CIF).  Its solution
is deferred to the next draft.  The following sample CIFs contain extensive
comments to explain how the CIF is to be interpreted.  The same CIFs with the
comments stripped out appear at the end of this file so that one can see more
clearly what they look like.]

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

                        3.1 FIRST SAMPLE CIF
                        --------------------

                       TRINITROTOLUENE

                         O    CH3  O
                         |    |    |
                   O --- N2   C1   N6 --- O
                          \  /  \ /
                            C2  C6
                            |    |
                       H -- C3  C5 -- H
                             \   /
                              C4
                               |
                               N4
                              / \
                             O   O

In the fictitious crystal structure I have invented for the purposes of this
illustration, the molecule contains a crystallographic mirror plane that
passes through the methyl group and the N4 nitro group and is perpendicular to
the plane of the molecule.  The N2 and N6 nitro groups are related by the
mirror plane and are disordered with the two components each having occupation
numbers of 0.5.  Because of the disorder the crystallographic structure does
not define the point group of an individual molecule.  By choosing one
combination of the disordered nitro groups the molecule would have Cs
symmetry, but by choosing a different combination the individual molecules
would have C1 symmetry.  Either or both combinations may of course be present
in the real crystal but X-ray diffraction cannot distinguish between them.
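The four possible combinations can be enumerated mechanically. In this minimal Python sketch, 'a' and 'b' denote the two disorder components of each nitro group, and we assume (following the labels in Howard's table in the CIF below) that component 'a' at N6 is the mirror image of component 'a' at N2, so that matching components give Cs and mixed ones C1. The enumeration only lists candidates; the diffraction experiment does not say which combinations actually occur.

```python
from itertools import product

# Assumption: component 'a' of the N6 group is the mirror image of component 'a'
# of the N2 group, so equal letters preserve the crystallographic mirror plane.
conformers = {
    n2 + n6: ("Cs" if n2 == n6 else "C1")
    for n2, n6 in product("ab", repeat=2)
}
print(conformers)  # {'aa': 'Cs', 'ab': 'C1', 'ba': 'C1', 'bb': 'Cs'}
```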

############# Beginning of first CIF #############
#
#
data_disordered_TNT
#
# The first set of loops define the topology of the TNT molecule
# (tecton 1) and two subunits of the molecule (tectons 2 and
# 3).  The subunit definitions likely would not often be used but are
# included here to show that it can be done.
#
# If a crystal contained molecules of more than one compound, or more than one
# isomer of a compound, each would be described by a separate tecton. 
# If the crystal contained more than one copy of the same molecule in the
# asymmetric unit (Z'>1) the topology of the tecton would be given only once
# but it would be mapped onto all the crystallographically distinct copies.  
#
# According to CIF practice, all the items in a loop belong to the same
# 'category'.  The category name forms the first part of the datanames of all
# items in the loop.  
#
# The list-reference items in each loop are unique to the line in which they
# appear and constitute an address that a program can use to establish
# relationships between the items described in the different loops.  Each loop
# must have a list-reference.
#
# The part of the CIF describing the crystallographic structure is omitted in
# this first example, but its contents should be self-evident in the map
# loops.
#
############################################################
#                  DEFINING THE TECTONS
#
# The first loop lists the different tectons being defined
# together with their properties.  We may wish to define other properties, such as
# net charge and formal charge carried by the tecton.
#
# I have made some fairly drastic changes to Howard's proposal here.  Howard
# listed both the tectons and their conformations in the same loop and
# therefore included both topological and geometric information in the same
# loop as follows:
#
# loop_
# _tecton_id          # List reference / short name
# _tecton_name        # name e.g. full IUPAC name  
# _tecton_formula
# _tecton_Zprime
# _tecton_geometric_class          # See below 
# _tecton_graph_automorphism_group # See below
# _tecton_chirality   # time-averaged if no geometry given
# _tecton_type
# TNTaa 'aa 2,4,6 trinitrotoluene'  'C7 H5 N3 O6' ?  Cs ?  achiral  conformer
# TNTbb 'bb 2,4,6 trinitrotoluene'  'C7 H5 N3 O6' ?  Cs ?  achiral  conformer
# TNTab 'ab 2,4,6 trinitrotoluene'  'C7 H5 N3 O6' ?  C1 ?  *TNTba   conformer
# TNTba 'ba 2,4,6 trinitrotoluene'  'C7 H5 N3 O6' ?  C1 ?  *TNTab   conformer
# TNT   '2,4,6 trinitrotoluene'     'C7 H5 N3 O6' 1  C2h ? achiral  molecule
# subnz '1,2,4,6 benzene ring'      'C6 H2'       1  C2h ? achiral  moiety
# nitro 'nitro group'               'N O2'        3  C2h ? achiral  group
#
# Need to find out about IUPAC rules for naming conformers
# *TNTba means chiral, being the enantiomer of TNTba
#
# I have separated the isomer (tecton) from the conformer and treated each in
# separate categories.
#
# HDF comments
# ------------
# _tecton_graph_automorphism_group encodes the symmetry of the graph as a
# group of permutations (of atoms). If I understand correctly there are no
# standard symbols for these automorphism groups although it seems that in a
# fair number of cases they are isomorphic to a point group in three
# dimensions. So often one could use a Schoenflies symbol. [Even that is
# equivocal - point groups Cs, Ci and C2 are isomorphic] The TNT graph should
# be given the symbol C2v for its _tecton_graph_automorphism_group. [C2v is
# isomorphic to D2 and C2h. Why is it I prefer C2v? Am I (are we) too geometry
# oriented? 
# I think we (David?) need to call John Rutherford again.] What one does in
# the case that the graph automorphism group is not isomorphic to a 3D point
# group, I do not have the least idea apart from writing a ? .
#
# IDB response
# ------------
# Including this item which expresses the symmetry of the graph of the tecton
# (but not its geometry) is an interesting idea, but in the absence of a
# recognized system of symbols it is difficult to know what should be given
# here.  For this reason I have not included this in the draft below.
#
# HDF comment
# -----------
# >_molecular_unit_details
#  Note especially that I think that it should be possible to retrieve the
# 'molecular' information (topology and geometry) from a data bank / data
# base. Each 'molecule' should stand in its own right. So the sort of comment
# that David has in his _details "This is the whole molecule, A portion of the
# TNT molecule, A group that appears three times in the TNT molecule" should
# not be included in the above loop as this renders the information dependent
# on a particular instance. 
#
# IDB response
# ------------
# Howard has not included a _*_details item in his draft though I would argue
# that it is useful for someone looking at the CIF.  These 'details' cannot be
# computer-interpreted during retrieval from a databank so they clearly do not
# influence the way a computer would search and retrieve information.
#
# The CIF dictionary already contains instructions for drawing a 2-D molecular
# diagram in the group of chemical_conn categories.  Although the
# chemical_conn categories also describe the topology of a molecule they are
# not a substitute for the tecton categories because 1) they are restricted to
# organic molecules, 2) they are designed only to display a molecular diagram,
# 3) only one molecule can be described and 4) the atoms are not mapped onto
# the atom_sites in the crystal.  
# It would, however, be possible to include an item
# _tecton_atom_conn_atom_number as a child of
# _chemical_conn_atom_number in the following loop to allow the tecton
# to be mapped to chemical_conn and hence plotted as a 2-D diagram.
# Alternatively we could supply 2-D coordinates directly for our atoms in the
# tecton_topology_atom loop.  This would be preferable because it avoids the
# limitations of the chemical_conn categories listed above.
#
# I have replaced _tecton_type with _tecton_special_details.  It is not clear
# if _tecton_type is just a free text field for the benefit of human readers
# (in which case a _*_details field works as well) or whether there would
# be an enumeration list, which would require us to define what is a molecule,
# what is a group and what is a moiety.  I would rather not go there!
#
loop_
_tecton_topology_id           # List-reference 
_tecton_topology_name         # Name e.g. full IUPAC name  
_tecton_topology_formula      # Numbers of atoms in the tecton
_tecton_topology_Zprime       # Number of symmetry independent copies of the
                                    # tecton in the crystal
_tecton_topology_special_details
TNT   '2,4,6 trinitrotoluene'     'C7 H5 N3 O6' 1  molecule
BNZ   '1,2,4,6 benzene ring'      'C6 H2'       1  moiety
NITRO 'nitro group'               'N O2'        2  group
#
# The tecton_topology_atom loop that follows next defines the atoms in the
# tecton and their chemical properties.  
#
# A _tecton_topology_atom_id has been added as the list-reference and this
# item is the parent to many items found in subsequent loops.  For
# convenience in this discussion I have constructed it out of the first
# letter of _tecton_topology_id, a period and _tecton_topology_atom_label.  This
# ensures uniqueness in the list while making it clear which atom is referred
# to. 
#
# I have removed the valence as something that needs more thought, if indeed
# it is needed at all.
#
# I have added an item _tecton_topology_atom_chirality which is not needed in
# this example, but is needed in chiral structures to identify any atom that
# serves as a chiral center.  Chirality is not captured by the topology, but
# it is, like topology, a feature of the structure that can only be changed by
# breaking and making bonds.  It is included here because it is more closely
# related to the topology than to the geometry which can be changed without
# breaking any bonds.  I will defer to others what values should be associated
# with this item - presumably some letter like R or S.
#
loop_
_tecton_topology_atom_id            # List reference, parent of many items
_tecton_topology_atom_tecton_id     # Child of _tecton_topology_id
_tecton_topology_atom_label         # For human use only
_tecton_topology_atom_type_symbol   # Child of _atom_type_symbol
_tecton_topology_atom_coord_number  # Number of bonds formed by this atom
_tecton_topology_atom_chirality
_tecton_topology_atom_details
T.C1  TNT     C1   C     3  . ?
T.C2  TNT     C2   C     3  . ?
T.C3  TNT     C3   C     3  . ?
T.C4  TNT     C4   C     3  . ?
T.C5  TNT     C5   C     3  . ?
T.C6  TNT     C6   C     3  . ?
T.C7  TNT     C7   C     4  . ?
T.H3  TNT     H3   H     1  . ?
T.H5  TNT     H5   H     1  . ?
T.H71 TNT     H71  H     1  . ?
T.H72 TNT     H72  H     1  . ?
T.H73 TNT     H73  H     1  . ?
T.N2  TNT     N2   N     3  . ?
T.O21 TNT     O21  O     1  . ?
T.O22 TNT     O22  O     1  . ?
T.N4  TNT     N4   N     3  . ?
T.O41 TNT     O41  O     1  . ?
T.O42 TNT     O42  O     1  . ?
T.N6  TNT     N6   N     3  . ?
T.O61 TNT     O61  O     1  . ?
T.O62 TNT     O62  O     1  . ?
B.C1  BNZ     C1   C     3  . 'benzene ring'
B.C2  BNZ     C2   C     3  . 'benzene ring'
B.C3  BNZ     C3   C     3  . 'benzene ring'
B.C4  BNZ     C4   C     3  . 'benzene ring'
B.C5  BNZ     C5   C     3  . 'benzene ring'
B.C6  BNZ     C6   C     3  . 'benzene ring'
B.H3  BNZ     H3   H     1  . 'benzene ring'
B.H5  BNZ     H5   H     1  . 'benzene ring'
N.N1  NITRO   N1   N     3  . 'nitro group'
N.O1  NITRO   O1   O     1  . 'nitro group'
N.O2  NITRO   O2   O     1  . 'nitro group'

# The next loop defines the bonds in each of the tectons, again giving
# just the topological properties of the bonds, not their geometries. 
#
# In the TECTON_BOND category the atoms are identified by two children of
# _tecton_topology_atom_id.
#
# Howard has added dangling bonds to show the full coordination around all the
# atoms in the tecton as well as indicating the points at which the tecton is
# attached to other species.  The dummy atoms are indicated by the default '.'
# meaning that this atom cannot be defined.  It would make sense to include
# the dangling bonds when, for example, the benzene ring is mapped onto TNT in
# the map_tecton loop.  However, this requires that the atom at the far end of
# the dangling bond be given a name, which in turn means that the name must be
# added to the tecton_topology_atom list in order to preserve the parent-child
# relations.  It would be necessary to identify such atoms as dummies, which
# could be done by assigning them a non-existent atom_type such as X, though
# this would in turn have to be defined in the atom_type loop.  It all seems a
# little convoluted. A simpler scheme may be possible.
# To avoid these problems the dangling bonds are not mapped and the CIF is
# fully compliant.
#
# Changes from Howard's proposal:
# ------------------------------
# 1. A separate list-reference item (_tecton_topology_bond_id) has been added
# to avoid having to combine three items to define the list reference.  Since
# this item is not currently the parent to any further item in this CIF, a
# simple number works well (but see the different situation in Sample CIF #2). 
#
# 2. The atom_labels have been replaced by _tecton_topology_bond_atom_ids that
# provide the computer-readable link to the tecton_topology_atom list.
#
# 3. The direct link to the tecton_topology_id has been removed as this
# information can be recovered by referring to the tecton_topology_atom list.
#
# 4. For the bond type I have adopted the conventions used in chemical_conn_bond
# which are those suggested by CCDC. 
#
loop_
_tecton_topology_bond_id
_tecton_topology_bond_atom1_id     # Child of _tecton_topology_atom_id
_tecton_topology_bond_atom2_id     # Child of _tecton_topology_atom_id
_tecton_topology_bond_type
1     T.C1   T.C2   arom      # TNT benzene ring
2     T.C2   T.C3   arom
3     T.C3   T.C4   arom
4     T.C4   T.C5   arom
5     T.C5   T.C6   arom
6     T.C6   T.C1   arom
7     T.C3   T.H3   sing
8     T.C5   T.H5   sing
9     T.C7   T.C1   sing          # TNT Methyl group
10    T.C7   T.H71  sing
11    T.C7   T.H72  sing
12    T.C7   T.H73  sing
13    T.N2   T.C2   sing          # TNT N2 nitro group
14    T.N2   T.O21  delo
15    T.N2   T.O22  delo
16    T.N4   T.C4   sing          # TNT N4 nitro group
17    T.N4   T.O41  delo
18    T.N4   T.O42  delo
19    T.N6   T.C6   sing          # TNT N6 nitro group
20    T.N6   T.O61  delo
21    T.N6   T.O62  delo
22    B.C1   B.C2   arom     # 1,2,4,6 substituted benzene ring
23    B.C2   B.C3   arom
24    B.C3   B.C4   arom
25    B.C4   B.C5   arom
26    B.C5   B.C6   arom
27    B.C6   B.C1   arom
28    B.C3   B.H3   sing
29    B.C5   B.H5   sing
30    B.C1   .      sing          # A dangling bond
31    B.C2   .      sing
32    B.C4   .      sing
33    B.C6   .      sing
34    N.N1   N.O1   delo     # Nitro group
35    N.N1   N.O2   delo
36    N.N1   .      sing
#
# HDF on 'delocalized'
# -------------------
#   Between C1 and C2:
#    (sigma) there is a sigma bond due to the overlap of a lobe of an sp2
# hybrid on C1 with a lobe of an sp2 hybrid on C2 with consequent sharing of
# electrons. That part of the 'bond' is not delocalized.
#    (pi) participation in a delocalized pi bond due to overlap of the pz
# orbitals and consequent sharing of electrons.
#   I don't think the C1-C2 interaction should be described as 'delocalized'.
# Only a part of the bond could be so described.
#
# IDB response
# ------------
# I have adopted the convention used by CCDC.
#
###########################################################
#       DEFINING THE TECTON CONFORMERS AND GEOMETRY
#
# The disordered nitro groups can be combined in four different ways, a-a and
# b-b (both with Cs symmetry), and a-b and b-a (both with C1 symmetry, one
# being the enantiomer of the other).  
#
# These different combinations give rise to different conformers which have
# the same topology but different geometries. 
# The definition of the different conformers is thus related to the
# description of geometry, rather than topology. 
# The topology in this case is indicated by the name TNT, the conformers by
# names such as TNTab.
#
# The following loop appears in Howard's draft as a way of allowing the
# geometry common to all conformers to appear only once.  In his draft TNTaa
# etc. as well as TNT were defined as tectons in a previous loop (see above). 
#
# loop_
# _tecton_topology_combine_id              # Child of _tecton_id
# _tecton_topology_combine_source_id       # Child of _tecton_id
# TNTaa   TNT  # Means any information about TNT also applies, as such, to     
#              #TNTaa
# TNTbb   TNT
# TNTab   TNT
# TNTba   TNT
# MAYBE THIS CAN BE DONE LEGALLY WITHIN CURRENT CIF SYNTAX BY SAVE FRAMES ???
# [Save frames are used in dictionaries but are not yet part of CIF - IDB] 
#
# In the draft presented here the combining of the geometries is achieved in a
# different way which, I believe, is more appropriate for CIF, is more
# flexible and would make programming simpler.  The above loop therefore is
# not part of the current draft.
#
# The first loop in this group of categories is one that identifies the
# different conformers, but if only one conformation is present this loop may
# be omitted unless one wished to give properties of the geometry as a whole
# such as the point group of the tecton. 
#
# Since in the TNT example the ideal geometries of the conformers differ only
# in the torsion angles, the remaining geometry of the molecule is
# common and need only be given once.  This means that each conformer is
# described in part by items that give the common geometry and in part by
# items that give the distinctive geometry of the conformer (the torsion
# angles in this case).  Each distance or angle in the geom loops is assigned
# a conformer_label (e.g. aa, ab, all) to identify which conformer (or group
# of conformers) it describes.  The second loop in this group
# (tecton_conformer_equiv) associates each conformer_id with the appropriate
# conformer_labels.  
# Then follow the tecton_geom loops which define the atomic coordinates, the
# interatomic distances, the angles and the torsion angles. 
#
# Howard's draft gives the symmetry of the conformer using the dataname
# _tecton_geometric_class rather than _tecton_conformer_point_group.  He
# explains this item thus:
# "_tecton_geometric_class is the orientation-independent specification of the
# tecton point group according to its geometry. It's probably best to use a
# Schoenflies symbol as there is no choice of basis implicit in the geometry
# as given by interatomic distances, interatomic (dihedral) angles. I suppose
# that if the geometry is given as a set of coordinates, it might be
# justifiable to specify a point group with an H-M point group symbol for the
# tecton orientation corresponding to the atomic coordinates."
#
loop_
_tecton_conformer_id            # List-reference
_tecton_conformer_tecton_id     # Child of _tecton_topology_id
_tecton_conformer_point_group   # Schoenflies point group symbol of conformer
_tecton_conformer_chirality     # We need to define allowed symbols
_tecton_conformer_details
TNTaa   TNT Cs  achiral 'TNTaa conformer'
TNTbb   TNT Cs  achiral 'TNTbb conformer'
TNTab   TNT C1  +x      'TNTab conformer'
TNTba   TNT C1  -x      'TNTba conformer'

# The next loop associates each conformer with one or more of the
# conformer_labels used in the tecton_geom loops below. 
# The full geometry of a conformer can be found by extracting only those geom
# items marked with a label listed opposite that conformer. 

loop_
_tecton_conformer_equiv_id            # List-reference
_tecton_conformer_equiv_conformer_id  # Child of _tecton_conformer_id
_tecton_conformer_equiv_label         # Parent to
                                      # _tecton_geom_atom_conformer_label, etc.
1  TNTaa  all 
2  TNTaa  aa
3  TNTbb  all
4  TNTbb  bb 
5  TNTab  all
6  TNTab  ab 
7  TNTba  all 
8  TNTba  ba

# Next follow the tecton_geom_atom, dist, angle and torsion loops that
# give the geometry.
#
# All bonds and angles are defined in terms of child links
# from _tecton_topology_atom_id.  This uniquely links the geometry to the
# atoms in the bond graph (however there are problems in the second example
# CIF). 
#
# I have shortened some of Howard's names and the structure of Howard's CIF
# has been significantly changed to produce a consistent and simple
# description that obeys CIF rules.
#
# The first loop defines the atoms in terms of their coordinates.  This loop
# is not needed if coordinates are not used since it adds nothing to the atom
# properties given in the description of the topology (cf. Sample CIF #2).
#
# _tecton_geom_atom_id is a child of _tecton_topology_atom_id and since it is
# unique in the geom list as well as the topology list it can also serve as
# the list-reference.
#
# By way of illustration the geometry of the benzene ring in this example is
# defined by atomic coordinates, but the remaining geometries are defined by
# their bonds and angles. 
#
# Note that since the benzene ring geometry is common to all conformers, it
# carries the conformer_label 'all' which is defined above as indicating a
# geometry that is common to all four conformers.
#

loop_
_tecton_geom_atom_id    # List-reference, child of _tecton_topology_atom_id
_tecton_geom_atom_conformer_label  # Child of _tecton_conformer_equiv_label
_tecton_geom_atom_coord_x          # Coordinates of atom in Angstrom
_tecton_geom_atom_coord_y          # 
_tecton_geom_atom_coord_z          # 
_tecton_geom_atom_details
T.C1  all 0.037  0.146  -0.124  ?
T.C2  all 1.378  0.562   0.134  ?
T.C3  all 1.846  1.421   0.204  ?
T.C4  all 2.567  1.834   0.304  ?
T.C5  all 1.745  1.563   0.245  ?
T.C6  all 0.962  0.498   0.103  ?
T.H3  all 2.13   1.72    0.24   ?    
T.H5  all 1.84   2.05    0.36   ?    

#
# Distances are defined in the next loop in terms of the two terminal atoms
# which are identified through children of _tecton_topology_atom_id.
#
# These distances do not have to correspond to the bonds
# defined in the topology.  They may represent non-bonding contacts or
# distances between atoms well removed from each other.
#
# These are notional (ideal) distances, not those observed in the crystal
#
loop_
_tecton_geom_dist_id              # List-reference
_tecton_geom_dist_conformer_label # Child of _tecton_conformer_equiv_label
_tecton_geom_dist_atom1_id        # Child of _tecton_topology_atom_id
_tecton_geom_dist_atom2_id        # Child of _tecton_topology_atom_id
_tecton_geom_dist_distance        # Distance atom1-atom2 in Angstroms
1  all   T.C7   T.C1    1.54                # TNT methyl group
2  all   T.C7   T.H71   1.05
3  all   T.C7   T.H72   1.05
4  all   T.C7   T.H73   1.05
5  all   T.N4   T.C4    1.43                # TNT N4 nitro group
6  all   T.N4   T.O41   1.18
7  all   T.N4   T.O42   1.18
8  all   T.N2   T.C2    1.43                # TNT N2 nitro group
9  all   T.N2   T.O21   1.18
10 all   T.N2   T.O22   1.18
11 all   T.N6   T.C6    1.43                # TNT N6 nitro group
12 all   T.N6   T.O61   1.18
13 all   T.N6   T.O62   1.18

# The angles are defined in terms of the atom_ids of three defining atoms. 
# The angle is given in degrees and is formed at atom2.
#
loop_
_tecton_geom_angle_id              # List-reference
_tecton_geom_angle_conformer_label # Child of _tecton_conformer_equiv_label
_tecton_geom_angle_atom1_id        # Child of _tecton_topology_atom_id
_tecton_geom_angle_atom2_id        # Child of _tecton_topology_atom_id
_tecton_geom_angle_atom3_id        # Child of _tecton_topology_atom_id
_tecton_geom_angle_angle           # Angle in degrees
1  all    T.C1   T.C7   T.H71  109     # TNT Methyl group
2  all    T.C1   T.C7   T.H72  109
3  all    T.C1   T.C7   T.H73  109
4  all    T.H71  T.C7   T.H72  109
5  all    T.H72  T.C7   T.H73  109
6  all    T.H73  T.C7   T.H71  109
7  all    T.O41  T.N4   T.C4   117     # TNT N4 nitro group
8  all    T.O42  T.N4   T.C4   117 
9  all    T.O41  T.N4   T.O42  126
10 all    T.O21  T.N2   T.C2   117     # TNT N2 nitro group
11 all    T.O22  T.N2   T.C2   117 
12 all    T.O21  T.N2   T.O22  126 
13 all    T.O61  T.N6   T.C6   117     # TNT N6 nitro group
14 all    T.O62  T.N6   T.C6   117 
15 all    T.O61  T.N6   T.O62  126 
#
# In the torsion angle loop given next the four conformers are
# differentiated.  
#
# One of the torsion angles is common to all conformers.  This is indicated
# by giving _tecton_geom_torsion_conformer_label the value 'all', which is
# defined in the tecton_conformer_equiv loop.
#
# I have changed Howard's 'dihedral' into 'torsion' to conform with usage in
# core_CIF.
#
loop_
_tecton_geom_torsion_id              # List-reference
_tecton_geom_torsion_conformer_label # Child of _tecton_conformer_equiv_label
_tecton_geom_torsion_atom1_id        # Child of _tecton_topology_atom_id
_tecton_geom_torsion_atom2_id        # Child of _tecton_topology_atom_id
_tecton_geom_torsion_atom3_id        # Child of _tecton_topology_atom_id
_tecton_geom_torsion_atom4_id        # Child of _tecton_topology_atom_id
_tecton_geom_torsion_angle           # Torsion angle in degrees
1 all  T.C3   T.C4   T.N4   T.O41   90
2 aa   T.C1   T.C2   T.N2   T.O21   10.5
3 aa   T.C1   T.C6   T.N6   T.O61   10.5
4 bb   T.C1   T.C2   T.N2   T.O21  -10.5
5 bb   T.C1   T.C6   T.N6   T.O61  -10.5
6 ab   T.C1   T.C2   T.N2   T.O21   10.5
7 ab   T.C1   T.C6   T.N6   T.O61  -10.5
8 ba   T.C1   T.C2   T.N2   T.O21  -10.5
9 ba   T.C1   T.C6   T.N6   T.O61   10.5

############################################################
#           MAPPING THE TECTONS ONTO EACH OTHER
#
# The next loop maps the atoms of the subunit tectons onto the main tecton.
# It can of course map any tecton onto any other tecton provided the regions
# of the tectons mapped are isomorphic.  
#
# This mapping will not often be needed but is included to show how it can be
# done.
#
# It is only necessary to map the atoms, since there is no ambiguity
# about where the bonds occur as long as the bond graph is finite.
#
# Howard remarks that:
# AT LEAST SOME OF THE FOLLOWING "MAPS" ARE NOT MAPS IN THE STRICT 
# MATHEMATICAL SENSE. MAYBE ANOTHER WORD IS NEEDED.
# (Any suggestions?)
#
# The following loop proposed by Howard is not needed.  It is implicit in the
# mapping of the atoms (see below).
#
# loop_
# _map_tecton2tecton_id           # List reference / Map Name
# _map_tecton2tecton_source_id    # Child of tecton_id
# _map_tecton2tecton_image_id     # Child of tecton_id     
# MAPsubnz2TNT         subnz   TNT
# MAPnitro2TNTnitroN2  nitro   TNT
# MAPnitro2TNTnitroN4  nitro   TNT
# MAPnitro2TNTnitroN6  nitro   TNT
#
# The changes made from Howard's draft of the next loop are: shortening and
# simplifying the name, replacing the first item (the map name - see above) by
# a list-reference (given here as a number).  The map name (which indicates
# which tectons are being mapped onto which) is implicit in the atom_ids (and
# in this example is made explicit to the reader by the use of children of
# _tecton_topology_atom_id to identify the atoms being mapped).
#
# This example is a true mapping in the sense described by Howard since each
# subunit is mapped onto the main molecule, though the syntax does not prevent
# the mapping being presented the other way around (the main molecule 'mapped'
# onto the subunits which violates the strict mathematical rules of mapping),
# unless it is made clear that atom1 is always mapped onto atom2.  
# I don't see how the mathematical rules can be expressed in machine-readable
# form in the dictionary, or even if we should insist on a mathematically
# correct definition.  Maybe 'map' is not the right term to use, but I cannot
# think of anything better.  It is more a kind of equivalencing or
# identification.
#
# This loop is now much simpler and very flexible.
#
# See the note above tecton_topology_bond re dangling bonds.
#
loop_
_map_tecton_atom_map_id        # List reference
_map_tecton_atom_atom1_id      # Child of _tecton_topology_atom_id
_map_tecton_atom_atom2_id      # Child of _tecton_topology_atom_id
1   B.C1    T.C1    # mapping 1,2,4,6 benzene moiety onto TNT
2   B.C2    T.C2
3   B.C3    T.C3
4   B.C4    T.C4
5   B.C5    T.C5
6   B.C6    T.C6
7   B.H3    T.H3
8   B.H5    T.H5
9   N.N1    T.N2    # mapping the nitro group onto the TNT N2 group
10  N.O1    T.O21
11  N.O2    T.O22
12  N.N1    T.N4    # mapping the nitro group onto the TNT N4 group
13  N.O1    T.O41
14  N.O2    T.O42
15  N.N1    T.N6    # mapping the nitro group onto the TNT N6 group
16  N.O1    T.O61
17  N.O2    T.O62
#
##############################################################
#          MAPPING THE TECTONS TO THE CRYSTAL 
#
# The next loop maps two of the conformers to the crystal (or vice versa, the
# strict mathematical definition of mapping does not work here).  
#
# As before I have changed Howard's draft by changing the first item from the
# map name to a numerical list-reference.  
#
# The atoms of the isomer are identified by _tecton_topology_atom_id, the
# conformer is identified by the conformer_label.  The conformer label may be
# omitted if it is not known which conformers are present in a disordered
# structure.
#
# I have arbitrarily assumed that the crystal contains equal numbers of the
# two Cs conformers, though I could as easily have included all four in
# various proportions.
#
# The occupation number indicates how much of each conformer (or isomer) is
# present. The occupation numbers of the atoms in the crystal are defined in
# the atom_site loop and must not be less than the sum of the corresponding
# occupation numbers of the conformers.  
#
# Columns 2 to 4 in the list give information about the atom in the tecton,
# Columns 5 and 6 identify the atom in the crystal, the first identifying
# the atom in the atom_site list, the second the symmetry operation applied to
# the coordinates of that atom.  
#
# The letters a and b distinguish the two positions of the disordered nitro
# groups in the crystallographic asymmetric unit, each having an occupancy of
# 0.5.
#
# I have assumed that the crystallographic mirror operation that relates the
# two halves of the TNT molecule has the _space_group_symop_id of 2.  Lattice
# translations of the symmetry operations are not needed and are therefore not
# included in this example (but see sample CIF #2).  
#
# As before, the following loop proposed by Howard is not needed as the
# information on the conformers is already defined above in the
# tecton_conformer sections.
# loop_
# _map_tecton2crystal_id          # List reference / Map Name
# _map_tecton2crystal_tecton_id   # Child of tecton_id
#                                # CRYSTAL ID ????
#                                # OCCUPATION PARAMETER? 
# MAPTNTaa2crystal     TNTaa
# MAPTNTbb2crystal     TNTbb
# MAPTNTab2crystal     TNTab
# MAPTNTba2crystal     TNTba
# MAPTNT2crystal       TNT

loop_
_map_tecton2crystal_atom_id           # List-reference
_map_tecton2crystal_atom_atom_id      # Child of _tecton_topology_atom_id
_map_tecton2crystal_atom_conformer_label 
                                   # Child of _tecton_conformer_equiv_label
_map_tecton2crystal_atom_occup_number # Occupation number of tecton atom
_map_tecton2crystal_atom_atom_site_label # child of _atom_site_label
_map_tecton2crystal_atom_symop_id     # child of _space_group_symop_id
1  T.C1  all 1   C1   1
2  T.C2  all 1   C2   1
3  T.C3  all 1   C3   1
4  T.C4  all 1   C4   1
5  T.C5  all 1   C3   2
6  T.C6  all 1   C2   2
7  T.H3  all 1   H3   1
8  T.H5  all 1   H3   2
9  T.C7  all 1   C7   1
10 T.H71 all 1   H71  1
11 T.H72 all 1   H72  1
12 T.H73 all 1   H71  2
13 T.N4  all 1   N4   1
14 T.O41 all 1   O41  1
15 T.O42 all 1   O42  1
# SIDE CHAINS
16 T.N2  aa 0.5  N2a  1
17 T.O21 aa 0.5  O21a 1
18 T.O22 aa 0.5  O22a 1
19 T.N6  aa 0.5  N2a  2
20 T.O61 aa 0.5  O21a 2
21 T.O62 aa 0.5  O22a 2
22 T.N2  bb 0.5  N2b  1
23 T.O21 bb 0.5  O21b 1
24 T.O22 bb 0.5  O22b 1
25 T.N6  bb 0.5  N2b  2
26 T.O61 bb 0.5  O21b 2
27 T.O62 bb 0.5  O22b 2
#
# Howard remarks: IF THE SIDE CHAINS WERE LONGER, EVEN THE ABOVE SYNTAX WOULD
# BE TOO CUMBERSOME.
# IDB: But there is no duplication - the list could not be shorter.
#
############ End of first CIF ################
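The conformer_label mechanism used in the first sample CIF can be illustrated with a short program. The Python below is only a sketch. It assumes no CIF parser and copies the tecton_conformer_equiv and tecton_geom_torsion rows from the TNT example into tuples by hand. It also uses hypothetical helper names (labels_for, geometry_for) that are not part of the proposal. For a given conformer it keeps only those geom rows whose conformer_label is listed opposite that conformer, which is the extraction rule stated in the CIF comments.

```python
"""Select the geometry of one TNT conformer via its conformer_labels.

Illustrative sketch only: rows are copied by hand from the first sample CIF,
and the helper names are hypothetical.
"""

# tecton_conformer_equiv loop: (equiv_id, conformer_id, conformer_label)
CONFORMER_EQUIV = [
    (1, "TNTaa", "all"), (2, "TNTaa", "aa"),
    (3, "TNTbb", "all"), (4, "TNTbb", "bb"),
    (5, "TNTab", "all"), (6, "TNTab", "ab"),
    (7, "TNTba", "all"), (8, "TNTba", "ba"),
]

# tecton_geom_torsion loop: (id, conformer_label, (atom1..atom4), angle in degrees)
TORSIONS = [
    (1, "all", ("T.C3", "T.C4", "T.N4", "T.O41"), 90.0),
    (2, "aa", ("T.C1", "T.C2", "T.N2", "T.O21"), 10.5),
    (3, "aa", ("T.C1", "T.C6", "T.N6", "T.O61"), 10.5),
    (4, "bb", ("T.C1", "T.C2", "T.N2", "T.O21"), -10.5),
    (5, "bb", ("T.C1", "T.C6", "T.N6", "T.O61"), -10.5),
    (6, "ab", ("T.C1", "T.C2", "T.N2", "T.O21"), 10.5),
    (7, "ab", ("T.C1", "T.C6", "T.N6", "T.O61"), -10.5),
    (8, "ba", ("T.C1", "T.C2", "T.N2", "T.O21"), -10.5),
    (9, "ba", ("T.C1", "T.C6", "T.N6", "T.O61"), 10.5),
]


def labels_for(conformer_id, equiv_rows):
    """Return the set of conformer_labels listed opposite conformer_id."""
    return {label for _, conf, label in equiv_rows if conf == conformer_id}


def geometry_for(conformer_id, geom_rows, equiv_rows):
    """Keep only the geom rows whose conformer_label (field 1) applies."""
    wanted = labels_for(conformer_id, equiv_rows)
    return [row for row in geom_rows if row[1] in wanted]


if __name__ == "__main__":
    for conformer in ("TNTaa", "TNTbb", "TNTab", "TNTba"):
        ids = [row[0] for row in geometry_for(conformer, TORSIONS, CONFORMER_EQUIV)]
        print(conformer, ids)
```

Run as a script, it lists the torsion ids that apply to each conformer. For TNTab these are 1 (the shared 'all' torsion), 6 and 7. The same filter would work for the dist and angle loops if their rows were stored with the conformer_label as the second field.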

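The first sample CIF also states a rule that can be checked mechanically: the atom_site occupation number of each crystal atom must not be less than the sum of the occupation numbers of the tecton atoms mapped onto it. The Python sketch below sums the map_tecton2crystal_atom occupancies for each (atom_site_label, symop_id) pair, because symmetry-related images of one site are distinct atoms in the crystal. The atom_site occupancies are an assumption, since the TNT atom_site loop is not reproduced in this paper. Ordered sites are taken as 1.0, and the disordered a and b sites are taken as 0.5, as the CIF comments state. The function names are hypothetical.

```python
"""Check map_tecton2crystal_atom occupancies against atom_site occupancies.

Sketch only: ASSUMED_SITE_OCCUPANCY is an assumption, not a real atom_site loop.
"""
from collections import defaultdict

# Body of the map_tecton2crystal_atom loop from the TNT sample CIF:
# id, tecton atom_id, conformer_label, occup_number, atom_site_label, symop_id
MAP_LOOP = """
1  T.C1  all 1   C1   1
2  T.C2  all 1   C2   1
3  T.C3  all 1   C3   1
4  T.C4  all 1   C4   1
5  T.C5  all 1   C3   2
6  T.C6  all 1   C2   2
7  T.H3  all 1   H3   1
8  T.H5  all 1   H3   2
9  T.C7  all 1   C7   1
10 T.H71 all 1   H71  1
11 T.H72 all 1   H72  1
12 T.H73 all 1   H71  2
13 T.N4  all 1   N4   1
14 T.O41 all 1   O41  1
15 T.O42 all 1   O42  1
16 T.N2  aa 0.5  N2a  1
17 T.O21 aa 0.5  O21a 1
18 T.O22 aa 0.5  O22a 1
19 T.N6  aa 0.5  N2a  2
20 T.O61 aa 0.5  O21a 2
21 T.O62 aa 0.5  O22a 2
22 T.N2  bb 0.5  N2b  1
23 T.O21 bb 0.5  O21b 1
24 T.O22 bb 0.5  O22b 1
25 T.N6  bb 0.5  N2b  2
26 T.O61 bb 0.5  O21b 2
27 T.O62 bb 0.5  O22b 2
"""

# ASSUMPTION: ordered sites fully occupied, disordered a/b sites at 0.5.
ASSUMED_SITE_OCCUPANCY = {
    **{label: 1.0 for label in ("C1", "C2", "C3", "C4", "C7", "H3",
                                "H71", "H72", "N4", "O41", "O42")},
    **{label: 0.5 for label in ("N2a", "O21a", "O22a", "N2b", "O21b", "O22b")},
}


def parse_map_loop(text):
    """Turn each 6-field data line into (site_label, symop_id, occupancy)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank lines
        _, _, _, occ, site, symop = fields
        rows.append((site, int(symop), float(occ)))
    return rows


def site_totals(rows):
    """Sum tecton occupancies for each crystal atom (site label + symop)."""
    totals = defaultdict(float)
    for site, symop, occ in rows:
        totals[(site, symop)] += occ
    return dict(totals)


def occupancy_violations(rows, site_occupancy, tol=1e-6):
    """Crystal atoms whose summed tecton occupancy exceeds the atom_site value."""
    return {key: total for key, total in site_totals(rows).items()
            if total > site_occupancy[key[0]] + tol}


if __name__ == "__main__":
    print(occupancy_violations(parse_map_loop(MAP_LOOP), ASSUMED_SITE_OCCUPANCY))
```

With the data as given the script reports no violations (an empty dict). Summing per (atom_site_label, symop_id), rather than per label alone, matters here. N2a receives 0.5 from T.N2 under symop 1 and another 0.5 from T.N6 under symop 2, but these are two different atoms in the crystal.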
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

                        4.2 SECOND SAMPLE CIF

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
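Some of the bookkeeping in the CaCrF5 example that follows can be checked from the tecton_topology_atom loop alone. The Python sketch below uses rows copied by hand and hypothetical helper names, and applies three checks. First, the coordination numbers must sum to twice the number of bonds in the finite graph. Second, the atomic valences should sum to zero. Third, assuming that every bond joins a cation to an F anion, the bond count from the cation side must equal the count from the anion side. For the seven atoms listed, these checks should recover the thirteen bonds (six Cr-F and seven Ca-F) described in the comments.

```python
"""Consistency checks on the CaCrF5 tecton_topology_atom loop.

Sketch only: the rows are copied by hand from the second sample CIF.
"""

# (atom_id, type_symbol, valence, coord_number) from tecton_topology_atom
ATOMS = [
    ("Ca", "Ca", 2, 7),
    ("Cr", "Cr", 3, 6),
    ("F1", "F", -1, 3),
    ("F2", "F", -1, 2),
    ("F3", "F", -1, 3),
    ("F4", "F", -1, 3),
    ("F5", "F", -1, 2),
]


def bond_count(atoms):
    """Each bond ends on two atoms, so coordination numbers sum to 2 x bonds."""
    total = sum(cn for *_, cn in atoms)
    if total % 2:
        raise ValueError("odd coordination sum: the bond graph cannot be closed")
    return total // 2


def net_valence(atoms):
    """Sum of atomic valences; zero for an electroneutral formula unit."""
    return sum(v for _, _, v, _ in atoms)


def cation_anion_bonds(atoms):
    """Bonds counted from the cation side and from the anion side."""
    cation = sum(cn for _, _, v, cn in atoms if v > 0)
    anion = sum(cn for _, _, v, cn in atoms if v < 0)
    return cation, anion


if __name__ == "__main__":
    print(bond_count(ATOMS), net_valence(ATOMS), cation_anion_bonds(ATOMS))
```

The third check is only meaningful under the stated assumption. A structure with cation-cation or anion-anion bonds would need the bond list itself rather than the coordination numbers.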

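The second sample CIF gives fractional coordinates, symmetry operations and a geom_bond loop. The loop's site_symmetry codes (e.g. 7_545) follow the core CIF convention: the symop_id, then lattice translations along a, b and c written with an offset of 5. As a consistency check on those data, the Python sketch below recomputes each listed Ca-F and Cr-F distance from the cell parameters using the metric tensor. It rests on three simplifying assumptions. The symop strings are parsed by a small hand-written routine that handles only the 'X+1/2' style of term used in this CIF. Atom 1 of each bond is taken to be at 1_555, as it is in every row of the loop. Translation digits are single characters. The helper names are hypothetical.

```python
"""Recompute the CaCrF5 geom_bond distances from the sample CIF data.

Sketch only: the symop parser handles just the terms used in this CIF.
"""
import math
import re
from fractions import Fraction

CELL = (9.0050, 6.4720, 7.5330, 90.00, 115.85, 90.00)  # a, b, c, alpha, beta, gamma

SYMOPS = {
    1: " X, Y, Z", 2: "-X, Y,-Z+1/2", 3: "-X,-Y,-Z", 4: " X,-Y, Z+1/2",
    5: " X+1/2, Y+1/2, Z", 6: "-X+1/2, Y+1/2,-Z+1/2",
    7: "-X+1/2,-Y+1/2,-Z", 8: " X+1/2,-Y+1/2, Z+1/2",
}

SITES = {  # fractional coordinates from the atom_site loop
    "Ca1": (0.50000, 0.04260, 0.25000),
    "Cr1": (0.00000, 0.00000, 0.00000),
    "F1": (0.00970, -0.29340, -0.02910),
    "F2": (-0.22730, -0.02300, -0.11740),
    "F3": (0.00000, -0.07210, 0.25000),
}

# geom_bond rows: label_1, label_2, distance, site_symmetry_2 (site_symmetry_1 = 1_555)
GEOM_BONDS = [
    ("Ca1", "F1", 2.391, "5_555"), ("Ca1", "F1", 2.391, "6_555"),
    ("Ca1", "F1", 2.292, "7_545"), ("Ca1", "F1", 2.292, "8_545"),
    ("Ca1", "F2", 2.215, "3_555"), ("Ca1", "F2", 2.215, "4_655"),
    ("Ca1", "F3", 2.494, "5_555"),
    ("Cr1", "F1", 1.918, "1_555"), ("Cr1", "F1", 1.918, "3_555"),
    ("Cr1", "F2", 1.848, "1_555"), ("Cr1", "F2", 1.848, "3_555"),
    ("Cr1", "F3", 1.940, "1_555"), ("Cr1", "F3", 1.940, "3_555"),
]


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Real-space metric tensor G for the given cell (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    return [[a * a, a * b * cg, a * c * cb],
            [a * b * cg, b * b, b * c * ca],
            [a * c * cb, b * c * ca, c * c]]


def apply_symop(op, xyz):
    """Apply a symop string such as '-X+1/2,-Y+1/2,-Z' to fractional coords."""
    values = dict(zip("XYZ", xyz))
    out = []
    for expr in op.replace(" ", "").split(","):
        total = 0.0
        for term in re.findall(r"[+-]?[^+-]+", expr):  # e.g. '-Z', '+1/2'
            sign = -1.0 if term[0] == "-" else 1.0
            body = term.lstrip("+-")
            total += sign * (values[body] if body in values else float(Fraction(body)))
        out.append(total)
    return out


def position(label, code):
    """Fractional coordinates of atom `label` at site symmetry code 'n_klm'."""
    op_id, shift = code.split("_")
    xyz = apply_symop(SYMOPS[int(op_id)], SITES[label])
    return [p + int(digit) - 5 for p, digit in zip(xyz, shift)]


def distance(f1, f2, g):
    """Length of the vector f2 - f1, i.e. sqrt(d^T G d)."""
    d = [q - p for p, q in zip(f1, f2)]
    return math.sqrt(sum(d[i] * g[i][j] * d[j] for i in range(3) for j in range(3)))


G = metric_tensor(*CELL)

if __name__ == "__main__":
    for atom1, atom2, listed, code in GEOM_BONDS:
        calc = distance(SITES[atom1], position(atom2, code), G)
        print(f"{atom1}-{atom2} {code}: listed {listed:.3f}, calculated {calc:.3f}")
```

If the coordinates, cell and symmetry codes are mutually consistent, each calculated distance should agree with the listed value to about the third decimal place. A mismatch would point to an error in the CIF rather than in the tecton description.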
######### Beginning of second CIF #############
#
#
# EXAMPLE OF A STRUCTURE WITH AN INFINITE BOND GRAPH
#
# CaCrF5 is chosen to illustrate how infinite bond graphs are treated. 
#
# CaCrF5 consists of chains of corner-linked CrF6 octahedra running along the
# c axis of a crystal belonging to space group C2/c.  The Cr and the linking F
# atom (F3) reside on 2-fold axes that are perpendicular to c.  The Ca atoms
# lie between the chains on the same 2-fold axes. 
# 
# The crystal structure of CaCrF5 is represented by an array of atoms linked
# by bonds into an infinitely connected network with translational symmetry. 
# A finite graph, which retains all the local properties of the atoms, can be
# extracted from the infinite graph as follows: 
# One first extracts one formula unit (in this case the seven atoms in the
# chemical formula).  This requires that fourteen bonds linking the formula
# unit to the rest of the infinite network be broken, but such broken bonds
# always occur in pairs since they are necessarily related in pairs by one of
# the translational symmetry operations of the space group (translations,
# glides or screws).  The broken bonds of each pair are then connected to each
# other, adding (in this case) seven further bonds to the finite bond graph. 
# Therefore in some cases a pair of atoms in the finite graph may be linked
# by more than one bond.  This is indicated in the graph by a double or triple
# line, etc.  In CaCrF5 three such pairs of atoms are linked by two bonds as
# shown in the bond graph below.  The inclusion of two lines between a pair of
# atoms in the graph does NOT indicate a double bond (a bond of order 2), but
# rather two different bonds whose bond order is not specified.  Where two (or
# more) bonds are shown as linking the same two atoms in the finite graph,
# they connect two different pairs of atoms in the infinite graph and in the
# crystal structure as can be seen from the map_tecton2crystal_bond loop.  
#
# Information on the long-range order is lost when the infinite graph is
# reduced to a finite graph, but the short-range order, i.e., the nearest
# neighbour environments that contain the chemical bonds, is preserved. 
# A crude representation of the finite graph showing the six bonds between Cr
# and F, and the seven bonds between Ca and F, is given below.  In the crystal
# F1 and F4 are related by a crystallographic 2-fold axis, as are F2 and F5.
#
#           |------------ F2 -------------|
#           |                             |
#           |------------ F1 =============|
#           |                             |
#      Cr1 -|============ F3 -------------|- Ca
#           |                             |
#           |------------ F4 =============|
#           |                             |
#           |------------ F5 -------------|
#
#
data_Ca_Cr_F5
#
# In this example a complete CIF is given including the atomic coordinates and
# the symmetry operations.  The description of the tecton is followed by the
# mapping between the tecton and the crystal structure.  As there is only one
# conformer, the conformer loops are not used.
#
###############################################################
#      DEFINITION OF THE CRYSTALLOGRAPHIC STRUCTURE
#
#   Based on Wu and Brown (1973) Mat. Res. Bull. 8, 593-8.
#
_chemical_formula_sum   'Ca Cr F5'
_cell_length_a                      9.0050
_cell_length_b                      6.4720
_cell_length_c                      7.5330
_cell_angle_alpha                    90.00
_cell_angle_beta                    115.85
_cell_angle_gamma                    90.00
_cell_formula_units_Z                 4
_space_group_name_H-M_alt           'C 2/c'
_space_group_name_Hall              '-C 2yc'
loop_
         _space_group_symop_id
         _space_group_symop_operation_xyz
1         ' X, Y, Z'
2         '-X, Y,-Z+1/2'
3         '-X,-Y,-Z'
4         ' X,-Y, Z+1/2'
5         ' X+1/2, Y+1/2, Z'
6         '-X+1/2, Y+1/2,-Z+1/2'
7         '-X+1/2,-Y+1/2,-Z'
8         ' X+1/2,-Y+1/2, Z+1/2'

loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_U_iso_or_equiv
_atom_site_adp_type
Ca1      0.50000   0.04260   0.25000   0.10000  Uiso
Cr1      0.00000   0.00000   0.00000   0.10000  Uiso
F1       0.00970  -0.29340  -0.02910   0.10000  Uiso
F2      -0.22730  -0.02300  -0.11740   0.10000  Uiso
F3       0.00000  -0.07210   0.25000   0.10000  Uiso
#
loop_
 _geom_bond_atom_site_label_1
 _geom_bond_atom_site_label_2
 _geom_bond_distance
 _geom_bond_site_symmetry_1
 _geom_bond_site_symmetry_2
Ca1   F1    2.391   1_555   5_555
Ca1   F1    2.391   1_555   6_555
Ca1   F1    2.292   1_555   7_545
Ca1   F1    2.292   1_555   8_545
Ca1   F2    2.215   1_555   3_555
Ca1   F2    2.215   1_555   4_655
Ca1   F3    2.494   1_555   5_555
Cr1   F1    1.918   1_555   1_555
Cr1   F1    1.918   1_555   3_555
Cr1   F2    1.848   1_555   1_555
Cr1   F2    1.848   1_555   3_555
Cr1   F3    1.940   1_555   1_555
Cr1   F3    1.940   1_555   3_555
#
#############################################################
#       DEFINITION OF THE FORMULA UNIT AS A TECTON
#
# The next loop lists the tectons, in this case the only tecton defined
# contains one formula unit.
#
loop_
_tecton_topology_id          # List reference 
_tecton_topology_formula
_tecton_topology_special_details
1 'Ca Cr F5' 'The formula unit'
#
# The next loop lists the seven atoms that compose the tecton and
# gives their chemical properties.  Note that the atom_site list in the
# crystallographic items given above only contains five atoms because the
# molecular unit occupies two asymmetric units and two F atoms are duplicated
# by the two-fold axis.
#
# _tecton_topology_atom_id is the list-reference and, as in the previous
# example, has been constructed from the atom_label.  No tecton prefix is
# needed here because the atom_label alone is sufficient to ensure uniqueness.
# _tecton_topology_atom_tecton_id comes next.  Although there is only one
# tecton, this item is needed to link the atoms to the formula in the
# tecton_topology loop above.
# _tecton_topology_atom_label is included for the benefit of the user.  It has
# no parent or child and is not required for CIF management.  The CIF
# identifies the atom by _tecton_topology_atom_id.
# _tecton_topology_atom_valence is the atomic valence as used in the bond
# valence model, a model which is included here because it is based on the
# topological properties of the bond network.
# _tecton_topology_atom_coord_number is the number of bonds in the bond graph
# that terminate on the atom.  
#
loop_
_tecton_topology_atom_id            # List-reference
_tecton_topology_atom_tecton_id     # Child of _tecton_topology_id
_tecton_topology_atom_label            
_tecton_topology_atom_type_symbol   # Child of _atom_type_symbol
_tecton_topology_atom_valence
_tecton_topology_atom_coord_number  # Number of bonds formed by this atom
_tecton_topology_atom_details
Ca 1 Ca1 Ca  2 7  ?
Cr 1 Cr1 Cr  3 6  ?
F1 1 F1  F  -1 3  ?
F2 1 F2  F  -1 2  ?
F3 1 F3  F  -1 3  ?
F4 1 F4  F  -1 3  ' Related to F1 by crystallographic symmetry'
F5 1 F5  F  -1 2  ' Related to F2 by crystallographic symmetry'
#
# The next loop lists the bonds in the tecton.  Some bonds appear
# twice (e.g. Cr.F3.1 and Cr.F3.2).  The tecton atoms involved in these
# cases (e.g. atoms Cr and F3) map onto different atom pairs in the crystal,
# as can be seen in the map_tecton2crystal_bond loop below.
#
# _tecton_topology_bond_id is the list-reference.  In this example it has been
# constructed from the ids of the two atoms that form the bond (with a numeric
# suffix where the same pair is bonded more than once) because it is parent
# to _map_tecton2crystal_bond_bond_id.
#
# _tecton_topology_bond_valence is a quantity determined from the topology and
# is used to calculate (in this case) the ideal bond lengths given in the
# tecton_geom_dist loop.
#
loop_
_tecton_topology_bond_id        # list reference
_tecton_topology_bond_atom_id_1 # Child of _tecton_topology_atom_id
_tecton_topology_bond_atom_id_2 # Child of _tecton_topology_atom_id
_tecton_topology_bond_valence   # Predicted bond valence
_tecton_topology_bond_type
Cr.F1    Cr F1 0.48  ?
Cr.F4    Cr F4 0.48  ?
Cr.F2    Cr F2 0.61  ?
Cr.F5    Cr F5 0.61  ?
Cr.F3.1  Cr F3 0.41  ?
Cr.F3.2  Cr F3 0.41  ?
Ca.F1.1  Ca F1 0.26  ?
Ca.F1.2  Ca F1 0.26  ?
Ca.F4.1  Ca F4 0.26  ?
Ca.F4.2  Ca F4 0.26  ?
Ca.F2    Ca F2 0.39  ?
Ca.F5    Ca F5 0.39  ?
Ca.F3    Ca F3 0.18  ?
#
############################################################
#              DEFINITION OF THE TECTON GEOMETRY 
#
# The tecton_geom_atom loop is omitted as the geometry is not defined here in
# terms of ideal atomic coordinates.
#
# The ideal interatomic distances are given next.
# They can be compared with the observed distances given in the
# crystallographic _geom_bond list above.
#
# The distances defined in this list do not need to be the same as the bonds
# defined in the tecton_topology_bond list; some distances given here may be
# between non-bonded atoms.  
#
# Since the list-reference is not parent to any other item, an alphabetical
# sequence of letters has been chosen (to show that any string is valid).
#
# Each distance is measured between the two atoms given by the second and
# third items in the loop.
#
# The bond valence used to calculate these distances is identical to the
# value given in _tecton_topology_bond_valence, so this item may be redundant
# here.
##
## NOTE: There is a weakness in the present definition that needs further
## thought.  As presently defined, the geometry is given only for the atoms in
## the finite bond graph, i.e. the atoms in the tecton from which the
## infinite crystal is generated, but there will be occasions when it is
## necessary to give distances and angles between atoms that span more than
## one tecton.  This could be done either by specifying the ideal geometry
## using the atoms in the crystal, or by introducing the translational
## symmetry operations between tectons that are needed to specify the long
## range order.  The choice between these methods is deferred to a later
## draft.
##
#
loop_
_tecton_geom_dist_id            # List-reference
_tecton_geom_dist_atom1_id      # Child of _tecton_topology_atom_id
_tecton_geom_dist_atom2_id      # Child of _tecton_topology_atom_id
_tecton_geom_dist_distance      # Ideal bond distance in Angstroms
_tecton_geom_dist_valence       # Same as _tecton_topology_bond_valence
_tecton_geom_dist_details
A  Cr F1 1.93  0.48  'Bond distances calculated from bond valences'
B  Cr F4 1.93  0.48  'Bond distances calculated from bond valences'
C  Cr F2 1.84  0.61  'Bond distances calculated from bond valences'
D  Cr F5 1.84  0.61  'Bond distances calculated from bond valences'
E  Cr F3 1.99  0.41  'Bond distances calculated from bond valences'
F  Cr F3 1.99  0.41  'Bond distances calculated from bond valences'
G  Ca F1 2.34  0.26  'Bond distances calculated from bond valences'
H  Ca F1 2.34  0.26  'Bond distances calculated from bond valences'
I  Ca F4 2.34  0.26  'Bond distances calculated from bond valences'
J  Ca F4 2.34  0.26  'Bond distances calculated from bond valences'
K  Ca F2 2.19  0.39  'Bond distances calculated from bond valences'
L  Ca F5 2.19  0.39  'Bond distances calculated from bond valences'
M  Ca F3 2.48  0.18  'Bond distances calculated from bond valences'

# Similar angle and torsion loops could also be given but are omitted here for
# brevity.  Torsion angles are rarely used to define the geometry of inorganic
# compounds.
#
############################################################
#       MAPPING THE TOPOLOGY ONTO THE CRYSTAL
#
# The next loop maps the atoms of the tecton onto the atoms of the
# crystal.  
#
# Note that atoms F4 and F5 in the tecton map onto
# symmetry-generated copies of F1 and F2 in the crystal.
#
# The additional lattice translations of the symmetry operation are included
# here by way of illustration; strictly, they are only necessary in the bond
# loop.
#
loop_
_map_tecton2crystal_atom_id              # List reference
_map_tecton2crystal_atom_atom_id         # Child of _tecton_topology_atom_id
_map_tecton2crystal_atom_atom_site_label # Child of _atom_site_label
_map_tecton2crystal_atom_symop_id        # Child of _space_group_symop_id
_map_tecton2crystal_atom_trans_x
_map_tecton2crystal_atom_trans_y
_map_tecton2crystal_atom_trans_z
1   Ca  Ca1 1 0 0 0
2   Cr  Cr1 1 0 0 0
3   F1  F1  1 0 0 0
4   F2  F2  1 0 0 0
5   F3  F3  1 0 0 0
6   F4  F1  3 0 0 0
7   F5  F2  3 0 0 0
#
# The next loop maps the bonds from the tecton onto the crystal.  This loop
# is only needed for infinitely connected structures because these are the
# only graphs in which there can be more than one bond between the same
# pair of atoms in the tecton.  
#
# Since this loop maps the bonds of the tecton directly onto pairs of atoms in
# the crystal, the tecton bond is sufficiently defined by
# _tecton_topology_bond_id, but the bond in the crystal must be defined fully
# in terms of two atom_site_labels and their corresponding symmetry operations
# including the additional lattice translations.  It should be sufficient to
# use _map_tecton2crystal_bond_bond_id as the list-reference (though I have
# not done this here).
#
# Note that bonds 5 and 6 map onto different pairs of atoms in the crystal. 
#
# The bonds I have labelled 'link' are those that link the atoms in the
# tecton (the atoms that form the finite bond graph) to the atoms in
# symmetry-related tectons in the infinite graph.  The remaining bonds are
# those formed between the atoms within the tecton.
#
loop_
_map_tecton2crystal_bond_id                # List reference
_map_tecton2crystal_bond_bond_id           # Child of _tecton_topology_bond_id
_map_tecton2crystal_bond_atom_site_label_1 # Child of _atom_site_label
_map_tecton2crystal_bond_symop_1           # Child of _space_group_symop_id 
_map_tecton2crystal_bond_trans_x_1
_map_tecton2crystal_bond_trans_y_1
_map_tecton2crystal_bond_trans_z_1
_map_tecton2crystal_bond_atom_site_label_2 # Child of _atom_site_label
_map_tecton2crystal_bond_symop_2           # Child of _space_group_symop_id 
_map_tecton2crystal_bond_trans_x_2
_map_tecton2crystal_bond_trans_y_2
_map_tecton2crystal_bond_trans_z_2
_map_tecton2crystal_bond_dist               # Observed distance (optional)
_map_tecton2crystal_bond_details
1  Cr.F1       Cr1 1 0 0 0  F1 1 0 0 0    1.918   ?
2  Cr.F4       Cr1 1 0 0 0  F4 1 0 0 0    1.918   ?
3  Cr.F2       Cr1 1 0 0 0  F2 1 0 0 0    1.848   ?
4  Cr.F5       Cr1 1 0 0 0  F5 1 0 0 0    1.848   ?
5  Cr.F3.1     Cr1 1 0 0 0  F3 1 0 0 0    1.940   ?
6  Cr.F3.2     Cr1 1 0 0 0  F3 3 0 0 0    1.940   link

7  Ca.F1.1     Ca1 1 0 0 0  F1 5 0 0 0    2.391   link
8  Ca.F1.2     Ca1 1 0 0 0  F1 6 0 0 0    2.292   link
9  Ca.F4.1     Ca1 1 0 0 0  F4 5 0 -1 0   2.391   link
10 Ca.F4.2     Ca1 1 0 0 0  F4 6 0 -1 0   2.292   link
11 Ca.F2       Ca1 1 0 0 0  F5 1 0 0 0    2.215   ?
12 Ca.F5       Ca1 1 0 0 0  F2 4 1 0 0    2.215   link
13 Ca.F3       Ca1 1 0 0 0  F3 5 0 0 0    2.494   link
#
################# End of second CIF ####################
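
The bond valences in the tecton_topology_bond loop above can be checked
against the atomic valences using the valence sum rule of the bond valence
model: the bond valences around each atom should sum to the magnitude of its
atomic valence.  The same loop also yields the coordination numbers, since
each bond terminates on two atoms.  The Python sketch below transcribes the
CaCrF5 tecton by hand (no CIF parser is assumed) and reports any atom that
fails either check; the tolerance of 0.01 valence units is an arbitrary
choice.

```python
from collections import defaultdict

# _tecton_topology_atom_valence for the CaCrF5 tecton (hand transcription).
ATOM_VALENCE = {"Ca": 2, "Cr": 3, "F1": -1, "F2": -1, "F3": -1, "F4": -1, "F5": -1}

# _tecton_topology_atom_coord_number for the same atoms.
COORD_NUMBER = {"Ca": 7, "Cr": 6, "F1": 3, "F2": 2, "F3": 3, "F4": 3, "F5": 2}

# tecton_topology_bond loop: (bond id, atom_id_1, atom_id_2, bond valence).
BONDS = [
    ("Cr.F1", "Cr", "F1", 0.48), ("Cr.F4", "Cr", "F4", 0.48),
    ("Cr.F2", "Cr", "F2", 0.61), ("Cr.F5", "Cr", "F5", 0.61),
    ("Cr.F3.1", "Cr", "F3", 0.41), ("Cr.F3.2", "Cr", "F3", 0.41),
    ("Ca.F1.1", "Ca", "F1", 0.26), ("Ca.F1.2", "Ca", "F1", 0.26),
    ("Ca.F4.1", "Ca", "F4", 0.26), ("Ca.F4.2", "Ca", "F4", 0.26),
    ("Ca.F2", "Ca", "F2", 0.39), ("Ca.F5", "Ca", "F5", 0.39),
    ("Ca.F3", "Ca", "F3", 0.18),
]


def check_topology(bonds, valence, coord, tol=0.01):
    """Return (atom, problem) pairs; an empty list means the topology is consistent."""
    valence_sum = defaultdict(float)
    n_bonds = defaultdict(int)
    for _bond_id, atom1, atom2, s in bonds:
        # Each bond contributes its valence and one bond to both of its atoms.
        for atom in (atom1, atom2):
            valence_sum[atom] += s
            n_bonds[atom] += 1
    problems = []
    for atom, v in valence.items():
        if abs(valence_sum[atom] - abs(v)) > tol:
            problems.append((atom, f"valence sum {valence_sum[atom]:.2f} != {abs(v)}"))
        if n_bonds[atom] != coord[atom]:
            problems.append((atom, f"{n_bonds[atom]} bonds != coord_number {coord[atom]}"))
    return problems


print(check_topology(BONDS, ATOM_VALENCE, COORD_NUMBER))
```

For this tecton the check returns an empty list.  Note that the doubled
bond Cr.F3.1/Cr.F3.2 is counted twice for F3; without both entries F3 would
have neither its coordination number of 3 nor a valence sum of 1.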

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

       4. COMPARISON OF THE ABOVE PROPOSAL WITH mmCIF:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

mmCIF has a chemical description which is designed for biological molecules. 
The contents of the crystal are divided into a small number of ENTITIES which
are classified as either polymers (e.g. a protein molecule), non-polymers, or
water.  A category called struct_asym describes which entities are found in
the asymmetric unit. 

Polymeric entities are typically composed of monomers or COMPONENTS which are
described in the category CHEM_COMP.  The definitions in this set of
categories are very similar to our definitions in the tecton_topology and
tecton_geom categories.  Chem_comp is designed to give the contents and
geometries of the individual monomers that compose the macromolecules.  It
describes the ideal geometry of the monomers either in terms of Cartesian
coordinates or in terms of bond lengths and angles.  Unlike our proposal, which
uses _map_tecton2crystal_ to map the molecular units onto the crystal
structure, mmCIF places pointers to the corresponding atom in chem_comp in the
atom_site loop itself.  This arrangement does not work for small molecules,
where an atom listed in the atom_site loop (asymmetric unit) may map onto more
than one atom in the tecton, e.g. if the tecton contains crystallographic
symmetry, as in the case of CaCrF5 above.

We should make the definitions of items in the tecton_topology and tecton_geom
categories correspond exactly to those used in chem_comp to allow direct
translation between the two categories (this should not be a problem). 
chem_comp defines a very large number of additional properties such as the
chirality of individual atoms and planes of atoms, as well as properties that
are of interest only in biological structures.  We may wish to add some of
these to our lists.
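
The difficulty with a single pointer per atom_site row can be made concrete
using the map_tecton2crystal_atom loop of the CaCrF5 example.  The Python
sketch below (the function names are illustrative only) groups the tecton
atoms by the atom_site label they map onto; any site that collects more than
one tecton atom cannot be described by a single pointer column in atom_site.

```python
from collections import defaultdict

# Hand transcription of the CaCrF5 map_tecton2crystal_atom loop:
# (map id, tecton atom id, atom_site label, symop id).  The lattice
# translations are omitted because they are all zero.
ATOM_MAP = [
    (1, "Ca", "Ca1", 1),
    (2, "Cr", "Cr1", 1),
    (3, "F1", "F1", 1),
    (4, "F2", "F2", 1),
    (5, "F3", "F3", 1),
    (6, "F4", "F1", 3),
    (7, "F5", "F2", 3),
]


def tecton_atoms_per_site(atom_map):
    """Group tecton atom ids by the atom_site label they map onto."""
    per_site = defaultdict(list)
    for _map_id, tecton_atom, site_label, _symop in atom_map:
        per_site[site_label].append(tecton_atom)
    return dict(per_site)


def single_pointer_possible(atom_map):
    """True only if each atom_site row corresponds to exactly one tecton atom,
    i.e. if one pointer column in atom_site would be enough."""
    return all(len(atoms) == 1 for atoms in tecton_atoms_per_site(atom_map).values())


print(tecton_atoms_per_site(ATOM_MAP))
print(single_pointer_possible(ATOM_MAP))
```

For CaCrF5 the sites F1 and F2 each collect two tecton atoms (F1 and F4, F2
and F5), so single_pointer_possible returns False, which is why the proposal
keeps the mapping in a separate loop.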

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

             5. SAMPLE CIFS WITH COMMENTS REMOVED

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

data_disordered_TNT

loop_
_tecton_topology_id          # List-reference 
_tecton_topology_name        # Name e.g. full IUPAC name  
_tecton_topology_formula
_tecton_topology_Zprime      # Number of symmetry independent 
                                 # copies of the tecton in the crystal
_tecton_topology_special_details
TNT   '2,4,6 trinitrotoluene'     'C7 H5 N3 O6' 1  molecule
BNZ   '1,2,4,6 benzene ring'      'C6 H2'       1  moiety
NITRO 'nitro group'               'N O2'        2  group

loop_
_tecton_topology_atom_id            # List-reference
_tecton_topology_atom_tecton_id     # Child of _tecton_topology_id
_tecton_topology_atom_label            
_tecton_topology_atom_type_symbol   # Child of _atom_type_symbol
_tecton_topology_atom_coord_number  # Number of bonds formed by this atom
_tecton_topology_atom_chirality
_tecton_topology_atom_details
T.C1  TNT     C1   C     3  . ?
T.C2  TNT     C2   C     3  . ?
T.C3  TNT     C3   C     3  . ?
T.C4  TNT     C4   C     3  . ?
T.C5  TNT     C5   C     3  . ?
T.C6  TNT     C6   C     3  . ?
T.C7  TNT     C7   C     4  . ?
T.H3  TNT     H3   H     1  . ?
T.H5  TNT     H5   H     1  . ?
T.H71 TNT     H71  H     1  . ?
T.H72 TNT     H72  H     1  . ?
T.H73 TNT     H73  H     1  . ?
T.N2  TNT     N2   N     3  . ?
T.O21 TNT     O21  O     1  . ?
T.O22 TNT     O22  O     1  . ?
T.N4  TNT     N4   N     3  . ?
T.O41 TNT     O41  O     1  . ?
T.O42 TNT     O42  O     1  . ?
T.N6  TNT     N6   N     3  . ?
T.O61 TNT     O61  O     1  . ?
T.O62 TNT     O62  O     1  . ?
B.C1  BNZ     C1   C     3  . 'benzene ring'
B.C2  BNZ     C2   C     3  . 'benzene ring'
B.C3  BNZ     C3   C     3  . 'benzene ring'
B.C4  BNZ     C4   C     3  . 'benzene ring'
B.C5  BNZ     C5   C     3  . 'benzene ring'
B.C6  BNZ     C6   C     3  . 'benzene ring'
B.H3  BNZ     H3   H     1  . 'benzene ring'
B.H5  BNZ     H5   H     1  . 'benzene ring'
N.N1  NITRO   N1   N     3  . 'nitro group'
N.O1  NITRO   O1   O     1  . 'nitro group'
N.O2  NITRO   O2   O     1  . 'nitro group'

loop_
_tecton_topology_bond_id           # List-reference
_tecton_topology_bond_atom1_id     # Child of _tecton_topology_atom_id
_tecton_topology_bond_atom2_id     # Child of _tecton_topology_atom_id
_tecton_topology_bond_type
1     T.C1   T.C2   delocalized     # TNT benzene ring
2     T.C2   T.C3   delocalized
3     T.C3   T.C4   delocalized
4     T.C4   T.C5   delocalized
5     T.C5   T.C6   delocalized
6     T.C6   T.C1   delocalized
7     T.C3   T.H3   single
8     T.C5   T.H5   single
9     T.C7   T.C1   single          # TNT Methyl group
10    T.C7   T.H71  single
11    T.C7   T.H72  single
12    T.C7   T.H73  single
13    T.N2   T.C2   single          # TNT N2 nitro group
14    T.N2   T.O21  delocalized
15    T.N2   T.O22  delocalized
16    T.N4   T.C4   single          # TNT N4 nitro group
17    T.N4   T.O41  delocalized
18    T.N4   T.O42  delocalized
19    T.N6   T.C6   single          # TNT N6 nitro group
20    T.N6   T.O61  delocalized
21    T.N6   T.O62  delocalized
22    B.C1   B.C2   delocalized     # 1,2,4,6 substituted benzene ring
23    B.C2   B.C3   delocalized
24    B.C3   B.C4   delocalized
25    B.C4   B.C5   delocalized
26    B.C5   B.C6   delocalized
27    B.C6   B.C1   delocalized
28    B.C3   B.H3   single
29    B.C5   B.H5   single
30    B.C1   .      single          # A dangling bond
31    B.C2   .      single
32    B.C4   .      single
33    B.C6   .      single
34    N.N1   N.O1   delocalized     # Nitro group
35    N.N1   N.O2   delocalized
36    N.N1   .      single

loop_
_tecton_conformer_id            # List-reference
_tecton_conformer_tecton_id     # Child of _tecton_topology_id
_tecton_conformer_point_group   # Schoenflies point group symbol of conformer
_tecton_conformer_chirality      
_tecton_conformer_details
TNTaa   TNT Cs  achiral 'TNTaa conformer'
TNTbb   TNT Cs  achiral 'TNTbb conformer'
TNTab   TNT C1  +x      'TNTab conformer'
TNTba   TNT C1  -x      'TNTba conformer'

loop_
_tecton_conformer_equiv_id      # List-reference
_tecton_conformer_equiv_conformer_id  # Child of _tecton_conformer_id
_tecton_conformer_equiv_label   # Parent to _tecton_geom_atom_conformer_label 
1  TNTaa  all 
2  TNTaa  aa
3  TNTbb  all
4  TNTbb  bb 
5  TNTab  all
6  TNTab  ab 
7  TNTba  all 
8  TNTba  ba

loop_
_tecton_geom_atom_id    # List-reference, child of _tecton_topology_atom_id
_tecton_geom_atom_conformer_label  # Child of _tecton_conformer_equiv_label
_tecton_geom_atom_coord_x          # Coordinates of atom in Angstrom
_tecton_geom_atom_coord_y          # 
_tecton_geom_atom_coord_z          # 
_tecton_geom_atom_details
T.C1  all 0.037  0.146  -0.124  ?
T.C2  all 1.378  0.562   0.134  ?
T.C3  all 1.846  1.421   0.204  ?
T.C4  all 2.567  1.834   0.304  ?
T.C5  all 1.745  1.563   0.245  ?
T.C6  all 0.962  0.498   0.103  ?
T.H3  all 2.13   1.72    0.24   ?    
T.H5  all 1.84   2.05    0.36   ?    

loop_
_tecton_geom_dist_id              # List-reference
_tecton_geom_dist_conformer_label # Child of _tecton_conformer_equiv_label
_tecton_geom_dist_atom1_id        # Child of _tecton_topology_atom_id
_tecton_geom_dist_atom2_id        # Child of _tecton_topology_atom_id
_tecton_geom_dist_distance        # Distance atom1-atom2 in Angstroms
1  all   T.C7   T.C1    1.54                # TNT methyl group
2  all   T.C7   T.H71   1.05
3  all   T.C7   T.H72   1.05
4  all   T.C7   T.H73   1.05
5  all   T.N4   T.C4    1.43                # TNT N4 nitro group
6  all   T.N4   T.O41   1.18
7  all   T.N4   T.O42   1.18
8  all   T.N2   T.C2    1.43                # TNT N2 nitro group
9  all   T.N2   T.O21   1.18
10 all   T.N2   T.O22   1.18
11 all   T.N6   T.C6    1.43                # TNT N6 nitro group
12 all   T.N6   T.O61   1.18
13 all   T.N6   T.O62   1.18

loop_
_tecton_geom_angle_id              # List-reference
_tecton_geom_angle_conformer_label # Child of _tecton_conformer_equiv_label
_tecton_geom_angle_atom1_id        # Child of _tecton_topology_atom_id
_tecton_geom_angle_atom2_id        # Child of _tecton_topology_atom_id
_tecton_geom_angle_atom3_id        # Child of _tecton_topology_atom_id
_tecton_geom_angle_angle           # Angle in degrees
1  all    T.C1   T.C7   T.H71  109     # TNT Methyl group
2  all    T.C1   T.C7   T.H72  109
3  all    T.C1   T.C7   T.H73  109
4  all    T.H71  T.C7   T.H72  109
5  all    T.H72  T.C7   T.H73  109
6  all    T.H73  T.C7   T.H71  109
7  all    T.O41  T.N4   T.C4   117     # TNT N4 nitro group
8  all    T.O42  T.N4   T.C4   117 
9  all    T.O41  T.N4   T.O42  126
10 all    T.O21  T.N2   T.C2   117     # TNT N2 nitro group
11 all    T.O22  T.N2   T.C2   117 
12 all    T.O21  T.N2   T.O22  126 
13 all    T.O61  T.N6   T.C6   117     # TNT N6 nitro group
14 all    T.O62  T.N6   T.C6   117 
15 all    T.O61  T.N6   T.O62  126 

loop_
_tecton_geom_torsion_id              # List-reference
_tecton_geom_torsion_conformer_label # Child of _tecton_conformer_equiv_label
_tecton_geom_torsion_atom1_id        # Child of _tecton_topology_atom_id
_tecton_geom_torsion_atom2_id        # Child of _tecton_topology_atom_id
_tecton_geom_torsion_atom3_id        # Child of _tecton_topology_atom_id
_tecton_geom_torsion_atom4_id        # Child of _tecton_topology_atom_id
_tecton_geom_torsion_angle           # Torsion angle in degrees
1 all  T.C3   T.C4   T.N4   T.O41   90
2 aa   T.C1   T.C2   T.N2   T.O21   10.5
3 aa   T.C1   T.C6   T.N6   T.O61   10.5
4 bb   T.C1   T.C2   T.N2   T.O21  -10.5
5 bb   T.C1   T.C6   T.N6   T.O61  -10.5
6 ab   T.C1   T.C2   T.N2   T.O21   10.5
7 ab   T.C1   T.C6   T.N6   T.O61  -10.5
8 ba   T.C1   T.C2   T.N2   T.O21  -10.5
9 ba   T.C1   T.C6   T.N6   T.O61   10.5

loop_
_map_tecton_atom_map_id        # List-reference
_map_tecton_atom_atom1_id      # Child of _tecton_topology_atom_id
_map_tecton_atom_atom2_id      # Child of _tecton_topology_atom_id
1   B.C1    T.C1    # mapping 1,2,4,6 benzene moiety onto TNT
2   B.C2    T.C2
3   B.C3    T.C3
4   B.C4    T.C4
5   B.C5    T.C5
6   B.C6    T.C6
7   B.H3    T.H3
8   B.H5    T.H5
9   N.N1    T.N2    # mapping the nitro group onto the TNT N2 group
10  N.O1    T.O21
11  N.O2    T.O22
12  N.N1    T.N4    # mapping the nitro group onto the TNT N4 group
13  N.O1    T.O41
14  N.O2    T.O42
15  N.N1    T.N6    # mapping the nitro group onto the TNT N6 group
16  N.O1    T.O61
17  N.O2    T.O62

loop_
_map_tecton2crystal_atom_id           # List-reference
_map_tecton2crystal_atom_atom_id      # Child of _tecton_topology_atom_id
_map_tecton2crystal_atom_tecton_label # Child of _tecton_conformer_equiv_label
_map_tecton2crystal_atom_occup_number # Occupation number of tecton atom
_map_tecton2crystal_atom_atom_site_label # child of _atom_site_label
_map_tecton2crystal_atom_symop_id     # child of _space_group_symop_id
1  T.C1  all 1   C1   1
2  T.C2  all 1   C2   1
3  T.C3  all 1   C3   1
4  T.C4  all 1   C4   1
5  T.C5  all 1   C3   2
6  T.C6  all 1   C2   2
7  T.H3  all 1   H3   1
8  T.H5  all 1   H3   2
9  T.C7  all 1   C7   1
10 T.H71 all 1   H71  1
11 T.H72 all 1   H72  1
12 T.H73 all 1   H71  2
13 T.N4  all 1   N4   1
14 T.O41 all 1   O41  1
15 T.O42 all 1   O42  1
16 T.N2  aa 0.5  N2a  1
17 T.O21 aa 0.5  O21a 1
18 T.O22 aa 0.5  O22a 1
19 T.N6  aa 0.5  N2a  2
20 T.O61 aa 0.5  O21a 2
21 T.O62 aa 0.5  O22a 2
22 T.N2  bb 0.5  N2b  1
23 T.O21 bb 0.5  O21b 1
24 T.O22 bb 0.5  O22b 1
25 T.N6  bb 0.5  N2b  2
26 T.O61 bb 0.5  O21b 2
27 T.O62 bb 0.5  O22b 2

############ End of first CIF ################
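
In the TNT example, geometry rows carry a conformer label, and the
tecton_conformer_equiv loop lists which labels apply to each conformer
('all' plus the conformer's own label).  Assuming that reading of the equiv
loop, the rows that define a given conformer can be selected as in the
Python sketch below, which uses a hand transcription of the torsion loop
(the function names are illustrative only).

```python
# Hand transcription of the TNT _tecton_conformer_equiv loop:
# (conformer id, conformer label that applies to it).
CONFORMER_EQUIV = [
    ("TNTaa", "all"), ("TNTaa", "aa"),
    ("TNTbb", "all"), ("TNTbb", "bb"),
    ("TNTab", "all"), ("TNTab", "ab"),
    ("TNTba", "all"), ("TNTba", "ba"),
]

# Hand transcription of the _tecton_geom_torsion loop:
# (id, conformer label, (atom1, atom2, atom3, atom4), angle in degrees).
TORSIONS = [
    (1, "all", ("T.C3", "T.C4", "T.N4", "T.O41"), 90.0),
    (2, "aa", ("T.C1", "T.C2", "T.N2", "T.O21"), 10.5),
    (3, "aa", ("T.C1", "T.C6", "T.N6", "T.O61"), 10.5),
    (4, "bb", ("T.C1", "T.C2", "T.N2", "T.O21"), -10.5),
    (5, "bb", ("T.C1", "T.C6", "T.N6", "T.O61"), -10.5),
    (6, "ab", ("T.C1", "T.C2", "T.N2", "T.O21"), 10.5),
    (7, "ab", ("T.C1", "T.C6", "T.N6", "T.O61"), -10.5),
    (8, "ba", ("T.C1", "T.C2", "T.N2", "T.O21"), -10.5),
    (9, "ba", ("T.C1", "T.C6", "T.N6", "T.O61"), 10.5),
]


def labels_for(conformer_id):
    """Conformer labels that apply to one conformer, per the equiv loop."""
    return {label for cid, label in CONFORMER_EQUIV if cid == conformer_id}


def rows_for(conformer_id, rows):
    """Keep only the geometry rows whose conformer label applies."""
    labels = labels_for(conformer_id)
    return [row for row in rows if row[1] in labels]


for row in rows_for("TNTab", TORSIONS):
    print(row[0], row[2], row[3])
```

For TNTab this selects torsions 1, 6 and 7 (90, 10.5 and -10.5 degrees); the
same function applies unchanged to the distance and angle loops, whose rows
are all labelled 'all'.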
#
############ Start of second CIF #############

data_Ca_Cr_F5

_chemical_formula_sum   'Ca Cr F5'
_cell_length_a                      9.0050
_cell_length_b                      6.4720
_cell_length_c                      7.5330
_cell_angle_alpha                    90.00
_cell_angle_beta                    115.85
_cell_angle_gamma                    90.00
_cell_formula_units_Z                 8
_space_group_name_H-M_alt           'C 2/c'
_space_group_name_Hall              '-C 2yc'
loop_
         _space_group_symop_id
         _space_group_symop_operation_xyz
1         ' X, Y, Z'
2         '-X, Y,-Z+1/2'
3         '-X,-Y,-Z'
4         ' X,-Y, Z+1/2'
5         ' X+1/2, Y+1/2, Z'
6         '-X+1/2, Y+1/2,-Z+1/2'
7         '-X+1/2,-Y+1/2,-Z'
8         ' X+1/2,-Y+1/2, Z+1/2'

loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_U_iso_or_equiv
_atom_site_adp_type
Ca1      0.50000   0.04260   0.25000   0.10000  Uiso
Cr1      0.00000   0.00000   0.00000   0.10000  Uiso
F1       0.00970  -0.29340  -0.02910   0.10000  Uiso
F2      -0.22730  -0.02300  -0.11740   0.10000  Uiso
F3       0.00000  -0.07210   0.25000   0.10000  Uiso
#
loop_
 _geom_bond_atom_site_label_1
 _geom_bond_atom_site_label_2
 _geom_bond_distance
 _geom_bond_site_symmetry_1
 _geom_bond_site_symmetry_2
Ca1   F1    2.391   1_555   5_555
Ca1   F1    2.391   1_555   6_555
Ca1   F1    2.292   1_555   7_545
Ca1   F1    2.292   1_555   8_545
Ca1   F2    2.215   1_555   3_555
Ca1   F2    2.215   1_555   4_655
Ca1   F3    2.494   1_555   5_555
Cr1   F1    1.918   1_555   1_555
Cr1   F1    1.918   1_555   3_555
Cr1   F2    1.848   1_555   1_555
Cr1   F2    1.848   1_555   3_555
Cr1   F3    1.940   1_555   1_555
Cr1   F3    1.940   1_555   3_555

loop_
_tecton_topology_id          # List-reference 
_tecton_topology_formula
_tecton_topology_special_details
1 'Ca Cr F5' 'The formula unit'

loop_
_tecton_topology_atom_id            # List-reference
_tecton_topology_atom_tecton_id     # Child of _tecton_topology_id
_tecton_topology_atom_label            
_tecton_topology_atom_type_symbol   # Child of _atom_type_symbol
_tecton_topology_atom_valence       # Formal oxidation state
_tecton_topology_atom_coord_number  # Number of bonds formed by this atom
_tecton_topology_atom_details
Ca 1 Ca1 Ca  2 7  ?
Cr 1 Cr1 Cr  3 6  ?
F1 1 F1  F  -1 3  ?
F2 1 F2  F  -1 2  ?
F3 1 F3  F  -1 3  ?
F4 1 F4  F  -1 3  ' Related to F1 by crystallographic symmetry'
F5 1 F5  F  -1 2  ' Related to F2 by crystallographic symmetry'

loop_
_tecton_topology_bond_id        # list-reference
_tecton_topology_bond_atom_id_1 # Child of _tecton_topology_atom_id
_tecton_topology_bond_atom_id_2 # Child of _tecton_topology_atom_id
_tecton_topology_bond_valence   # Predicted bond valence
_tecton_topology_bond_type
Cr.F1    Cr F1 0.48  ?
Cr.F4    Cr F4 0.48  ?
Cr.F2    Cr F2 0.61  ?
Cr.F5    Cr F5 0.61  ?
Cr.F3.1  Cr F3 0.41  ?
Cr.F3.2  Cr F3 0.41  ?
Ca.F1.1  Ca F1 0.26  ?
Ca.F1.2  Ca F1 0.26  ?
Ca.F4.1  Ca F4 0.26  ?
Ca.F4.2  Ca F4 0.26  ?
Ca.F2    Ca F2 0.39  ?
Ca.F5    Ca F5 0.39  ?
Ca.F3    Ca F3 0.18  ?

loop_
_tecton_geom_dist_id            # List-reference
_tecton_geom_dist_atom1_id      # Child of _tecton_topology_atom_id
_tecton_geom_dist_atom2_id      # Child of _tecton_topology_atom_id
_tecton_geom_dist_distance      # Ideal bond distance in Angstroms
_tecton_geom_dist_valence       # Bond valence corresponding to dist.
_tecton_geom_dist_details
A  Cr F1 1.93  0.48  'Bond distances calculated from bond valences'
B  Cr F4 1.93  0.48  'Bond distances calculated from bond valences'
C  Cr F2 1.84  0.61  'Bond distances calculated from bond valences'
D  Cr F5 1.84  0.61  'Bond distances calculated from bond valences'
E  Cr F3 1.99  0.41  'Bond distances calculated from bond valences'
F  Cr F3 1.99  0.41  'Bond distances calculated from bond valences'
G  Ca F1 2.34  0.26  'Bond distances calculated from bond valences'
H  Ca F1 2.34  0.26  'Bond distances calculated from bond valences'
I  Ca F4 2.34  0.26  'Bond distances calculated from bond valences'
J  Ca F4 2.34  0.26  'Bond distances calculated from bond valences'
K  Ca F2 2.19  0.39  'Bond distances calculated from bond valences'
L  Ca F5 2.19  0.39  'Bond distances calculated from bond valences'
M  Ca F3 2.48  0.18  'Bond distances calculated from bond valences'

loop_
_map_tecton2crystal_atom_id              # List-reference
_map_tecton2crystal_atom_atom_id         # Child of _tecton_topology_atom_id
_map_tecton2crystal_atom_atom_site_label # child of _atom_site_label
_map_tecton2crystal_atom_symop_id        # child of _space_group_symop_id
_map_tecton2crystal_atom_trans_x
_map_tecton2crystal_atom_trans_y
_map_tecton2crystal_atom_trans_z
1   Ca  Ca1 1 0 0 0
2   Cr  Cr1 1 0 0 0
3   F1  F1  1 0 0 0
4   F2  F2  1 0 0 0
5   F3  F3  1 0 0 0
6   F4  F1  3 0 0 0
7   F5  F2  3 0 0 0

loop_
_map_tecton2crystal_bond_id                # List-reference
_map_tecton2crystal_bond_bond_id           # Child of _tecton_topology_bond_id
_map_tecton2crystal_bond_atom_site_label_1 # Child of _atom_site_label
_map_tecton2crystal_bond_symop_1           # Child of _space_group_symop_id 
_map_tecton2crystal_bond_trans_x_1
_map_tecton2crystal_bond_trans_y_1
_map_tecton2crystal_bond_trans_z_1
_map_tecton2crystal_bond_atom_site_label_2 # Child of _atom_site_label
_map_tecton2crystal_bond_symop_2           # Child of _space_group_symop_id 
_map_tecton2crystal_bond_trans_x_2
_map_tecton2crystal_bond_trans_y_2
_map_tecton2crystal_bond_trans_z_2
_map_tecton2crystal_bond_dist               # Observed distance (optional)
_map_tecton2crystal_bond_details
1  Cr.F1       Cr1 1 0 0 0  F1 1 0 0 0    1.918   ?
2  Cr.F4       Cr1 1 0 0 0  F4 1 0 0 0    1.918   ?
3  Cr.F2       Cr1 1 0 0 0  F2 1 0 0 0    1.848   ?
4  Cr.F5       Cr1 1 0 0 0  F5 1 0 0 0    1.848   ?
5  Cr.F3.1     Cr1 1 0 0 0  F3 1 0 0 0    1.940   ?
6  Cr.F3.2     Cr1 1 0 0 0  F3 3 0 0 0    1.940   link
7  Ca.F1.1     Ca1 1 0 0 0  F1 5 0 0 0    2.391   link
8  Ca.F1.2     Ca1 1 0 0 0  F1 6 0 0 0    2.292   link
9  Ca.F4.1     Ca1 1 0 0 0  F4 5 0 -1 0   2.391   link
10 Ca.F4.2     Ca1 1 0 0 0  F4 6 0 -1 0   2.292   link
11 Ca.F2       Ca1 1 0 0 0  F5 1 0 0 0    2.215   ?
12 Ca.F5       Ca1 1 0 0 0  F2 4 1 0 0    2.215   link
13 Ca.F3       Ca1 1 0 0 0  F3 5 0 0 0    2.494   link

# End of sample CIFs
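
The ideal distances in the CaCrF5 tecton_geom_dist loop follow from the bond
valences through the usual bond valence relation s = exp((R0 - R)/b), i.e.
R = R0 - b ln s.  The paper does not state the parameters used; the Python
sketch below assumes R0 = 1.657 Angstroms for Cr(III)-F, R0 = 1.842 Angstroms
for Ca(II)-F and b = 0.37 Angstroms, values which reproduce every tabulated
distance to within 0.005 Angstroms.

```python
import math

# ASSUMED bond valence parameters (not given in the paper): R0 for each
# cation-anion pair and the common softness parameter b, all in Angstroms.
R0 = {("Cr", "F"): 1.657, ("Ca", "F"): 1.842}
B = 0.37


def bond_length(cation, anion, valence, r0=R0, b=B):
    """Invert s = exp((R0 - R)/b) to obtain R = R0 - b*ln(s)."""
    return r0[(cation, anion)] - b * math.log(valence)


# Distinct (pair, bond valence) rows of tecton_geom_dist with tabulated distances.
TABLE = [
    ("Cr", "F", 0.48, 1.93),
    ("Cr", "F", 0.61, 1.84),
    ("Cr", "F", 0.41, 1.99),
    ("Ca", "F", 0.26, 2.34),
    ("Ca", "F", 0.39, 2.19),
    ("Ca", "F", 0.18, 2.48),
]

for cation, anion, s, tabulated in TABLE:
    r = bond_length(cation, anion, s)
    print(f"{cation}-{anion}  s={s:.2f}  R={r:.3f}  table={tabulated:.2f}")
```

Because a bond valence is a monotonic function of distance, the weakest bond
in the list (Ca-F3, s = 0.18) is also the longest.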
_______________________________________________
coreCIFchem mailing list
coreCIFchem@iucr.org
http://scripts.iucr.org/mailman/listinfo/corecifchem
