Discussion #2
- To: corecifchem@iucr.org
- Subject: Discussion #2
- From: David Brown <idbrown@mcmaster.ca>
- Date: Mon, 01 Dec 2003 15:53:23 -0500
Dear Colleagues,

You will have received the report on the NIST workshop on chemistry XMLs that I recently circulated. This email contains a further discussion paper designed to build on the information I picked up at NIST, and on the comments stimulated by discussion paper #1. Please respond by replying to the whole discussion group by Dec 31.

CIF is designed to report crystallographic information, something which it does well. It also includes some (not much) specifically chemical information (e.g. authors have to decide whether the distance between two atoms is a bond or a contact, and provision is made for including bond graphs, i.e. 2D structures). There are also some items in which the chemical and crystallographic concepts are inextricably mixed. For example, in an earlier email I wrote:

'The entity defined as an atomic site in CIF is strictly an abstract point in the unit cell which is identified by its coordinates. Such points have crystallographic attributes such as symmetry, but we have also decorated them with chemical attributes such as the names of atoms that occupy the site. This leads people to take the category name ATOM_SITE literally as being a list of atoms that define some number of molecular units. Indeed, for Z' = 1 structures many people will transform the crystallographic positional coordinates to orthogonal coordinates and treat the molecule as if the crystal never existed. We have been forced to include 'dummy atoms' in ATOM_SITE in order to define points having no chemical significance. We have never faced up to the problem that CIF describes only the crystallography, not the chemistry.'

Subsequent to my writing this, it was pointed out that ATOM_SITE does indeed define the chemistry, in that it defines the 3D structure. It therefore combines both crystallographic and chemical concepts, and one of our tasks is to separate these two threads and make sure each is treated logically within its own strand.

This problem was brought to a head by the innocent request of CCDC for a way to include Z' in CIF, but Howard Flack (HDF) wrote:

'There are two HUGE problems with this [proposed] definition. ... (1) There is not in fact a single Z' which describes what they are trying to express. Take the example of the compound AB composed of two molecules A and B in the ratio 1:1 as represented by the chemical formula AB. The structure is in space group P2. Molecules A have point symmetry 2 and sit on a 2-fold axis in the crystal structure. There are two independent A molecules in the asymmetric unit. Molecules B have symmetry 1 and sit in a general position. There is one independent molecule B in the asymmetric unit. What is the value of your Z'? (2) And, surprise surprise coming from HDF, there are problems of chirality with this Z'....'

Carol Brock identifies a second problem we face when she writes:

'A problem that stands out is the difficulty of developing a set of descriptors that apply well to all kinds of structures. Molecular structures, and especially organic structures, are easier (even if they contain ions) because there is almost always a clear distinction between intra- and intermolecular distances. Similar distinctions can almost always be made for organic macromolecules. There are problems for some coordination complexes and polymers that have metal-atom contacts that some people would describe as bonding and others would describe as nonbonding. Pure inorganics and intermetallics have their own complications.'

HDF, coming at this problem from a different direction, expressed shock at the thought that the chemists have no unique way of defining a molecule. From what I learned at the workshop, chemists not only have no unique way of defining a molecule, they don't even care! Most of the ontologies (dictionaries) being developed in chemistry are, like CIF, closely related to an experimental technique where the question of defining a molecule is not important.
Peter Murray-Rust's CML defines a molecule as being composed of atoms, but leaves it to the author to state which atoms. Miloslav Nic in Prague is developing GTML (Graph Theory Mark-up Language) which can be used for molecular descriptions, but this also assumes that the molecular graph is already known. mmCIF includes chemical descriptions but these are fairly specific to the kind of macromolecules found in biological systems, i.e., polymers composed of a limited number of different monomeric units such as amino acids. Provision is made for the definition of the structures of the monomers and the ways in which these are linked to form the polymeric macromolecules. We are unlikely to find much help in these definitions.

HDF recently wrote:

'I wonder also whether we should, and have the courage to, embark on representing information on supramolecules, which I think are probably molecules made out of molecules. It all sounds too awful to be true. Even a standard hexa-coordinated Co complex might be encoded as each of the individual ligands as a 'molecule' and the whole complex as another 'molecule'.'

In the rest of this email I present a possible way to resolve some of these difficulties using a description based on graph theory, since the familiar 2-dimensional molecular structure diagram is what the mathematicians call a graph. They define a graph as a set of vertices (atoms) that are linked by a number of edges (bonds). The selection of the atoms that form the set of vertices does not present a problem because everyone agrees on what atoms are present in a given compound, but not everyone will agree on which atoms should be connected by bonds. Graph theory does not solve this problem, but it does help to distinguish between the chemical properties of the atoms and bonds on the one hand and the mathematical properties of the graph on the other. Chemical properties can be assigned to the atoms and bonds in a graph.
This requires a chemical interpretation of the structure, but once these properties have been assigned, graph theory can be used to explore the different possible graphs and their properties. The use of graph theory separates out the intrinsically chemical concepts from the graph theoretical description that can be manipulated mathematically.

The bond graph represents a chemical interpretation of the 3D geometry, i.e., the geometry tells us which atoms are neighbours but not where the bonds are to be found. The bonds are assigned by applying various rules relating to the chemical properties of the atoms. However, not all bonds are of equal value; some are clearly stronger than others, i.e., they survive many of the physical and chemical treatments we can subject them to, such as melting or dissolution in a solvent. Weaker bonds do not survive these treatments. We can thus imagine that each of the possible edges in the graph is associated with a number representing its 'strength'. (The 'strength' would be zero for the edges that could not possibly represent a chemical bond.) I will deliberately avoid defining in detail what I mean by 'strength', but qualitatively it represents the number of electron pairs associated with the bond and it obeys (by definition in some treatments) the rule that the sum of the bond 'strengths' received by any atom is equal to the number of valence electrons the atom uses for bonding. (Implicit in this description is the notion that a bond 'strength' is not restricted to integer values.)

For over a century chemists have struggled to find a tight quantitative definition for bond 'strength' under such names as bond order, bond number, bond valence, electrostatic bond strength, etc., each definition trying to capture the concept in numeric form. All of these definitions are incomplete in one way or another, but in principle they allow us to order the bonds from strongest to weakest.
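
As a rough illustration of the sum rule and the strongest-to-weakest ordering described above, here is a minimal Python sketch. The atom labels, the `strength_sums` and `strongest_first` helpers, and the value of 4/3 assigned to each C-O bond in an isolated carbonate group are assumptions made for the example only; they are not taken from CIF or from any of the definitions of bond 'strength' mentioned above.

```python
from fractions import Fraction

# Hypothetical bond list for an isolated carbonate group: (atom, atom, strength).
# The value 4/3 per C-O bond is an illustrative assumption, not a measured quantity.
bonds = [
    ("C", "O1", Fraction(4, 3)),
    ("C", "O2", Fraction(4, 3)),
    ("C", "O3", Fraction(4, 3)),
]


def strength_sums(bonds):
    """Sum the bond 'strengths' received by each atom."""
    sums = {}
    for a, b, s in bonds:
        sums[a] = sums.get(a, 0) + s
        sums[b] = sums.get(b, 0) + s
    return sums


def strongest_first(bonds):
    """Return the bonds ordered from strongest to weakest."""
    return sorted(bonds, key=lambda bond: bond[2], reverse=True)


sums = strength_sums(bonds)
print(sums["C"])   # total received by carbon: 4
print(sums["O1"])  # total received by O1 within the group: 4/3
```

In this toy assignment the carbon atom receives a total of 4, matching the four valence electrons it uses for bonding, while each oxygen receives only 4/3 from within the group. The non-integer values are deliberate: they show that the bookkeeping works without restricting 'strengths' to whole numbers.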
Assuming that we can at least determine this order even if we cannot assign actual numbers to the bond 'strength', our problem then reduces to the question of where to place the cut-off between the bonds that are shown on the graph and those that are omitted.

The fashionable supramolecule provides an interesting example. At one level it can be considered as a collection of individual molecules held together by weak intermolecular bonds. In the bond graph generated at this level, each ligand and each coordinated metal atom would be treated as a separate molecule. If however one is interested in the properties of the supramolecule, e.g., how it 'self-assembles' (something that NaCl and other inorganic compounds have been doing since long before 'self-assembly' became the word of the month), then the cut-off would be set much lower and more bonds would be included in the graph. The result would be an infinite graph describing the infinite supramolecule (cf. diamond and graphite). Lest you worry how one deals with infinite graphs, I will only point out here that in practice this is not a problem.

The creation of bond graphs at different levels can be explored by first generating a graph that includes only the strongest bonds, and then step by step adding progressively weaker bonds. This would create a series of graphs at different cut-off levels. A graph containing all the bonds that could possibly be drawn for a given 3-dimensional structure would, of necessity, be infinite, but if only the strongest bonds were included, the graph of most compounds would consist of several finite disconnected subgraphs, some conceivably containing only one atom, e.g., [Na] [H] [CO3]. Adding bonds that are a little weaker would reduce the number of disconnected subgraphs, but they might still all be finite, e.g., [Na] [HCO3]. Adding yet weaker bonds would ultimately result in a graph that was infinite.
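
The stepwise construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `components` function, the atom labels, and all of the numerical 'strengths' are hypothetical values chosen to reproduce the [Na] [H] [CO3] and [Na] [HCO3] stages, and the example works on a single finite formula unit, so it cannot show the final step in which the graph of the real crystal becomes infinite.

```python
from collections import defaultdict

# One formula unit of NaHCO3; labels and 'strengths' are illustrative assumptions.
atoms = ["Na", "H", "C", "O1", "O2", "O3"]
bonds = [
    ("C", "O1", 1.2),
    ("C", "O2", 1.4),
    ("C", "O3", 1.4),
    ("H", "O1", 0.8),
    ("Na", "O2", 0.17),
    ("Na", "O3", 0.17),
]


def components(atoms, bonds, cutoff):
    """Group atoms into connected subgraphs using only bonds with strength >= cutoff."""
    parent = {a: a for a in atoms}

    def find(a):
        # Follow parent links to the representative atom, compressing the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b, strength in bonds:
        if strength >= cutoff:
            parent[find(a)] = find(b)  # merge the two subgraphs

    groups = defaultdict(list)
    for a in atoms:
        groups[find(a)].append(a)
    # Smallest subgraphs first, then alphabetical, so the output is reproducible.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (len(g), g))


for cutoff in (1.0, 0.5, 0.1):
    print(cutoff, components(atoms, bonds, cutoff))
```

With these assumed values, a cut-off of 1.0 leaves Na and H as isolated atoms beside a CO3 group, a cut-off of 0.5 attaches H to give Na and HCO3, and a cut-off of 0.1 joins everything into a single subgraph. In a real crystal the Na-O bonds would link neighbouring formula units as well, which is where the graph would become infinite.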

One interesting graph that can be uniquely defined is the graph obtained by progressively removing weaker bonds from the complete graph until the graph is no longer infinite. This might be called the maximal finite graph. It is not clear whether this would be a useful graph, but it would be unique, and might represent a useful boundary.

The distinction between organic and inorganic compounds is treated as a chemical property which is separate from the treatment of the graph. In the bond graph the difference between the two appears in the spectrum of 'strengths' assigned to the bonds: graphs of organic compounds have a clear gap between the strong and the weak bonds, a gap which is missing for the inorganic compounds. However, there is no difference in the way the graph is treated: in both cases bonds are included down to some arbitrarily chosen cut-off.

Before we try to define CIF items for particular chemical concepts, we need to have a consensus about the definition of a molecule. I have made some suggestions above, and I would be interested in people's comments. Is graph theory a fruitful way to go or should we take a different approach? What are the problems we might encounter using the approach described above? Please circulate your views by replying to the whole discussion group (use the reply to: option) and let us see if we can develop a consensus.

David

_______________________________________________
coreCIFchem mailing list
coreCIFchem@iucr.org
http://scripts.iucr.org/mailman/listinfo/corecifchem