Re: CoreCIFchem Discussion #6
- To: Chemical information in core CIF <corecifchem@iucr.org>
- Subject: Re: CoreCIFchem Discussion #6
- From: Peter Murray-Rust <pm286@cam.ac.uk>
- Date: Wed, 22 Dec 2004 17:18:36 +0000
- In-Reply-To: <41C8515A.3020809@flack.ch>
- References: <4161A9B4.9030306@mcmaster.ca>
At 17:37 21/12/2004 +0100, Howard Flack wrote:
>Season's Greetings

Also greetings, I have been quiet but not completely idle since the last emails... First some responses... and then another mail

><snip/>
>
>#6L83>CIF. Every list (i.e., loop) in a CIF must have a list-reference
>
> I think that is not correct. For proof, from #6
>L1844>loop_
>L1845> _geom_bond_atom_site_label_1
>L1846> _geom_bond_atom_site_label_2
>L1847> _geom_bond_distance
>L1848> _geom_bond_site_symmetry_1
>L1849> _geom_bond_site_symmetry_2
> and the current cif_core.dic version 2.3, and many cif's on
> journals.iucr.org concur. This means that the following data items and
> the corresponding data values are NOT required as they serve no purpose.
> _tecton_conformer_id
> _tecton_geom_dist_id
> _tecton_geom_angle_id
> _tecton_geom_torsion_id
> _map_tecton_atom_map_id
> _map_tecton2crystal_atom_id
> _map_tecton2crystal_bond_id
>
> _tecton_topology_bond_id is NOT required in the TNT example but IS
> required in the CaCrF5 example. I hope my list is complete and correct.
>
>#6L98>The implications of these two views are brought out clearly in the
>examples we
>This paragraph reads like just one more battle in the war of
>crystallographers versus IT specialists. Essentially what is designed to be
>easy to programme is difficult to use for the crystallographer and what is
>designed to be easy to use by the crystallographer is difficult to programme.
>Certainly we have to find a middle implementable ground.

I agree with this analysis! FWIW I am gently introducing a link/reference syntax into parts of CML. But this is based on the likelihood of having tools that can help with this. Things that looked hard 10 years ago are now easier. A DOM-like approach to CIF will help considerably in resolving links and we can at least help with the Java implementation. Note that use of links comes very close to involving save_ frames.
If the community can agree that software will become available to support save_ then much of the current discussion becomes more positive. (A save_ can be represented as an additional node in a DOM and be referenceable by id-based addressing.)

<snip/>

>#6L289>(N.B.
> >isomers differ at the topological level, conformers have the same topology but
> >differ at the geometry level).
>also
>#6L375>The topological description does not include any information on the geometry
> >of the tecton but it does distinguish between isomers.
>
> Here and in quite a few other places in the text we have mention of
> molecules, isomers and conformers. This is due to the nature of the TNT
> example. However one needs also to be precise on where and how the
> following sub-categories of molecules fit in:
> (1) enantiomers (coming originally from 'enantiomorphous isomer')
> including enantiomers of known absolute configuration, enantiomers of
> unknown or relative configuration and racemates.
> (2) diastereoisomers
> (1) and (2) above have the same topology but are not considered as
> being conformers.
>
> The question arises as to the best place and method to specify the
> chirality of the molecules. I recommend that we do it the way that things
> are set up in the IUPAC dictionary of stereochemistry. Chiral molecules
> of unknown or relative configuration and racemates are treated as an
> extension of the nomenclature of enantiopure compounds.
> A very common case is of chiral molecules containing chiral centres.
> Clearly the best place to include the specification of the chirality of
> these atom-based centres is either in _tecton_topology_atom_chirality or
> in the _tecton_geom_atom_ loop as a data value with name
> _tecton_geom_atom_chirality taking a value from one of the following
> taken from the IUPAC dictionary: R or S for an enantiopure enantiomer of
> known absolute configuration, R* or S* for an enantiopure enantiomer of
> unknown or relative absolute configuration, RS or SR for a racemate. What
> should one do if an atom is not a chiral centre i.e. it is achiral?
> Clearly one needs a data value meaning 'this atom is not a chiral
> centre'. This value does not mean the same as 'chirality unknown'.
> Another very common case occurs where a single symbol is used to
> indicate the chirality of a molecule with or without chiral centres. The
> ones one sees all the time are D or L for carbohydrates and amino acids.
> There are others as well. [HDF should make a list of possible values].
> These indications of chirality go naturally in _tecton_chirality and
> _tecton_conformer_chirality as values D, L, DL, rac, rel,
> {traditionalists might like to add +, - and +or- which personally I
> detest totally}, and maybe some others for values like 'enantiopure',
> 'unknown' [HDF needs to think more about this].
> The information that is currently coded in
> _chemical_absolute_configuration needs to be included in the _tecton_*
> and _tecton_conformer _loop_s. _chemical_absolute_configuration needs to
> be deprecated.
> [Need to think more about the racemate because this may still be a
> problem because the IUPAC stereochemistry dictionary insists on having R
> as the first CIP symbol. The crystallographer may not have chosen the
> opposite enantiomer in the asymmetric unit. Also the chemical diagram
> needs careful attention. I think there are specific ways of drawing a
> chemical diagram to indicate a racemate rather than just one of its
> enantiopure components.]
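
The per-atom value set proposed above could be held by software roughly as follows. This is only a minimal sketch in Python: the class and function names are invented for illustration, and the labels used for 'not a chiral centre' and for 'chirality unknown' are placeholders assumed here, since no values for those two cases have been agreed in this thread.

```python
from enum import Enum


class AtomChirality(Enum):
    """Per-atom chirality values as listed in the proposal above."""
    R = "R"              # enantiopure, known absolute configuration
    S = "S"
    R_REL = "R*"         # enantiopure, unknown or relative configuration
    S_REL = "S*"
    RS = "RS"            # racemate
    SR = "SR"
    ACHIRAL = "achiral"  # ASSUMED label: atom is not a chiral centre
    UNKNOWN = "?"        # ASSUMED label: chirality not known


def parse_atom_chirality(value: str) -> AtomChirality:
    """Map a raw data value to an AtomChirality member.

    Raises ValueError for anything outside the value set, so a misspelt
    symbol cannot silently pass as 'unknown'.
    """
    return AtomChirality(value.strip())
```

Keeping 'achiral' and 'unknown' as separate members reflects the point made in the quoted text that the two do not mean the same thing.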
>
>[NB to PMR: As you are a member of the IUPAC stereochemistry committee,
>I'm depending on you informing us of any relevant proposed changes to the
>nomenclature.]

Indeed. The group is primarily working on structure representation, and stereochemistry is treated as annotations to diagrams rather than as nomenclature as such (actually that is probably at least as valuable). Also the InChI is tackling the computer formalisation of stereochemistry in a systematic and responsible way. So I think IUCr will benefit greatly from following these two efforts.

>#6L355>and conformation of a the tectons to be specified
> and to:
> and conformation of the tectons to be specified
>
>#6L467>illustration, the molecule contains a crystallographic mirror plane that
> should better be:
> illustration, the average disordered molecule contains a
> crystallographic mirror plane that
>
>#6L471>numbers of 0.5. Because of the disorder the crystallographic structure does
> should better be:
>numbers of 0.5. Because of the disorder the average crystallographic structure does
>
>#6L483># The first set of loops define the topology of the TNT molecule
> should be:
># The first set of loops defines the topology of the TNT molecule
>
>#6L488># If a crystal contained molecules of more than one compound, or more than one
> ># isomer of a compound, each would be described by a separate tecton.
> ># If the crystal contained more than one copy of the same molecule in the
> ># asymmetric unit (Z'>1) the topology of the tecton would be given only once
> ># but it would be mapped onto all the crystallographically distinct copies.
> should better be
># If a crystal contains different types of molecules (isomers or
># diastereoisomers or enantiomers other than the racemate)
># each would be described by a separate tecton.
># If the crystal contained more than one copy of the same molecule in the
># asymmetric unit (Z'>1) the topology of the tecton would be given only once
># and then this single topology would be mapped onto all the
># crystallographically distinct copies.
>
>#6L511># together their properties. We may wish to define other properties, such as
> should better be:
># together with their properties. We may wish to define other properties, such as
>
>#6L526># _tecton_chirality # time-averaged if no geometry given
> ADD after this line
># _atomSet_absolute_configuration # defined like current _chemical_absolute_configuration
>
>#6L582># The CIF dictionary already contains instructions for drawing a 2-D molecular
> ># diagram in the group of chemical_conn categories. Although the
> ># chemical_conn categories also describe the topology of a molecule they are
>
> Do I understand correctly that you are thinking of deprecating
> chemical_conn items?
>
>#6L627># I have added an item _tecton_topology_atom_chirality which is not needed in
># this example, but is needed in chiral structures to identify any atom that
># serves as a chiral center. Chirality is not captured by the topology, but
># it is, like topology, a feature of the structure that can only be changed by
># breaking and making bonds. It is included here because it is more closely
># related to the topology than to the geometry which can be changed without
># breaking any bonds. I will defer to others what values should be associated
># with this item - presumably some letter like R or S.
>
> I don't at all like this discussion about breaking bonds etc. The real
> reason for me that one needs _chirality here is for the case where there
> is only one 'conformer' and you don't want to give any geometry
> information. Thus it turns out to be convenient to give it here. If you
> have several 'conformers' you give the chirality information in
> _tecton_atom_chirality.
> The possible values are given above.
>

It sounds here as if there is a need to map atoms from one domain to another - e.g. topology (2d) to geometry (3d). This is generally not well supported in traditional chemistry programs which tend to concentrate on either the topology or the 3D structure. CML supports both (and fractionals as well). Thus if you have one instance of a molecule it can be represented as "8D" - conn+cartesian+fract without loss of information. You can even add hydrogen atoms for which the 2D information but not the 3D is known.

The problem comes when you have more than one instance of a molecule - disorder, conformations, dynamics snapshots, etc. for which the 2D structure is the same and for which the 3D structure changes. I don't think anyone has yet created a happy representation that does not use implicit semantics (e.g. atom order assumes identity). We have been wrestling with this in CML and have a prototype design where 2D information is described once and then instances of the different 3D sets are described with just the relevant changed information. Everything is linked through persistent atom_ids. Something like:

_atom_site_label
_atom_site_occupancy
O1 1.0
N1 1.0

and then (say) disordered groups

_atom_site_label_ref
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
O1 0.1 0.2 0.3
N1 0.2 0.3 0.4

_atom_site_label_ref
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
O1 0.12 0.21 0.35
N1 0.23 0.31 0.43

This is invalid CIF as names cannot be repeated but shows two groups which reference a "parent" group with the unchanging information (here the occupancy). Linking is (here) through the label. The two groups have different fractional coordinates so that it would be possible to instantiate both of them. For chemical concepts the charge, hydrogen count, etc. will be inherited by the "children".

>#6L714>loop_
>_tecton_topology_bond_id
>_tecton_topology_bond_atom1_id # Child of _tecton_topology_atom_id
>_tecton_topology_bond_atom2_id # Child of _tecton_topology_atom_id
>_tecton_topology_bond_type
>1 T.C1 T.C2 arom # TNT benzene ring
>2 T.C2 T.C3 arom
>3 T.C3 T.C4 arom
>4 T.C4 T.C5 arom
>5 T.C5 T.C6 arom
>6 T.C6 T.C1 arom
>7 T.C3 T.H3 sing
>8 T.C5 T.H5 sing
> etc, etc
>
> should be:
>
>loop_
>_atomSet_topology_bond_atom1_id # Child of _atomSet_topology_atom_id
>_atomSet_topology_bond_atom2_id # Child of _atomSet_topology_atom_id
>_atomSet_topology_bond_type
>T.C1 T.C2 arom # TNT benzene ring
>T.C2 T.C3 arom
>T.C3 T.C4 arom
>T.C4 T.C5 arom
>T.C5 T.C6 arom
>T.C6 T.C1 arom
>T.C3 T.H3 sing
>T.C5 T.H5 sing
> etc, etc
> as the _tecton_topology_bond_id serves no purpose.
>
>#6L834>loop_
>_tecton_conformer_id # List-reference
>_tecton_conformer_tecton_id # Child of _tecton_topology_id
>_tecton_conformer_point_group # Schoenflies point group symbol of conformer
>_tecton_conformer_chirality # We need to define allowed symbols
>_tecton_conformer_details
>
> should be
>
>loop_
>_atomSet_conformer_id # List-reference
>_atomSet_conformer_tecton_id # Child of _tecton_topology_id
>_atomSet_conformer_point_group # Schoenflies point group symbol of conformer
>_atomSet_conformer_chirality # We need to define allowed symbols
>_atomSet_conformer_absolute_configuration # values as per _chemical_absolute_configuration
>_atomSet_conformer_Zprime #
>_atomSet_conformer_occupation # occupation number of this conformer in the crystal
>_atomSet_conformer_details
>
> With several 'conformers' you need these additional values to be given here.
> I do not agree at all with David's proposal of putting copies of the
> occupation number with the individual atom information of the conformer.
> The occupation applies to the whole conformer and must go here.
> Putting occupation values on the individual atoms leaves the gate wide open for
> those who might be tempted to 'doctor' their results to get around an
> error message from a checking programme.

FWIW the disorder reported in the CIFs I have seen recently seems to be well organised. Each disordered group should have a constant occupancy for all its atoms - the question is whether it should be normalised onto the disorder group/conformer or whether it should be spelt out for each atom (denormalized). Much comes down to how the software is written. If there is general agreement that substructures must be supported then the normalized approach is acceptable - if not it is possible that the occupancy could get "lost" - i.e. assumed to be 1.0.

>#6L893>loop_
>_tecton_geom_atom_id # List-reference, child of _tecton_topology_atom_id
>_tecton_geom_atom_conformer_label # Child of _tecton_conformer_equiv_label
>_tecton_geom_atom_coord_x # Coordinates of atom in Angstrom
>_tecton_geom_atom_coord_y #
>_tecton_geom_atom_coord_z #
>_tecton_geom_atom_details
>
> should be
>
>_atomSet_geom_atom_id # List-reference, child of _tecton_topology_atom_id
>_atomSet_geom_atom_conformer_label # Child of _tecton_conformer_equiv_label
>_atomSet_geom_atom_coord_x # Coordinates of atom in Angstrom
>_atomSet_geom_atom_coord_y #
>_atomSet_geom_atom_coord_z #
>_atomSet_geom_atom_chirality # chirality of chiral centre on atom as per CIP
>_atomSet_geom_atom_details

Forgive me if I have already discussed CIP. CIP is primarily for humans to communicate stereochemistry to other humans who - for some reason - cannot see a structural diagram. It is also a necessary part of the formal name as used by patent lawyers. It is very difficult for a machine to understand and there are chiral compounds for which CIP is too complicated (i.e. the algorithm doesn't terminate). Atom parity is designed for machines to understand chirality and works by labelling the 4 (or 3) ligands of a chiral atom.
As long as the labels are explicit the algorithm is relatively simple (CML uses the one proposed for MIF).

Pure conformers do not normally differ in atom chirality. This normally only happens with restricted rotation as in biphenyls. If there are conformers of differing atomic parities then they should really be enantiomers or diastereomers. (I note this is addressed below)

>#6L919>loop_
>_tecton_geom_dist_id # List-reference
>_tecton_geom_dist_conformer_label # Child of _tecton_geom_equiv_label
>_tecton_geom_dist_atom1_id # Child of _tecton_topology_atom_id
>_tecton_geom_dist_atom2_id # Child of _tecton_topology_atom_id
>_tecton_geom_dist_distance # Distance atom1-atom2 in Angstroms
>1 all T.C7 T.C1 1.54 # TNT methyl group
>2 all T.C7 T.H71 1.05
>3 all T.C7 T.H72 1.05
>4 all T.C7 T.H73 1.05
>5 all T.N4 T.C4 1.43 # TNT N4 nitro group
>6 all T.N4 T.O41 1.18
>7 all T.N4 T.O42 1.18
>8 all T.N2 T.C2 1.43 # TNT N2 nitro group
>9 all T.N2 T.O21 1.18
>10 all T.N2 T.O22 1.18
>11 all T.N6 T.C6 1.43 # TNT N6 nitro group
>12 all T.N6 T.O61 1.18
>13 all T.N6 T.O62 1.18
>
> should be as follows since _dist_id serves no purpose.
>
>loop_
>_atomSet_geom_dist_conformer_label # Child of _tecton_geom_equiv_label
>_atomSet_geom_dist_atom1_id # Child of _tecton_topology_atom_id
>_atomSet_geom_dist_atom2_id # Child of _tecton_topology_atom_id
>_atomSet_geom_dist_distance # Distance atom1-atom2 in Angstroms
>all T.C7 T.C1 1.54 # TNT methyl group
>all T.C7 T.H71 1.05
>all T.C7 T.H72 1.05
>all T.C7 T.H73 1.05
>all T.N4 T.C4 1.43 # TNT N4 nitro group
>all T.N4 T.O41 1.18
>all T.N4 T.O42 1.18
>all T.N2 T.C2 1.43 # TNT N2 nitro group
>all T.N2 T.O21 1.18
>all T.N2 T.O22 1.18
>all T.N6 T.C6 1.43 # TNT N6 nitro group
>all T.N6 T.O61 1.18
>all T.N6 T.O62 1.18
>
>#6L942>loop_
>_tecton_geom_angle_id # List-reference
>_tecton_geom_angle_conformer_label # Child of _tecton_geom_equiv_label
>_tecton_geom_angle_atom1_id # Child of _tecton_topology_atom_id
>_tecton_geom_angle_atom2_id # Child of _tecton_topology_atom_id
>_tecton_geom_angle_atom3_is # Child of _tecton_topology_atom_id
>_tecton_geom_angle_angle # Angle in degrees
>1 all T.C1 T.C7 T.H71 109 # TNT Methyl group
>2 all T.C1 T.C7 T.H72 109
>3 all T.C1 T.C7 T.H73 109
>4 all T.H71 T.C7 T.H72 109
>5 all T.H72 T.C7 T.H73 109
>6 all T.H73 T.C7 T.H71 109
>7 all T.O41 T.N4 T.C4 117 # TNT N4 nitro group
>8 all T.O42 T.N4 T.C4 117
>9 all T.O41 T.N4 T.O42 126
>10 all T.O21 T.N2 T.C2 117 # TNT N2 nitro group
>11 all T.O22 T.N2 T.C2 117
>12 all T.O21 T.N2 T.O22 126
>13 all T.O61 T.N6 T.C6 117 # TNT N6 nitro group
>14 all T.O62 T.N6 T.C6 117
>15 all T.O61 T.N6 T.O62 126
>
> should be as _angle_id serves no purpose
>
>loop_
>_atomSet_geom_angle_conformer_label # Child of _tecton_geom_equiv_label
>_atomSet_geom_angle_atom1_id # Child of _tecton_topology_atom_id
>_atomSet_geom_angle_atom2_id # Child of _tecton_topology_atom_id
>_atomSet_geom_angle_atom3_is # Child of _tecton_topology_atom_id
>_atomSet_geom_angle_angle # Angle in degrees
>all T.C1 T.C7 T.H71 109 # TNT Methyl group
>all T.C1 T.C7 T.H72 109
>all T.C1 T.C7 T.H73 109
>all T.H71 T.C7 T.H72 109
>all T.H72 T.C7 T.H73 109
>all T.H73 T.C7 T.H71 109
>all T.O41 T.N4 T.C4 117 # TNT N4 nitro group
>all T.O42 T.N4 T.C4 117
>all T.O41 T.N4 T.O42 126
>all T.O21 T.N2 T.C2 117 # TNT N2 nitro group
>all T.O22 T.N2 T.C2 117
>all T.O21 T.N2 T.O22 126
>all T.O61 T.N6 T.C6 117 # TNT N6 nitro group
>all T.O62 T.N6 T.C6 117
>all T.O61 T.N6 T.O62 126
>
>#6L975>loop_
>_tecton_geom_torsion_id # List-reference
>_tecton_geom_torsion_conformer_label # Child of _tecton_geom_equiv_label
>_tecton_geom_torsion_atom1_id # Child of _tecton_topology_atom_id
>_tecton_geom_torsion_atom2_id # Child of _tecton_topology_atom_id
>_tecton_geom_torsion_atom3_id # Child of _tecton_topology_atom_id
>_tecton_geom_torsion_atom4_id # Child of _tecton_topology_atom_id
>_tecton_geom_torsion_angle # Torsion angle in degrees
>1 all T.C3 T.C4 T.N4 T.O41 90
>2 aa T.C1 T.C2 T.N2 T.O21 10.5
>3 aa T.C1 T.C6 T.N6 T.O61 10.5
>4 bb T.C1 T.C2 T.N2 T.O21 -10.5
>5 bb T.C1 T.C6 T.N6 T.O61 -10.5
>6 ab T.C1 T.C2 T.N2 T.O21 10.5
>7 ab T.C1 T.C6 T.N6 T.O61 -10.5
>8 ba T.C1 T.C2 T.N2 T.O21 -10.5
>9 ba T.C1 T.C6 T.N6 T.O61 10.5
>
>should be since _torsion_id serves no purpose
>
>loop_
>_atomSet_geom_torsion_conformer_label # Child of _tecton_geom_equiv_label
>_atomSet_geom_torsion_atom1_id # Child of _tecton_topology_atom_id
>_atomSet_geom_torsion_atom2_id # Child of _tecton_topology_atom_id
>_atomSet_geom_torsion_atom3_id # Child of _tecton_topology_atom_id
>_atomSet_geom_torsion_atom4_id # Child of _tecton_topology_atom_id
>_atomSet_geom_torsion_angle # Torsion angle in degrees
>all T.C3 T.C4 T.N4 T.O41 90
>aa T.C1 T.C2 T.N2 T.O21 10.5
>aa T.C1 T.C6 T.N6 T.O61 10.5
>bb T.C1 T.C2 T.N2 T.O21 -10.5
>bb T.C1 T.C6 T.N6 T.O61 -10.5
>ab T.C1 T.C2 T.N2 T.O21 10.5
>ab T.C1 T.C6 T.N6 T.O61 -10.5
>ba T.C1 T.C2 T.N2 T.O21 -10.5
>ba T.C1 T.C6 T.N6 T.O61 10.5
>
>#6L1045>loop_
>_map_tecton_atom_map_id # List reference
>_map_tecton_atom_atom1_id # Child of _tecton_topology_atom_id
>_map_tecton_atom_atom2_id # Child of _tecton_topology_atom_id
>1 B.C1 T.C1 # mapping 1,2,4,6 benzene moiety onto TNT
>2 B.C2 T.C2
>3 B.C3 T.C3
>etc etc
>
>should be since _map_id serves no purpose
>
>loop_
>_map_atomSet_atom_atom1_id # Child of _tecton_topology_atom_id
>_map_atomSet_atom_atom2_id # Child of _tecton_topology_atom_id
>B.C1 T.C1 # mapping 1,2,4,6 benzene moiety onto TNT
>B.C2 T.C2
>B.C3 T.C3
>etc etc
>
>#6L1085># The occupation number indicates how much of each conformer (or isomer) is
># present. The occupation numbers of the atoms in the crystal are defined in
># the atom_site loop and must not be less than the sum of the corresponding
># occupation numbers of the conformers.
>
> should be
>
># The occupation number indicates how much of each conformer (or isomer) is
># present. The occupation numbers of the atoms in the crystal are defined in
># the atom_site loop and must approximately equal within 2 or 3 standard
># uncertainties to the sum of the corresponding occupation numbers of the conformers.
>

This seems practicable. I have found that there is good agreement between the sum of occupancies and overall formulae.
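
The "within 2 or 3 standard uncertainties" test in the revised comment above is easy to state in code. The following is a minimal sketch in Python, not taken from any existing checking program: the function name, its arguments and the choice of applying the tolerance to the atom_site occupancy's standard uncertainty are all assumptions made for illustration.

```python
def occupancy_consistent(site_occ, site_su, conformer_occs, k=3.0):
    """Check an atom_site occupancy against the conformers mapped onto it.

    site_occ        occupancy given in the atom_site loop
    site_su         its standard uncertainty
    conformer_occs  occupation numbers of the conformers mapped to this site
    k               tolerance in standard uncertainties (2 or 3 in the text)

    Returns True when site_occ equals the sum of the conformer occupation
    numbers to within k * site_su.
    """
    return abs(site_occ - sum(conformer_occs)) <= k * site_su


# Hypothetical numbers: a site shared by two conformers of occupation 0.5.
ok = occupancy_consistent(0.98, 0.01, [0.5, 0.5])      # differs by 0.02
too_low = occupancy_consistent(0.90, 0.01, [0.5, 0.5])  # differs by 0.10
```

With k = 3 and a standard uncertainty of 0.01 the first site passes and the second does not; a real implementation would also have to decide how to combine uncertainties on the conformer occupation numbers themselves, which is left out here.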
>#6L1118>loop_
>_map_tecton2crystal_atom_id # List-reference
>_map_tecton2crystal_atom_atom_id # Child of _tecton_topology_atom_id
>_map_tecton2crystal_atom_conformer_label
> # Child of _tecton_conformer_equiv_label
>_map_tecton2crystal_atom_occup_number # Occupation number of tecton atom
>_map_tecton2crystal_atom_atom_site_label # child of _atom_site_label
>_map_tecton2crystal_atom_symop_id # child of _space_group_symop_id
>1 T.C1 all 1 C1 1
>2 T.C2 all 1 C2 1
>3 T.C3 all 1 C3 1
>4 T.C4 all 1 C4 1
>5 T.C5 all 1 C3 2
>6 T.C6 all 1 C2 2
>7 T.H3 all 1 H3 1
>8 T.H5 all 1 H3 2
>9 T.C7 all 1 C7 1
>10 T.H71 all 1 H71 1
>11 T.H72 all 1 H72 1
>12 T.H73 all 1 H71 2
>13 T.N4 all 1 N4 1
>14 T.O41 all 1 O41 1
>15 T.O42 all 1 O42 1
># SIDE CHAINS
>16 T.N2 aa 0.5 N2a 1
>17 T.O21 aa 0.5 O21a 1
>18 T.O22 aa 0.5 O22a 1
>19 T.N6 aa 0.5 N2a 2
>20 T.O61 aa 0.5 O21a 2
>21 T.O62 aa 0.5 O22a 2
>22 T.N2 bb 0.5 N2b 1
>23 T.O21 bb 0.5 O21b 1
>24 T.O22 bb 0.5 O22b 1
>25 T.N6 bb 0.5 N2b 2
>26 T.O61 bb 0.5 O21b 2
>27 T.O62 bb 0.5 O22b 2
>
> should be because (a) _atom_id serves no purpose and (b) the
> occupations do not belong here
>
>loop_
>_map_atomSet2crystal_atom_atom_id # Child of _atomSet_topology_atom_id
>_map_atomSet2crystal_atom_conformer_label
> # Child of _atomSet_conformer_equiv_label
>_map_atomSet2crystal_atom_atom_site_label # child of _atom_site_label
>_map_atomSet2crystal_atom_symop_id # child of _space_group_symop_id
>T.C1 all C1 symop1
>T.C2 all C2 symop1
>T.C3 all C3 symop1
>T.C4 all C4 symop1
>T.C5 all C3 symop2
>T.C6 all C2 symop2
>T.H3 all H3 symop1
>T.H5 all H3 symop2
>T.C7 all C7 symop1
>T.H71 all H71 symop1
>T.H72 all H72 symop1
>T.H73 all H71 symop2
>T.N4 all N4 symop1
>T.O41 all O41 symop1
>T.O42 all O42 symop1
># SIDE CHAINS
>T.N2 aa N2a symop1
>T.O21 aa O21a symop1
>T.O22 aa O22a symop1
>T.N6 aa N2a symop2
>T.O61 aa O21a symop2
>T.O62 aa O22a symop2
>T.N2 bb N2b symop1
>T.O21 bb O21b symop1
>T.O22 bb O22b symop1
>T.N6 bb N2b symop2
>T.O61 bb O21b symop2
>T.O62 bb O22b symop2
>
>#6L1163> 4.2 SECOND SAMPLE CIF
> 4.2 should be 3.2
>
>#6L1240>loop_
> _space_group_symop_id
> _space_group_symop_operation_xyz
>1 ' X, Y, Z'
>2 '-X, Y,-Z+1/2'
>3 '-X,-Y,-Z'
>4 ' X,-Y, Z+1/2'
>5 ' X+1/2, Y+1/2, Z'
>6 '-X+1/2, Y+1/2,-Z+1/2'
>7 '-X+1/2,-Y+1/2,-Z'
>8 ' X+1/2,-Y+1/2, Z+1/2'
>
> would be nicer as
>
>loop_
> _space_group_symop_id
> _space_group_symop_operation_xyz
>symop1 ' X, Y, Z'
>symop2 '-X, Y,-Z+1/2'
>symop3 '-X,-Y,-Z'
>symop4 ' X,-Y, Z+1/2'
>symop5 ' X+1/2, Y+1/2, Z'
>symop6 '-X+1/2, Y+1/2,-Z+1/2'
>symop7 '-X+1/2,-Y+1/2,-Z'
>symop8 ' X+1/2,-Y+1/2, Z+1/2'

I agree. We have relied heavily on the identification of symops in building chemistry from CIFs.

>#6L1291>loop_
>_tecton_topology_id # List reference
>_tecton_topology_formula
>_tecton_topology_special_details
>1 'Ca Cr F5' 'The formula unit'
>
> would be nicer as:
>
>loop_
>_atomSet_topology_id # List reference
>_atomSet_topology_formula
>_atomSet_topology_special_details
>atomSet1 'Ca Cr F5' 'The formula unit'
>
>#6L1309># _tecton_topology_atom_label is included for the benefit of the user. It has
># no parent or child and is not required for CIF management. The CIF
># identifies the atom by _tecton_topology_atom_id.
>
> I don't see what possible benefit this _atom_label is for the user. In
> fact I think things are clearer if you leave it out.
>
>#6L1318>loop_
>_tecton_topology_atom_id # List-reference
>_tecton_topology_atom_tecton_id # Child of _tecton_topology_id
>_tecton_topology_atom_label
>_tecton_topology_atom_type_symbol # Child of _atom_type_symbol
>_tecton_topology_atom_valence
>_tecton_topology_atom_coord_number # Number of bonds formed by this atom
>_tecton_topology_atom_details
>Ca 1 Ca1 Ca 2 7 ?
>Cr 1 Cr1 Cr 3 6 ?
>F1 1 F1 F -1 3 ?
>F2 1 F2 F -1 2 ?
>F3 1 F3 F -1 3 ?
>F4 1 F4 F -1 3 ' Related to F1 by crystallographic symmetry'
>F5 1 F5 F -1 2 ' Related to F2 by crystallographic symmetry'
>
>looks nicer as
>
>loop_
>_atomSet_topology_atom_id # List-reference
>_atomSet_topology_atom_tecton_id # Child of _tecton_topology_id
>_atomSet_topology_atom_type_symbol # Child of _atom_type_symbol
>_atomSet_topology_atom_valence
>_atomSet_topology_atom_coord_number # Number of bonds formed by this atom
>_atomSet_topology_atom_details
>Ca atomSet1 Ca 2 7 ?
>Cr atomSet1 Cr 3 6 ?
>F1 atomSet1 F -1 3 ?
>F2 atomSet1 F -1 2 ?
>F3 atomSet1 F -1 3 ?
>F4 atomSet1 F -1 3 ' Related to F1 by crystallographic symmetry'
>F5 atomSet1 F -1 2 ' Related to F2 by crystallographic symmetry'
>
>#6L1393>## the finite bond graph, i.e. that atoms in the tecton from which the
> should be
>## the finite bond graph, i.e. those atoms in the tecton from which the
>
>#6L1403>loop_
>_tecton_geom_dist_id # List-reference
>_tecton_geom_dist_atom1_id # Child of _tecton_topology_atom_id
>_tecton_geom_dist_atom2_id # Child of _tecton_topology_atom_id
>_tecton_geom dist_distance # Ideal bond distance in Angstroms
>_tecton_geom_dist_valence # Same as _tecton_topology_bond_valence
>_tecton_geom_dist_details
>A Cr F1 1.93 0.48 'Bond distances calculated from bond valences'
>B Cr F4 1.93 0.48 'Bond distances calculated from bond valences'
>C Cr F2 1.84 0.61 'Bond distances calculated from bond valences'
>D Cr F5 1.84 0.61 'Bond distances calculated from bond valences'
>E Cr F3 1.99 0.41 'Bond distances calculated from bond valences'
>F Cr F3 1.99 0.41 'Bond distances calculated from bond valences'
>G Ca F1 2.34 0.26 'Bond distances calculated from bond valences'
>H Ca F1 2.34 0.26 'Bond distances calculated from bond valences'
>I Ca F4 2.34 0.26 'Bond distances calculated from bond valences'
>J Ca F4 2.34 0.26 'Bond distances calculated from bond valences'
>K Ca F2 2.19 0.39 'Bond distances calculated from bond valences'
>L Ca F5 2.19 0.39 'Bond distances calculated from bond valences'
>M Ca F3 2.48 0.18 'Bond distances calculated from bond valences'
>
> since _dist_id serves no purpose should be
>
>loop_
>_atomSet_geom_dist_atom1_id # Child of _atomSet_topology_atom_id
>_atomSet_geom_dist_atom2_id # Child of _atomSet_topology_atom_id
>_atomSet_geom dist_distance # Ideal bond distance in Angstroms
>_atomSet_geom_dist_valence # Same as _atomSet_topology_bond_valence
>_atomSet_geom_dist_details
>Cr F1 1.93 0.48 'Bond distances calculated from bond valences'
>Cr F4 1.93 0.48 'Bond distances calculated from bond valences'
>Cr F2 1.84 0.61 'Bond distances calculated from bond valences'
>Cr F5 1.84 0.61 'Bond distances calculated from bond valences'
>Cr F3 1.99 0.41 'Bond distances calculated from bond valences'
>Cr F3 1.99 0.41 'Bond distances calculated from bond valences'
>Ca F1 2.34 0.26 'Bond distances calculated from bond valences'
>Ca F1 2.34 0.26 'Bond distances calculated from bond valences'
>Ca F4 2.34 0.26 'Bond distances calculated from bond valences'
>Ca F4 2.34 0.26 'Bond distances calculated from bond valences'
>Ca F2 2.19 0.39 'Bond distances calculated from bond valences'
>Ca F5 2.19 0.39 'Bond distances calculated from bond valences'
>Ca F3 2.48 0.18 'Bond distances calculated from bond valences'
>
>#6L1434># Note that atoms F4 and F5 in the molecular unit map onto
> should be
># Note that atoms F4 and F5 in the atomSet map onto
>
>#6L1441>loop_
>_map_tecton2crystal_atom_id # List reference
>_map_tecton2crystal_atom_atom_id # Child of _tecton_topology_atom_id
>_map_tecton2crystal_atom_atom_site_label # Child of _atom_site_label
>_map_tecton2crystal_atom_symop_id # Child of _space_group_symop_id
>_map_tecton2crystal_atom_trans_x
>_map_tecton2crystal_atom_trans_y
>_map_tecton2crystal_atom_trans_z
>1 Ca Ca1 1 0 0 0
>2 Cr Cr1 1 0 0 0
>3 F1 F1 1 0 0 0
>4 F2 F2 1 0 0 0
>5 F3 F3 1 0 0 0
>6 F4 F1 3 0 0 0
>7 F5 F2 3 0 0 0
>
> should be nicer as:
>
>loop_
>_map_atomSet2crystal_atom_atom_id # Child of _atomSet_topology_atom_id
>_map_atomSet2crystal_atom_atom_site_label # Child of _atom_site_label
>_map_atomSet2crystal_atom_symop_id # Child of _space_group_symop_id
>_map_atomSet2crystal_atom_trans_x
>_map_atomSet2crystal_atom_trans_y
>_map_atomSet2crystal_atom_trans_z
>Ca Ca1 symop1 0 0 0
>Cr Cr1 symop1 0 0 0
>F1 F1 symop1 0 0 0
>F2 F2 symop1 0 0 0
>F3 F3 symop1 0 0 0
>F4 F1 symop3 0 0 0
>F5 F2 symop3 0 0 0
>
>#6L1477>loop_
>_map_tecton2crystal_bond_id # List reference
>_map_tecton2crystal_bond_bond_id # Child of _tecton_topology_bond_id
>_map_tecton2crystal_bond_atom_site_label_1 # Child of _atom_site_label
>_map_tecton2crystal_bond_symop_1 # Child of _space_group_symop_id
>_map_tecton2crystal_bond_trans_x_1
>_map_tecton2crystal_bond_trans_y_1
>_map_tecton2crystal_bond_trans_z_1
>_map_tecton2crystal_bond_atom_site_label_2 # Child of _atom_site_label
>_map_tecton2crystal_bond_symop_2 # Child of _space_group_symop_id
>_map_tecton2crystal_bond_trans_x_2
>_map_tecton2crystal_bond_trans_y_2
>_map_tecton2crystal_bond_trans_z_2
>_map_tecton2crystal_bond_dist # Observed distance (optional)
>_map_tecton2crystal_bond_details
>1 Cr.F1 Cr1 1 0 0 0 F1 1 0 0 0 1.918 ?
>2 Cr.F4 Cr1 1 0 0 0 F4 1 0 0 0 1.918 ?
>3 Cr.F2 Cr1 1 0 0 0 F2 1 0 0 0 1.848 ?
>4 Cr.F5 Cr1 1 0 0 0 F5 1 0 0 0 1.848 ?
>5 Cr.F3.1 Cr1 1 0 0 0 F3 1 0 0 0 1.940 ?
>6 Cr.F3.2 Cr1 1 0 0 0 F3 3 0 0 0 1.940 link
>
>7 Ca.F1.1 Ca1 1 0 0 0 F1 5 0 0 0 2.391 link
>8 Ca.F1.2 Ca1 1 0 0 0 F1 6 0 0 0 2.292 link
>9 Ca.F4.1 Ca1 1 0 0 0 F4 5 0 -1 0 2.391 link
>10 Ca.F4.2 Ca1 1 0 0 0 F4 6 0 -1 0 2.292 link
>11 Ca.F2 Ca1 1 0 0 0 F5 1 0 0 0 2.215 ?
>12 Ca.F5 Ca1 1 0 0 0 F2 4 1 0 0 2.215 link
>13 Ca.F3 Ca1 1 0 0 0 F3 5 0 0 0 2.494 link
>
> would be nicer as:
>
>loop_
>_map_atomSet2crystal_bond_bond_id # Child of _tecton_topology_bond_id
>_map_atomSet2crystal_bond_atom_site_label_1 # Child of _atom_site_label
>_map_atomSet2crystal_bond_symop_1 # Child of _space_group_symop_id
>_map_atomSet2crystal_bond_trans_x_1
>_map_atomSet2crystal_bond_trans_y_1
>_map_atomSet2crystal_bond_trans_z_1
>_map_atomSet2crystal_bond_atom_site_label_2 # Child of _atom_site_label
>_map_atomSet2crystal_bond_symop_2 # Child of _space_group_symop_id
>_map_atomSet2crystal_bond_trans_x_2
>_map_atomSet2crystal_bond_trans_y_2
>_map_atomSet2crystal_bond_trans_z_2
>_map_atomSet2crystal_bond_dist # Observed distance (optional)
>_map_atomSet2crystal_bond_details
>Cr.F1 Cr1 symop1 0 0 0 F1 symop1 0 0 0 1.918 ?
>Cr.F4 Cr1 symop1 0 0 0 F4 symop1 0 0 0 1.918 ?
>Cr.F2 Cr1 symop1 0 0 0 F2 symop1 0 0 0 1.848 ?
>Cr.F5 Cr1 symop1 0 0 0 F5 symop1 0 0 0 1.848 ?
>Cr.F3.1 Cr1 symop1 0 0 0 F3 symop1 0 0 0 1.940 ?
>Cr.F3.2 Cr1 symop1 0 0 0 F3 symop3 0 0 0 1.940 link
>Ca.F1.1 Ca1 symop1 0 0 0 F1 symop5 0 0 0 2.391 link
>Ca.F1.2 Ca1 symop1 0 0 0 F1 symop6 0 0 0 2.292 link
>Ca.F4.1 Ca1 symop1 0 0 0 F4 symop5 0 -1 0 2.391 link
>Ca.F4.2 Ca1 symop1 0 0 0 F4 symop6 0 -1 0 2.292 link
>Ca.F2 Ca1 symop1 0 0 0 F5 symop1 0 0 0 2.215 ?
>Ca.F5 Ca1 symop1 0 0 0 F2 symop4 1 0 0 2.215 link
>Ca.F3 Ca1 symop1 0 0 0 F3 symop5 0 0 0 2.494 link
>
>#6L1545> 5. SAMPLE CIFS WITH COMMENTS REMOVED
> Parts of section 5 are already out of date with respect to the content
> of section 3
>
>PMR>One of the most important contributions would be to require that EVERY
>atom is reported.
>
> Yes I agree with that.
>
>PMR>Are conformers only relevant for disordered structures or might a
>species such as TNT have one NO2 tecton with three conformations (I would
>argue against that)
>
> In my view the word 'conformer' is badly chosen.
> One topology might well correspond to the two opposite enantiomers and
> perhaps several diastereoisomers.

Agreed fully.

>PMR>It would be useful to see an example of a simple structure without
>problems, and perhaps one without disorder but either symmetry or multiple
>molecules. I think the present example is trying to tackle too many
>problems at once
>
> I agree that the final document should contain more including simpler
> examples. I'm prepared to provide some concerned with chiral molecules.
> At the moment, I think it would not be too helpful to overload an already
> long text with more examples at the moment.
>
>PMR> There are many groups that are not isomorphic to a point group. They
>include permutation groups and products. I spent some time many years ago
>looking at whether such groups could usefully be represented geometrically.
>
> It sounds as though we should drop the automorphism group.
>
>PMR>> # 3) only one molecule can be described
>
>PMR Response
>-------------------
>CML can store multi-molecules - e.g. hexane+urea. The problem seems to
>come from conformers
>
> There are problems with racemates as well. Of course the racemate is
> not really 'one molecule' although it is often incorrectly treated as
> though it were. (i.e. despite the fact that every molecule in a racemate
> is chiral, most chemists think of the racemate as being achiral!)

Racemates are more difficult to describe than appears at first sight. IUPAC has wrestled with this. At least in crystallography the results are usually well defined!

>PMR>CML is only just starting to tackle the problem of describing
>molecules as assemblies of fragments. Do your fragments have unfilled
>valences, dummy atoms, etc.?
>
> I've come to the opinion that describing the molecules in terms of
> fragments i.e. TNT formed of substituted benzene, nitro group, etc is the
> part of the spec which has the least potential practical application.
> It looks like a chemical decomposition of the molecule. I wondered whether
> it would not be better to shelve it at least for the time being. I can't
> think of what practical application I would use it for.

Agreed. The fragment approach is similar to the chemist's Markush approach. This describes a molecule as (say) R1-CO-NH-R2 where R1 and R2 have a list of values. Even if the list is only length 1 it can be quite difficult to work out what the molecule is. There is pressure from the patents offices to drop Markush structures in favour of explicit enumeration. I certainly think that an explicit formula for the molecule should always be present.

More in next mail.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069
_______________________________________________
coreCIFchem mailing list
coreCIFchem@iucr.org
http://scripts.iucr.org/mailman/listinfo/corecifchem
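
Editorial illustration of the Markush point made in the message: a minimal Python sketch, not anything from CML or the CIF proposal, of what "explicit enumeration" of R1-CO-NH-R2 amounts to. The function name enumerate_markush, the "{R1}" placeholder syntax and the substituent lists are all assumptions chosen only for illustration.

```python
from itertools import product


def enumerate_markush(template, substituents):
    """Expand a Markush-style template into explicit structure strings.

    template     -- text with named placeholders, e.g. "{R1}-CO-NH-{R2}"
    substituents -- dict mapping each placeholder name to its list of values
    (Both conventions are hypothetical; they are not part of CIF or CML.)
    """
    names = sorted(substituents)
    structures = []
    # One explicit structure per combination of substituent choices.
    for combo in product(*(substituents[name] for name in names)):
        structures.append(template.format(**dict(zip(names, combo))))
    return structures


# Hypothetical substituent lists, for illustration only.
structures = enumerate_markush(
    "{R1}-CO-NH-{R2}",
    {"R1": ["CH3", "C6H5"], "R2": ["H", "CH3", "C2H5"]},
)
for s in structures:
    print(s)
```

Two choices for R1 and three for R2 already give six explicit structures, which is why an explicit formula stored alongside the fragment description is easier for a reader or a program to check than the template alone.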
- Follow-Ups:
- Re: CoreCIFchem Discussion #6 (David Brown)
- References:
- CoreCIFchem Discussion #6 (David Brown)
- Re: CoreCIFchem Discussion #6 (Howard Flack)