Re: Phase Identifiers.
- To: Multiple recipients of list <phase-identifiers@iucr.org>
- Subject: Re: Phase Identifiers.
- From: Brian McMahon <bm@iucr.org>
- Date: Wed, 3 Apr 2002 17:36:54 +0100 (BST)
Dear Colleagues

Once the scientific discussion of phase identifiers begins in earnest, you will hear very little from me, for I am no expert in such matters. But as consultant I wanted to start the ball rolling with a few background ideas to point out some of the technical considerations that we need to bear in mind.

    IDENTIFY - v.t. - to make, reckon, ascertain or prove to be the same
    [Late Latin identificare - idem, the same; facere, to make]
    (Chambers English Dictionary)

David's position paper accurately highlights the purpose of an identifier as a label, name or designation that demonstrates that two instances of, or references to, a thing are the same. In practice one recognises that proving that two things are the same can be difficult, and usually involves comparing limited information in a particular context.

For example, I have two correspondents called "David Brown", but I have no problem distinguishing between their messages, because they have different associated email addresses and my mail tool can group according to address. On the other hand, the email identifier is useless if I try to tell someone at a meeting "Please give this book to David Brown in the next room - you'll recognise him by his email address". But I might be able to use information on the person's name badge ("He's the David Brown from McMaster in Canada"), or I might use some additional external clue ("He's the David Brown with the beard"). In this example, I have never actually met the "other" David Brown, so I don't know a priori whether the beard test is a good enough discriminator.

The fact that different amounts of information may be known by different people (or at different times) argues for an identifier - or series of identifiers - that can be applied piecewise to match the specificity of a search.
In this respect the CCN nomenclature system for phase transitions is quite far-sighted: it provides a number of fields, not all of which need to be populated, and database searches could be conducted using a subset of the available fields. Where appropriate, matching hits can be compared using some of the other fields in an attempt to confirm or refute the identity of the matches.

My quibble with the scheme as it presently stands is that the fields are designated by order, and certain fields may serve different purposes according to the type of material or phase concerned. A far stronger scheme would tag each field in some unambiguous way, so that its purpose is clear. The ordering or absence of individual fields would then be irrelevant.

One approach to defining an identifier would be to draw up a set of tags representing desirable fields. A natural way (for at least some of the members of this Working Group) would be to construct a small CIF dictionary containing the new tags. This would have the incidental benefit of being expandable to define other data names characterising or related to phases, and becoming in time a fully-fledged CIF extension dictionary (or, if small enough, being incorporated into the core dictionary). This does not imply that the final form of an identifier be CIF-like; but it would allow us to concentrate initially on rigorously defining the characteristics that should be incorporated into the identifier. Once the content is established, a more printer-friendly notation can be generated if desired.

Note also that a dictionary established to describe phase identifiers could equally well accommodate additional identifiers, suitably tagged. That is, "external identifiers" in David's nomenclature - e.g. database reference codes - could be added to the dictionary and serve the purpose of (a) locating records in a specific database; (b) confirming an identification by allowing comparison of external identifiers where they are available.
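
To make the idea of tagged, order-independent fields concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the tag names (_phase_formula, _phase_space_group, _phase_name), the helper functions and the three sample records are all invented for this example and do not come from any existing dictionary or database. A search uses whatever subset of tags is known; a second step compares further tags to confirm or refute that two hits are the same phase.

```python
from typing import Dict, List

# A record is a set of tagged fields; order and absence of fields do not matter.
# All tag names below are hypothetical, chosen only for this sketch.
Record = Dict[str, str]

# Illustrative sample records (not taken from any real database).
RECORDS: List[Record] = [
    {"_phase_formula": "SiO2", "_phase_space_group": "P3221", "_phase_name": "quartz"},
    {"_phase_formula": "SiO2", "_phase_space_group": "P41212", "_phase_name": "cristobalite"},
    {"_phase_formula": "TiO2", "_phase_space_group": "P42/mnm"},
]


def search(query: Record, records: List[Record]) -> List[Record]:
    """Return the records that carry every tag in the query with the same value."""
    return [r for r in records
            if all(r.get(tag) == value for tag, value in query.items())]


def compare(a: Record, b: Record, tags: List[str]) -> str:
    """Use additional tags to confirm or refute that two hits are the same.

    Only tags present in both records can be compared; if none are shared,
    the result is left open rather than guessed.
    """
    shared = [t for t in tags if t in a and t in b]
    if not shared:
        return "undetermined"
    return "confirmed" if all(a[t] == b[t] for t in shared) else "refuted"


# Search with partial information: only the formula is known.
hits = search({"_phase_formula": "SiO2"}, RECORDS)
print(len(hits), "hits for SiO2")

# Compare the two hits on a further field to see whether they are the same phase.
print(compare(hits[0], hits[1], ["_phase_space_group"]))

# A field missing from one record can neither confirm nor refute.
print(compare(hits[0], RECORDS[2], ["_phase_name"]))
```

Because each value is looked up by its tag, a record may omit any field without shifting the meaning of the others, which is the weakness of a purely positional scheme.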
As David notes, assignment of such "external identifiers" is really the preserve of a registration authority, and at present such authorities would seem naturally to be managers of databases of the objects of interest. If there are no existing databases of phases, members of this working group with contacts in the crystallographic database community might wish to pursue the possibility of establishing one or more such databases. (This is, of course, not a direct charge on the Working Group!)

From David's paper it seems quite possible that more than one database maintainer might wish to establish a database of phases; in which case "external identifiers" could follow either or both of two routes. One is the assignment of database-specific identifiers, e.g. _phase_identifier_ccdc and _phase_identifier_nist; the other is participation in a common labelling scheme based on a digital object identifier (DOI: see http://www.doi.org). For such a common scheme to work effectively, a resolver service would be needed to divert attempts to access a DOI to the particular database provider responsible for generating that DOI. At present this is purely a hypothetical (and possibly Utopian) ideal of database interoperability, but it's an idea I would like to pursue in more general terms with the IUCr database committee and with interested parties in CODATA.

The IUPAC Chemical Identifier (IChI)
====================================

IUPAC has been working on an identifier for chemical structures that has also had to address many of the same issues. Some background information on the IChI project and information on how to obtain a copy of the beta test software is at

    http://www.iupac.org/symposia/conferences/CIandXML_jul02/index.html

I attach below the IChI for guanine, along with the following observations:

(1) The current representation is XML, but does not specify a DTD. This is a working convenience, and again abstracts the content from any specific typographic or other concrete realisation. It may well be that the final release version will indeed use XML (in which case a specific public DTD would then be needed). Equally, different representations may be used according as the identifier is to be typeset, stored in a machine-readable file or otherwise displayed.

(2) The full identifier is built of layers (basic structure, stereochemistry, isotopy, tautomerism and overall charge) which allow partial or tentative identification, or database searches with selective information.

(3) The identifier is what David calls "internal": it can be assigned strictly from knowledge of the chemical structure itself. What remains to be demonstrated is the extent to which it is unique and unambiguous. The project leaders are confident that it has both these properties, at least for a very wide range of "normal" compounds. The test will come as they seek to encompass the "abnormal" ones.

(4) Versioning information is carried along as part of the identifier "metadata".
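
As a rough illustration of observation (2), the following Python sketch reads the layers out of a trimmed copy of the guanine sample attached to this message and compares two identifiers on selected layers only. The element names are those of the 0.9Beta sample; the layer names, the LAYER_PATHS table and the functions read_layers and layers_agree are assumptions made for this sketch and are not part of the IChI software. The altered double-bond layer in the example is likewise invented purely to show a mismatch.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the guanine sample (non-tautomeric form only).
SAMPLE = """
<structure number="1" id.name="" id.value="">
  <identifier version="0.9Beta" tautomeric="0">
    <basic>N2OC1NNN1N1C*4, 4-3 6-3 8-1-5-7 9-2-7 10-4-5 11-6-9-10</basic>
    <charge></charge>
    <stereo>
      <dbond>4-3- 8-5- 11-10-</dbond>
      <sp3></sp3>
    </stereo>
  </identifier>
</structure>
"""

# Layer name -> element path inside <identifier>. The paths follow the
# sample; the layer names are this sketch's own labels.
LAYER_PATHS = {
    "basic": "basic",
    "charge": "charge",
    "dbond": "stereo/dbond",
    "sp3": "stereo/sp3",
}


def read_layers(identifier: ET.Element) -> dict:
    """Collect the non-empty layers of one <identifier> element."""
    layers = {}
    for name, path in LAYER_PATHS.items():
        node = identifier.find(path)
        text = (node.text or "").strip() if node is not None else ""
        if text:
            layers[name] = text
    return layers


def layers_agree(a: dict, b: dict, names: list) -> bool:
    """Compare only the named layers; a layer absent from either side is skipped."""
    return all(a[n] == b[n] for n in names if n in a and n in b)


root = ET.fromstring(SAMPLE.strip())
full = read_layers(root.find("identifier"))

# A tentative identification that knows only the basic structure layer.
partial = {"basic": full["basic"]}

# A hypothetical identifier with a different double-bond stereo layer.
other = dict(full, dbond="4-3- 8-5-")

print(sorted(full))                                      # layers present in the sample
print(layers_agree(full, partial, ["basic", "dbond"]))   # stereo unknown, so not compared
print(layers_agree(full, other, ["basic"]))              # same basic structure
print(layers_agree(full, other, ["basic", "dbond"]))     # differs once stereo is included
```

The point of the sketch is only that each layer can be consulted or ignored independently, so a search can be widened or narrowed by choosing which layers to compare.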

guanine
-------

<structure number="1" id.name="" id.value="">
  <identifier version="0.9Beta" tautomeric="0">
    <basic>N2OC1NNN1N1C*4, 4-3 6-3 8-1-5-7 9-2-7 10-4-5 11-6-9-10</basic>
    <charge></charge>
    <stereo>
      <dbond>4-3- 8-5- 11-10-</dbond>
      <sp3></sp3>
    </stereo>
  </identifier>
  <identifier.auxiliary-info version="0.9Beta" tautomeric="0">
    <!-- Auxiliary info is not a part of the identifier, it is not unique -->
    <atom.orig-nbr>4 7 11 10 3 9 2 1 5 8 6</atom.orig-nbr>
    <atom.equivalence></atom.equivalence>
  </identifier.auxiliary-info>
  <identifier version="0.9Beta" tautomeric="1">
    <basic>NOC1N*4C*4, 4-3 5-3 8-1-6-7 9-2-6 10-4-7 11-5-9-10, (H4 1 2 4 5 6 7)</basic>
    <charge></charge>
    <stereo>
      <dbond>8-6- 8-7- 9-6- 10-7- 11-9- 11-10-</dbond>
      <sp3></sp3>
    </stereo>
  </identifier>
  <identifier.auxiliary-info version="0.9Beta" tautomeric="1">
    <!-- Auxiliary info is not a part of the identifier, it is not unique -->
    <atom.orig-nbr>4 7 11 10 9 2 3 1 5 8 6</atom.orig-nbr>
    <atom.equivalence></atom.equivalence>
    <tgroup.equivalence></tgroup.equivalence>
  </identifier.auxiliary-info>
</structure>

Regards
Brian
_______________________________________________________________________________
Brian McMahon                                       tel: +44 1244 342878
Research and Development Officer                    fax: +44 1244 314888
International Union of Crystallography           e-mail: bm@iucr.ac.uk
5 Abbey Square, Chester CH1 2HU, England                 bm@iucr.org