Report of IUCr Representative to CODATA 2000-2002
- To: Multiple recipients of list <epc-l@iucr.org>
- Subject: Report of IUCr Representative to CODATA 2000-2002
- From: Brian McMahon <bm@iucr.org>
- Date: Tue, 18 Jun 2002 16:13:42 +0100 (BST)
The Scientific Union Representatives to CODATA are required to submit a report of the data-related activities of their Unions in the period between CODATA Congresses. I attach my current draft in case anyone is interested or wishes to comment (or point out anything vital that I have overlooked).

Best wishes
Brian

------------------------------------------------------------------------------

Report to CODATA of Activities of the International Union of Crystallography (IUCr) 2000-2002

The International Union of Crystallography (IUCr) is a scientific union adhering to the International Council for Science (ICSU). Its objectives are to promote international cooperation in crystallography and to contribute to all aspects of crystallography, to promote international publication of crystallographic research, to facilitate standardization of methods, units, nomenclatures and symbols, and to form a focus for the relations of crystallography to other sciences.

Crystallographic Databases
--------------------------

Several independent databases exist that store and manage the results of crystal structure determinations. Among the most important are

- the Cambridge Structural Database for organic and metal-organic small-molecule structures and oligonucleotides (CSD);
- the Protein Data Bank for protein and nucleic acid structures (PDB);
- the Inorganic Crystal Structure Database for inorganic materials (ICSD);
- the Metals Crystallographic Data File for metals (CRYSTMET).

Other crystallographic databases store non-structural data, including:

- the NIST Crystal Data collection of unit-cell parameters;
- the NIST Biological Macromolecule Crystallization Database and the NASA Archive for Protein Crystal Growth Data;
- the Powder Diffraction File.

These databases are curated by independent organisations, but the IUCr monitors their development through a standing Database Committee (CCD) that reports directly to the Union's Executive Committee.
Among the changes noted during the period by the CCD are:

- Consolidation of the Protein Data Bank under the management of the Research Collaboratory for Structural Biology, with centres at Rutgers University, San Diego Supercomputer Center and NIST, Washington DC, USA. H. M. Berman is the Director of the RCSB-PDB.
- Release of new database access and visualisation software within the Cambridge Structural Database system. The CSD includes over 250,000 structures as of October 2001.
- The Metals and Alloys Data File (CRYSTMET) was brought fully up to date in 1999 and Toth Inc. of Canada offer search and visualisation software that also operates on data from the Inorganic Crystal Structure Database.
- H. Behrens has retired as Head of the Inorganic Crystal Structure Database and has been succeeded by P. Luksch. There is continuing collaboration with NIST.
- R. Jenkins has retired as Executive Director of the International Centre for Diffraction Data and is succeeded by T. Fawcett. ICDD has released its powder files (PDF-4) in relational format, and is collaborating with the Cambridge Crystallographic Data Centre to generate a database of calculated powder patterns from CSD contents.

A special issue of the IUCr journal Acta Crystallographica was published in 2002 that describes the current operation of the crystallographic databases and a selection of research applications.

Data Exchange
-------------

Development continues on the Crystallographic Information File (CIF), the standard file format for archiving and exchanging crystallographic data developed and adopted by the IUCr in 1991. This exchange standard comprises printable ASCII files conforming to a simple grammar and populated with identifiers defined in external machine-readable dictionaries. The identifiers provide universal labels for well defined items of data, ranging from atomic coordinates to entire journal research articles.
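
To make the "simple grammar" concrete, here is a minimal Python sketch of a reader for a simplified subset of CIF: a data_ block header, name/value pairs, and loop_ tables. It is an illustration only, not IUCr software. The function name parse_simple_cif and the sample values are hypothetical, and the sketch assumes that shell-style tokenising is a close enough stand-in for CIF quoting. It ignores semicolon-delimited text fields and save frames, and does not consult any dictionary.

```python
import shlex

# Illustrative sample only; the values are made up for this sketch and are
# not taken from any database entry.
SAMPLE = """\
data_example
_cell_length_a    5.431
_cell_length_b    5.431
_symmetry_space_group_name_H-M  'F d -3 m'
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
Si1 0.0 0.0 0.0
Si2 0.25 0.25 0.25
"""


def _starts_new_item(token):
    """True if the token is a data name, a loop_ keyword or a data_ header."""
    low = token.lower()
    return token.startswith("_") or low == "loop_" or low.startswith("data_")


def parse_simple_cif(text):
    """Parse a simplified CIF subset into {block_name: {data_name: value}}.

    Single items map to a string; looped items map to a list of strings,
    one entry per row. Quoting is approximated with shlex, so a quoted
    value that begins with '_' would be misread as a data name.
    """
    tokens = shlex.split(text, comments=True)  # '#' starts a comment
    blocks, current, i = {}, None, 0
    while i < len(tokens):
        tok = tokens[i]
        low = tok.lower()
        if low.startswith("data_"):
            current = blocks.setdefault(tok[5:], {})
            i += 1
            continue
        if current is None:
            raise ValueError(f"token {tok!r} appears before any data_ header")
        if low == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not _starts_new_item(tokens[i]):
                values.append(tokens[i])
                i += 1
            if not names or len(values) % len(names) != 0:
                raise ValueError("loop_ values do not fill whole rows")
            # Column k of the table is every len(names)-th value from k.
            for col, name in enumerate(names):
                current[name] = values[col::len(names)]
        elif tok.startswith("_"):
            if i + 1 >= len(tokens):
                raise ValueError(f"data name {tok!r} has no value")
            current[tok] = tokens[i + 1]
            i += 2
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return blocks


if __name__ == "__main__":
    for block, items in parse_simple_cif(SAMPLE).items():
        print(block, items)
```

The parser never needs to know what a data name means, which is the point of the design: the meaning of _cell_length_a and the other names is supplied by the external dictionary, not by the file syntax.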
Six dictionaries (collections of terms specific to particular areas of crystallography) are now available, covering

- core data items in small-molecule structural crystallography (coreCIF), published 1991, updated 1997, 1999 and 2001;
- powder diffraction (pdCIF), published 1997;
- macromolecular structure determination and secondary structure characterisation (mmCIF), published 1997 and updated 2000;
- image-plate data, annotation and analysis (imgCIF), published 2000;
- crystallographic symmetry specification (symmCIF), published 2001;
- modulated incommensurate structures (msCIF), published 2002.

Except for the powder dictionary, all these have been published or updated since the last CODATA Congress in Autumn 2000.

Working groups exist to define data names in other relevant areas of the subject. Dictionaries currently under development cover the fields of

- small-angle scattering;
- magnetic structures;
- electron density studies.

Coordination of the content of these dictionaries and approval for public adoption is the responsibility of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS).

The coreCIF format is used by the Cambridge Crystallographic Data Centre to import structural data from journals and as an export format; mmCIF provides the data capture and exchange format for files in the Protein Data Bank. The PDB has recently released software capable of constructing mmCIF data sets for deposition from legacy software applications, a move designed to facilitate the direct deposition of such results.

coreCIF is the mandatory submission mechanism for structure reports of small-molecule and inorganic compounds published in the IUCr journals Acta Crystallographica Sections C (Crystal Structure Communications) and E (Structure Reports Online) and is the mandatory mechanism for deposit of supplementary structural data in other IUCr journals. Several chemistry journals from other publishers also require deposits in CIF format.
The IUCr journals staff are working with the PDB to develop an analogous deposit and publication workflow for high-throughput structural genomics reports.

Interoperability
----------------

CIF is a domain-specific data format, but follows good practice in separating form (the file syntax) from content (the external dictionaries that define the meanings of the data names or tags used within the file). Such a design means that interoperability with other data exchange mechanisms is possible wherever a mechanistic syntax conversion can be performed. The data names in CIF dictionaries have been used as models for a CORBA description of macromolecular structure adopted by the Object Management Group, and for portions of the content of the XML-based Chemical Markup Language (P. Murray-Rust), which is a candidate as a molecular description language under consideration by IUPAC.

A particular requirement in interoperable systems is the establishment of unique identifiers that can locate the same object as stored in different databases. The IUCr has been involved with the requirements for an identifier of crystalline phases needed by the IUPAC-CODATA Task Group on Standardisation of Physicochemical Property Electronic Datafiles (IUCOSPED; H. Kehiaian), and with the development of a chemical identifier being undertaken by IUPAC.

An interesting step forward in securing interoperability between data sources would be the establishment of a metadata standard identifying the topic areas covered by a scientific or technical database. A well-defined standard would facilitate the distributed querying of disparate databases.

Data Validation
---------------

All structural data sets published in IUCr journals are checked for internal consistency by software capable of reading CIF submission or deposit files directly.
The results of such checks are made available to the referees of papers submitted to IUCr journals, and can form the basis for rejection or a request for revision of the text or even re-refinement of a structure submitted for publication. The criteria for assessing structural data are published on the IUCr web pages. Different journals handle the quality assessment criteria in different ways. In the case of Acta Crystallographica Section E: Structure Reports Online, which was launched as an online-only journal in 2001, the output from the validation software is posted as an accompaniment to each paper on the web.

The IUCr offers similar data validation services tailored according to the individual requirements of journals from other publishers, and encourages uptake of this service as a way to improve the overall quality of structural data in the scientific literature and associated databases.

Electronic Publishing
---------------------

The IUCr publishes six primary research journals in crystallography, and a seventh covering the technology, instrumentation and uses of synchrotron radiation. All are published online and, with the exception of Acta Crystallographica Section E: Structure Reports Online, also in print. Crystallography Journals Online is located at http://journals.iucr.org.

A major digitisation project completed in late 2001 made available PDF page images of all articles published in IUCr journals since 1948. Papers published since 1999 are also, for the most part, available as navigable HTML versions with internal hyperlinking and links to other journals.

The online service also offers access to published data in CIF format, allowing visualisation and import into structure refinement programs; hypertext access to the experimental data sets used in structure determination experiments; and many other benefits to authors and readers (e.g. e-mail alerting of tables of contents, manuscript status enquiries, downloads of proofs and offprints).
Bibliographic data on published articles is uploaded to the CrossRef publishers' hyperlinking service, allowing references to IUCr journals to be resolved into HTML pointers to the location of those articles.

Similar links exist from articles to the associated data sets deposited in some of the structural databases. In the case of macromolecular structures, links to PDB files exist over the web. For small-molecule entries in the CSD the link is currently to a summary page that includes the reference code needed to access a local copy of the CSD. It is hoped that in future web links may be established to this database. A pilot web linkage to inorganic structures in the ICSD is under active development. In each case, reciprocal links exist from the respective database entries to their associated articles in Crystallography Journals Online.

While many of these journal/database linkages are at present negotiated and set up through bipartite contacts, it would be useful to consider the establishment of a data-centric linking organisation playing a role similar to CrossRef in the publishing field. Perhaps such a project could be considered by CODATA.

Web of Information
------------------

The web site http://www.iucr.org continues to host news items, bulletin boards, conference diaries, employment notices, directories of laboratories, research facilities and individual crystallographers, links to educational and commercial resources, book notices, project reports and many other items. The World Directory of Crystallographers has been completely re-engineered in the last year, and is actively being updated by crystallographers themselves to record their contact details and professional and research interests. The Union's Committee for Electronic Publishing, Dissemination and Archiving of Information acts as the editorial board for the web site.

Long-Term Preservation of Digital Content
-----------------------------------------

The undersigned and H. D. Flack, IUCr Representative to ICSTI, participated during 2000 in the ICSTI review of the Open Archival Information System (OAIS) reference model. Elements of this reference model were incorporated in the drafting of a policy document on the archiving of the Union's electronic journal content, which was subsequently approved by the IUCr Executive Committee. As yet the policy is incompletely implemented, but the drafting exercise underlined the necessity of proper provision for long-term preservation and access.

Although the IUCr has direct control only over its own publications, it intends to construct a registry of crystallographic databases that make adequate provision for long-term preservation and access as identified in the OAIS model. Such a registry in the field of crystallography would be complementary to the registry of electronic physics archives envisaged by IUPAP (http://publish.aps.org/IUPAP/ltaddp_report.html).

It is suggested that CODATA could usefully identify such initiatives on the part of its other members, and act as a higher-level registry of registries. The collation of self-certified OAIS-compliant data providers would raise the profile of this important issue, and would place on any participating organisation the onus of publicly declaring and defending its archiving policy.

Brian McMahon
IUCr
18 June 2002

------------------------------------------------------------------------------