Annual Report for 2005
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Annual Report for 2005
- From: David Brown <idbrown@mcmaster.ca>
- Date: Thu, 20 Apr 2006 11:26:41 -0400
COMCIFS Annual Report for 2005 to the IUCr Executive Committee

This year marks the fifteenth year since the Union adopted CIF (Crystallographic Information Framework, formerly Crystallographic Information File) as a standard for submission of crystal structure reports to the Union journals. Much has happened in that time and the IUCr Congress in Florence provided an opportunity for COMCIFS to take stock of the project and plan its future directions.

The most notable achievement of the past fifteen years has been the preparation of an impressive array of CIF dictionaries that provide data-names and definitions for the two thousand or so crystallographic terms that can appear in CIFs. No other discipline has a comparable set of dictionaries with such a wide community acceptance. These dictionaries are used in conjunction with the STAR file syntax as the format for the considerable archive of CIF-based structure reports.

In the field of small-cell crystallography CIF is now widely accepted as the standard for the submission of structure reports to many scientific journals, and for their archiving and downloading. In the macromolecular field CIF is used to archive the Protein Data Bank, but it does not yet have as wide community acceptance, most protein structure laboratories preferring to stay with the familiar, if inadequate, PDB format, and the macromolecular data centres favouring the use of XML.

XML is a markup language with many functional similarities to the STAR file structure used by CIF. Although a recent arrival, its development by the information technology community has earned it widespread acceptance in many scientific communities. It is more flexible than CIF, though this is not necessarily an advantage in an established field like crystallography. It allows users to develop their own semantics and define concepts in ways that may not be compatible with those defined by other users.
Although XML users have access to an extensive suite of programs to manipulate their files, unless they agree on the semantics, i.e., the definitions and organization of the concepts of their discipline, they are unable to communicate with each other. CIF's suite of dictionaries provides widely accepted semantics for crystallography which can be translated into an XML format for the benefit of XML users, though the reverse process is only possible if the XML file is written in a form designed to be compatible with CIF. COMCIFS is working to ensure that the information contained in CIFs and CIF dictionaries is available in XML format. Some conversion programs are already available and more work is planned.

Our goal is to enable CIFs to be read by generic programs that obtain all their crystallographic knowledge directly from the CIF dictionaries. This requires that all CIFs rigorously conform to the standard. In the early days this standard was not strictly enforced so as to avoid discouraging those who found CIF strange and unfamiliar, but over the years the degree of conformity has been steadily increased and the CIF standard itself has evolved in subtle ways as we became more aware of the possibilities inherent in the STAR syntax.

Thus after preparing the coreCIF dictionary as a STAR file using the Dictionary Definition Language 1 (DDL1) it was decided that the macromolecular CIF dictionary should use advanced features that were only available in DDL2. The result was two incompatible CIF dialects, CIF1 and CIF2, using dictionaries based on DDL1 and DDL2 respectively. This required different programs for each dialect, or a duplication of effort to ensure that a single program could read both. While this decision made sense at the time, it has returned to haunt us as we strive to ensure that we retain compatibility between the CIF1 and CIF2 definitions even as the dictionaries evolve independently.
The problem of CIF dialects was discussed in Florence at the closed COMCIFS meeting. Here we developed a consensus that we should move towards a new dictionary language, DDL3, with corresponding CIF3 dictionaries. Programs designed to work with CIF3 dictionaries would be fully back-compatible and able to read any file written in either CIF1 or CIF2. A prototype has already been tested and an early approval of DDL3 will allow the conversion of the existing CIF1 and CIF2 dictionaries to CIF3.

The opportunity is being taken to incorporate advanced features that were unimagined fifteen years ago. One of these is the development of a hierarchy of crystallographic concepts that would add flexibility and allow the dictionaries to evolve in parallel. Another innovation is the introduction of algorithms that instruct a program how the value of an item can be calculated on the fly from other items present in a CIF. These algorithms are computer-readable definitions that will enhance the ability of CIF dictionaries to serve as machine-readable repositories of crystallographic knowledge.

While these activities help to keep CIF at the forefront of information technology, COMCIFS is also concerned not to abandon those who find themselves still challenged by the demands of checkCIF. From the beginning we knew that we would need a suite of tools to assist in preparing CIFs. The last couple of years have seen the appearance of a number of such programs, e.g., enCIFer, publCIF and CIFedit, that use the appropriate CIF dictionaries to assist users in writing fully conformant CIFs. PublCIF has been developed by the IUCr editorial office and is well-tuned to the publication requirements for small-cell structures. It will continue to be developed to handle macromolecular structure reports that are accompanied by structural data in mmCIF format, as the editorial production processes develop to handle such articles efficiently.
Other tools are under development in an IUCr-sponsored project to upgrade some older CIF software to strict compliance with the latest CIF specifications. This project includes updates to vcif, a simple syntax checker, and to CIFtbx, a Fortran library; and the provision of a utility to manage the relaxation of the line and data name length restrictions in CIF version 1.1.

As the existing dictionaries are converted to DDL3 we will encourage the preparation of CIF3-level programs that will be able to read any CIF whether written as CIF1, CIF2 or CIF3. We expect, however, that the existing dictionaries will continue in use until the advantages of CIF3 become sufficiently apparent that users voluntarily convert.

Among the routine business transacted during the course of the year was the preparation of new terms of reference expanding the mandate of COMCIFS to ensure that crystallographic information in digital form is compatible with standards in neighbouring fields. These terms were subsequently approved by the Executive Committee. COMCIFS also formally adopted responsibility for the maintenance of the DDL1 dictionary which had no organization designated to authorize and approve necessary changes.

Finally, a complete documentation of CIF concepts and associated data dictionaries has been completed as Volume G of the IUCr International Tables series.

I. David Brown
Chair
begin:vcard
fn:I.David Brown
n:Brown;I.David
org:McMaster University;Brockhouse Institute for Materials Research
adr:;;King St. W;Hamilton;Ontario;L8S 4M1;Canada
email;internet:idbrown@mcmaster.ca
title:Professor Emeritus
tel;work:+905 525 9140 x 24710
tel;fax:+905 521 2773
version:2.1
end:vcard