Acta Cryst. (1991). A47, 655-685
There is an increasing need in many branches of science for a uniform but flexible method of archiving and exchanging data in electronic form. Rapid advances in computer technology, coupled with the expansion of local, national and international networks, have fuelled the need for such a facility. The variety and relative inflexibility of existing data exchange formats have inhibited their effective use. This is true even in fields where the basic data requirements are well defined. Problems of data exchange are further exacerbated if the number and nature of data types change rapidly and continuously. Under these conditions specialized and local file formats have proliferated. This diversity was tolerable when electronic data transfer was infrequent, or when data processing speeds required file formats finely tuned to specific applications. The developments cited above signal an end to this rationale. A general, flexible, rapidly extensible and universal file format protocol is now essential. It must be machine-independent and portable so that accessibility to data items is independent of their point of origin. It must allow new data items to be incorporated without the need to modify existing files.
In addition to archiving data, the use of a universal file would facilitate data exchange between software within a laboratory; between different laboratories; between authors and journals, providing electronic input to the publication process; and between researchers or journals and computerized databases.
Crystallography is no exception to the need for a universal exchange file. Its activities are dominated by advanced computer-controlled equipment and sophisticated software systems which measure and process data. In most cases, especially for small and medium-sized molecules, these data are clearly defined and standardized, and are generated in machine-readable form. The problem, however, is that there are too many different forms and, despite the fundamental role that computing plays in our discipline, only limited effort has been directed at devising a general and common format.
In the late seventies the IUCr Commissions on Crystallographic Data and Crystallographic Computing promoted the development of the Standard Crystallographic File Structure (Brown, 1983, 1988). The SCFS is based on the concept of formatted lines and keywords that identify blocks of data containing items in a specific order. The SCFS format satisfies some but not all of the requirements of a universal data exchange file.
At the XIV IUCr Congress (1987) in Perth it was proposed that Acta Crystallographica promote the submission of data in machine-readable form. This was seen as being particularly beneficial for Section C, which publishes about 1000 small-molecule and inorganic crystal structures a year. Each paper is currently prepared as a typed manuscript and converted to machine-readable text for computer typesetting. Some of this work is carried out by Acta Crystallographica staff. All steps involve more manual effort than is desirable. Both the data and text are prone to transcription errors in their passage from the computer and the author to the printed page. Machine-readable submissions would reduce input errors, minimize labour-intensive data entry and checking procedures, and speed the publication process. The submitted data could also be transmitted directly to the relevant crystallographic databases. An IUCr Working Party on Crystallographic Information (WPCI) was set up to investigate the feasibility of such a submission process, and to coordinate the input of the various IUCr Commissions involved in these types of activities.
It was soon recognized that submission of text and data to journals and databases required the use of a universal exchange file, some of whose properties could not be reconciled with the constraints imposed on the SCFS by its format. At a meeting of the IUCr WPCI, held in conjunction with the XI European Crystallographic Meeting (1988) in Vienna, it was decided to develop a universal file based on the Self-Defining Text Archive and Retrieval (STAR) procedure of Hall (1991a). The STAR File is intended for the electronic exchange of data and provides for text and numerical data in any order.
The WPCI commissioned the authors to develop a universal exchange file to be called the Crystallographic Information File (CIF). A preliminary report on this development was presented at the XV IUCr Congress and General Assembly (1990) in Bordeaux as part of the Open Meetings of the IUCr Commissions on Crystallographic Data and Computing. This paper is a detailed description of the CIF development.
A major feature of this work has been the development of a comprehensive Dictionary (Core Version 1991) of crystallographic data items. Each data item has been assigned a self-explanatory name for use in a CIF and each item is precisely defined within the Dictionary which appears in this paper as Appendix I. The Core Dictionary defines only those fundamental data items that are commonly used in a single-crystal structure analysis. Future extensions will encompass data items that are relevant to specialized areas of crystallography. The Core Dictionary is also available as an electronic file suitable for use with CIF computer applications.
To aid the description of the CIF, a brief introduction is first given to the underlying concepts of the STAR File, on which the CIF application is based. Full details of the STAR File specifications are available in the literature (Hall, 1991a).
Copyright © 1991 International Union of Crystallography