Discussion List Archives

No Subject

The Chair's Report to Comcifs                     October 2000

     It is now over a year since Comcifs met at the IUCr Congress
in Glasgow and, while there have not been many postings to the
Comcifs discussion list, the various Comcifs subcommittees have
been busy.  To keep all members of Comcifs informed of these
activities, I would like to review the progress that has been
made during the last year and to draw attention to both the
problems and the opportunities that lie ahead.

     Several groups have been spending the last few years
constructing new dictionaries and their efforts are now beginning
to bear fruit.  A number of dictionaries have recently been
submitted to Comcifs for approval and are in the process of being
reviewed by the Dictionary Review Committee (Brown, McMahon,
Westbrook) before being passed to Comcifs for formal approval.  
Version 2 of the mmCIF dictionary (Berman and Fitzgerald), which
incorporates a number of suggestions made by Kim Henrick of the
European Bioinformatics Institute, was submitted for Comcifs
approval last summer.  Somewhat belatedly, this version, which is
already in use as the archive format of the Protein Data Bank, received
formal Comcifs approval at the end of September.

     A dictionary (imgCIF/CBF, Hammersley, Bernstein and Sweet)
designed for the transmission and archiving of images from array
detectors, but also suitable for use with any multidimensional
image, has now been reviewed and formal approval is expected
soon.  This dictionary differs from the others in that it
describes files that can be written in two formats, a regular CIF
format (imgCIF) and a binary format (Crystallographic Binary
File, CBF).  The latter is a binary representation of imgCIF
designed to be used when conversion of a binary image to the
ASCII format of CIF would require too large an overhead.  The two
files are identical in content and structure, the only difference
being the format in which the image is stored.  
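A rough way to see the overhead is to compare the size of the same 16-bit image stored as raw bytes and as text. The Python sketch below uses base64 and space-separated decimal integers purely as generic examples of text encodings; it does not reproduce the encodings or compression schemes actually defined for imgCIF and CBF, and the function names are invented for illustration.

```python
import base64
import struct
from typing import List


def raw_size(pixels: List[int]) -> int:
    """Bytes needed for unsigned 16-bit little-endian pixels, as in a raw binary section."""
    return len(struct.pack(f"<{len(pixels)}H", *pixels))


def base64_size(pixels: List[int]) -> int:
    """Characters needed once the same bytes are base64-encoded as ASCII text."""
    raw = struct.pack(f"<{len(pixels)}H", *pixels)
    return len(base64.b64encode(raw))


def decimal_text_size(pixels: List[int]) -> int:
    """Characters needed to write each pixel as a decimal number, space separated."""
    return len(" ".join(str(p) for p in pixels))
```

For a 1000 × 1000 image of saturated 16-bit pixels (value 65535) these come to 2,000,000, 2,666,668 and 5,999,999 bytes respectively, which is the kind of overhead a binary representation avoids.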

     Two other dictionaries are also awaiting the attention of
the Dictionary Review Committee: msCIF (Madariaga), designed for
the reporting of modulated structures, and symCIF (Brown),
containing a basic set of items for defining space group
symmetry.

     Three further dictionaries are in an advanced state of
preparation and are expected to be presented for Comcifs review
and approval soon: the small angle scattering dictionary, sasCIF
(Svergun and Malfois), endorsed by the IUCr Commission on Small
Angle Scattering and already in trial use; the dictionary for
reporting electron densities, rhoCIF (Mallinson), endorsed by the
IUCr Commission on Charge and Momentum Density; and the dictionary
for magnetic structures, magCIF (Sikora and Pytlik), prepared by
the Database of Magnetic Structures Determined by Neutron
Diffraction in Krakow.

     The dictionary for diffuse scattering, dsCIF (Proffen), has
made less progress.  No extensions have been submitted during
the past year for either the core dictionary, coreCIF (Brown),
which is now in regular use by Acta Cryst. B, C and E and other
journals, or the powder diffraction dictionary, pdCIF (Toby),
which has been adopted as the future standard of the Powder
Diffraction File.

     All this activity in developing dictionaries is gratifying,
particularly as it represents the adoption of the CIF standard by
a number of the commissions of the IUCr.  Within a year or so
most of the major fields of crystallography will have
dictionaries allowing them to archive or transfer information
using CIF.  Each of these dictionaries is a major project
involving several years of hard work on the part of a number of
contributors.  Only the names of the leaders have been given
above, but behind each leader is a team of experts in the field
who have worked hard to provide the tight definitions required by
CIF.  All these people deserve our thanks.

     The macromolecular crystallography community has been
successfully lobbying for the adoption of STAR (the file
structure used in CIF) by other molecular biology groups in order
to simplify data exchange between them.  As a result of these
efforts mmCIF has been recognised by the Object Management Group
as the basis of its Common Object Request Broker Architecture
(CORBA) standard for the exchange of macromolecular information
between different databases.  This work has been carried out by
Westbrook and Doug Greer of UCSD.

     The rapid development of all these dictionaries is bringing
into focus the need for software to read, write and manipulate
CIFs.  Most of the major crystallographic software packages can
now either read or write CIFs, but these routines are geared to
specific applications.  As yet there is no coherent suite of CIF
application software that can be used by people preparing
crystallographic programs.  With the widespread availability of
dictionaries, the need for supporting software is becoming more
urgent.  While our community has the expertise and the commitment
needed for the challenging job of preparing dictionaries, we
are not as well equipped to deal with the even more challenging
task of developing good software.  Comcifs is now developing a
strategy to ensure that we have the software to exploit the
potential of the dictionaries we have written.

     CIF is sometimes perceived as being nothing more than a
convenient format for exchanging crystallographic information,
but it is much more than this.  The machine-readable dictionaries
are a compendium of crystallographic information which can be
read and used by programs that contain no hard-coded information
about crystallography.  A generic CIF editor, when it is written,
would be able to read in a CIF together with the appropriate
dictionaries and a template of items required for submission to a
journal or database.  Some of these items may be found in the
input CIF.  The remainder would automatically be requested from
the user.  The editor would validate any information entered
by the user to make sure that it conformed to the requirements of
the dictionary, and it would ensure that the output CIF was
properly structured.
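The editor workflow just described can be sketched in a few lines of Python.  Everything here is hypothetical: the ItemDef record, the validate and complete_cif functions and the ask callback (standing in for whatever prompts the user) are invented for illustration, and real dictionaries carry far richer definitions than a type, an enumeration and a range.  Only the item names in the usage note are taken from the core dictionary.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ItemDef:
    """Hypothetical, much-simplified item definition (loosely modelled on DDL1)."""
    type: str                                # "numb" or "char"
    enumeration: Optional[List[str]] = None  # permitted values, if restricted
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def validate(name: str, value: str, dictionary: Dict[str, ItemDef]) -> Optional[str]:
    """Return an error message, or None if the value conforms to its definition."""
    defn = dictionary.get(name)
    if defn is None:
        return f"{name} is not defined in the dictionary"
    if defn.type == "numb":
        # Drop a trailing standard uncertainty such as the "(3)" in "10.234(3)".
        try:
            number = float(re.sub(r"\(\d+\)$", "", value))
        except ValueError:
            return f"{name} must be numeric, got {value!r}"
        if defn.minimum is not None and number < defn.minimum:
            return f"{name} must be >= {defn.minimum}"
        if defn.maximum is not None and number > defn.maximum:
            return f"{name} must be <= {defn.maximum}"
    if defn.enumeration is not None and value not in defn.enumeration:
        return f"{name} must be one of {defn.enumeration}"
    return None


def complete_cif(items: Dict[str, str], template: List[str],
                 dictionary: Dict[str, ItemDef], ask: Callable[[str], str],
                 max_attempts: int = 3) -> Dict[str, str]:
    """Check items already present; request each missing template item via `ask`."""
    result = dict(items)
    for name in template:
        if name in result:
            error = validate(name, result[name], dictionary)
            if error:
                raise ValueError(error)
            continue
        for _ in range(max_attempts):
            value = ask(name)
            if validate(name, value, dictionary) is None:
                result[name] = value
                break
        else:
            raise ValueError(f"no valid value supplied for {name}")
    return result
```

With a template of _cell_length_a and _symmetry_cell_setting and an input CIF that supplies only _cell_length_a, the editor would check the supplied length, then keep asking for the cell setting until the user gives one of the enumerated values.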

     For applications such as a generic editor it is important to
be able to concatenate two or more dictionaries since it is not
feasible to include all the items found in the core dictionary
(for example) explicitly in each of the specialised dictionaries
that are coming on line.  Comcifs has adopted a protocol prepared
by Bernstein, McMahon and Westbrook for creating a virtual
dictionary by a run-time concatenation of real dictionaries. 
However, the software to perform this concatenation, like much of
the other software, has yet to be written. 
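A minimal sketch of run-time concatenation in Python, assuming each dictionary has already been parsed into a mapping from item name to a mapping of definition attributes.  The function name and the three merge modes (strict, replace, overlay) are illustrative assumptions; the protocol adopted by Comcifs defines its own rules, which should be consulted for real use.

```python
from typing import Dict, Iterable, Mapping

Definition = Mapping[str, str]


def concatenate(dictionaries: Iterable[Mapping[str, Definition]],
                mode: str = "strict") -> Dict[str, Dict[str, str]]:
    """Merge dictionaries, in order, into one virtual dictionary.

    strict:  defining the same item twice is an error.
    replace: a later definition replaces the earlier one entirely.
    overlay: a later definition's attributes are added to, and override,
             those of the earlier definition.
    """
    if mode not in ("strict", "replace", "overlay"):
        raise ValueError(f"unknown mode {mode!r}")
    merged: Dict[str, Dict[str, str]] = {}
    for dictionary in dictionaries:
        for name, definition in dictionary.items():
            if name not in merged or mode == "replace":
                merged[name] = dict(definition)   # copy, so inputs are never modified
            elif mode == "strict":
                raise ValueError(f"{name} is defined more than once")
            else:                                 # overlay
                merged[name].update(definition)
    return merged
```

Passing the core dictionary first and a specialised or private dictionary second lets the second extend or refine the first without either file being edited.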

     There has been some activity in software development though
much more is needed.  Both the Protein Data Bank and the Cambridge
Crystallographic Data Centre are developing CIF editors
(Westbrook and Johnson respectively).  Although these are
designed primarily for the use of their own contributors, they
will undoubtedly be useful to others.  A software discussion list
has been set up on the IUCr web site to allow software developers
to share their ideas and problems.  The IUCr has adopted a policy
statement which points out that the Union's copyright of CIF is
designed only to protect the CIF standard and does not imply
ownership of either files written in CIF or software designed to
read and write CIF.  This statement is an essential part of
Comcifs' efforts to encourage new software.  It seeks to assure
software developers that the only requirement imposed by the
Union is that when the term CIF is used it must refer to a file
that conforms to the CIF standard approved by Comcifs.  Recently
a formal description of the CIF-STAR syntax in Backus-Naur Form
(BNF) has been prepared by Nick Spadaccini to provide answers to
those tricky software questions about what exactly is, and what
is not, allowed in CIF.  All of these activities are a necessary
prelude to the development of CIF software.
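As an illustration of the kind of rules such a grammar pins down, here is a deliberately simplified Python tokenizer for a subset of CIF 1.1 syntax.  It is a sketch written for this report, not Spadaccini's grammar: it ignores save frames, global_, stop_, bracketed values and line-length limits, and it returns raw token text without interpreting it.  It does capture two of the tricky cases, namely that a quote ends a quoted string only when followed by whitespace, and that a text field is delimited by semicolons at the start of a line.

```python
import re
from typing import List, Tuple

# Simplified subset of CIF 1.1 token rules (assumption: many details omitted).
_PATTERNS = [
    ("comment", r"#[^\n]*"),
    ("text_field", r"^;[\s\S]*?\n;"),          # ';' at line start ... ';' at line start
    ("data_block", r"(?i:data_)\S+"),
    ("loop", r"(?i:loop_)(?=\s|$)"),
    ("tag", r"_\S+"),
    # A quote closes a string only when followed by whitespace or end of input.
    ("single_quoted", r"'(?:[^'\n]|'(?=\S))*'(?=\s|$)"),
    ("double_quoted", r'"(?:[^"\n]|"(?=\S))*"(?=\s|$)'),
    ("value", r"[^\s'\"#$_\[\];]\S*"),         # unquoted value: no reserved first character
]
_TOKEN = re.compile("|".join(f"(?P<{n}>{p})" for n, p in _PATTERNS), re.MULTILINE)
_SPACE = re.compile(r"\s*")


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split CIF text into (kind, raw_text) pairs, dropping comments."""
    tokens = []
    pos = _SPACE.match(text).end()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unrecognised input at offset {pos}: {text[pos:pos + 20]!r}")
        if m.lastgroup != "comment":
            tokens.append((m.lastgroup, m.group()))
        pos = _SPACE.match(text, m.end()).end()
    return tokens
```

For example, the value 'a dog's life' is read as a single quoted string, because the apostrophe inside it is followed by a letter rather than by whitespace.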

     Another problem that is becoming more urgent is the
existence of two incompatible Dictionary Definition Languages,
DDL1 and DDL2, since dictionaries written in different languages
cannot be concatenated.  Until now this has not been a serious
problem, though all the items in the coreCIF dictionary (written
in DDL1) have had to be converted and incorporated explicitly
into the mmCIF dictionary (written in DDL2).  Help is on its way
in the form of a new DDL which will be upwardly compatible with
both earlier versions.  This new version, being developed by
Hall and Westbrook, will have increased functionality, including,
among other features, machine-readable expressions which will
allow items not present in the CIF to be calculated
from those that are.  A suite of such dictionaries will
encapsulate all the relationships of crystallography.  One can
imagine a day when there will be no need for crystallographic
relationships to be hard-coded into the software.  A generic CIF
program could, in principle, perform any calculation that is
given in the CIF dictionary and, since private dictionaries can
be concatenated with official CIF dictionaries, expressions could
easily be added without having to alter other definitions or the
software.  CIF will then truly represent the language of
crystallography.  Realistically, however, this day is well in the
future.  Until we have found a way of developing an extensive
base of CIF-handling software the potential that lies at the
heart of the CIF system will not be fully realised.
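The expression syntax of the new DDL is not described here, so the following Python sketch only illustrates the idea of items calculated from those that are present.  The item names are real core dictionary names, but the METHODS table and the evaluate function are hypothetical, and values are assumed to be already parsed into numbers.  Each derived item carries a method listing its inputs; missing inputs are themselves derived recursively, so the crystal density can be obtained from the cell parameters even when the cell volume is absent.

```python
import math
from typing import Callable, Dict, List, Mapping, Tuple


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume (cubic angstroms) from lengths (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


# Hypothetical "methods": derived item -> (input items, function of those inputs).
METHODS: Dict[str, Tuple[List[str], Callable[..., float]]] = {
    "_cell_volume": (
        ["_cell_length_a", "_cell_length_b", "_cell_length_c",
         "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma"],
        cell_volume,
    ),
    # Density in Mg m^-3; 1.66054 converts atomic mass units per cubic angstrom.
    "_exptl_crystal_density_diffrn": (
        ["_cell_formula_units_Z", "_chemical_formula_weight", "_cell_volume"],
        lambda z, weight, volume: 1.66054 * z * weight / volume,
    ),
}


def evaluate(name: str, cif: Mapping[str, float], methods=METHODS, _stack=()) -> float:
    """Return cif[name] if present, otherwise compute it from its method."""
    if name in cif:
        return cif[name]
    if name not in methods:
        raise KeyError(f"{name} is absent and no method is defined for it")
    if name in _stack:
        raise ValueError(f"circular definition involving {name}")
    inputs, func = methods[name]
    return func(*(evaluate(i, cif, methods, _stack + (name,)) for i in inputs))
```

Because a private dictionary could add entries to such a table, a new derived quantity would need no change to the evaluating program, which is the point made above.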

     This review lists the CIF projects that have been undertaken
during the past year by many individuals, only a few of whom I
have mentioned by name.  Details of most of the projects
mentioned here are available on the IUCr web site.  I have tried
to put the work of CIF into some perspective in order to show the
challenges that still face us, specifically the problem of
developing software.  I have also tried to look beyond the
immediate problems to provide a view of the distant goal towards
which we are moving, namely the evolution of CIF from a
crystallographic file structure into a crystallographic computer
language that has a functionality similar to human language.  In
this language, CIF dictionaries will not just define the terms
used.  They will be encyclopaedias of crystallography that
contain all the important information about the discipline.

David Brown
Chair of Comcifs


*****************************************************
Dr.I.David Brown,  Professor Emeritus
Brockhouse Institute for Materials Research, 
McMaster University, Hamilton, Ontario, Canada
Tel: 1-(905)-525-9140 ext 24710
Fax: 1-(905)-521-2773
idbrown@mcmaster.ca
*****************************************************