Crystallographic Information Framework

Crystallographic Information and Data Management

A Satellite Symposium to the 28th European Crystallographic Meeting

University of Warwick, Coventry, England
Sunday, August 25, 2013

An important one-day Symposium to celebrate and develop the role of the CIF information exchange standard in crystallography: from diffraction images to structure solution, refinement, validation, publication and archiving. Speakers will explain how crystallography, particularly in its application to structure determination, is a field with supremely well-developed practices in the collection, analysis, interpretation, publication and archiving of raw and processed data, and of the structural information derived from diffraction experiments. These practices are so well ingrained that many crystallographers are unaware of how well we do these things compared with other scientific fields. The Symposium will remind the crystallographic community of its great track record in data handling, and will also anticipate further advances in data management, integration, validation and curation. Much of the community's success in information management arises from the Crystallographic Information Framework (CIF), a standard now entering its third decade. CIF was developed originally for small-molecule structural modelling, but it is now used directly in many areas of crystallography and informs the development of formal data definitions in others. New developments in the CIF standard will maintain our position at the leading edge of scientific information characterisation and exchange.

 

I. Standard information exchange formalisms

Brian McMahon:
A coherent information flow in crystallography
(19 min 29 sec)

John Westbrook:
mmCIF and structural bioinformatics
(24 min 30 sec)

Brian Toby:
pdCIF and the messy world of real data
(22 min 31 sec)

II. Improving the management of experimental data

Simon Coles:
The data explosion and the need to manage diverse data sources in scientific research
(21 min 58 sec)

John Helliwell:
Deposition and use of raw diffraction images
(22 min 31 sec)

Erica Yang:
Managing research data for diverse scientific experiments
(24 min 39 sec)

Herbert Bernstein:
Managing crystallographic data in facilities using integrated CIF, HDF5 and NeXus
(24 min 14 sec)

Simon Hodson:
Research data management and UK funding policies
(26 min 54 sec)

III. The integrity of published information

Mike Hoyland:
Publication of small-unit-cell structures in Acta Crystallographica
(18 min 38 sec)

Tony Linden:
Validating a small-unit-cell structure; understanding checkCIF reports
(23 min 53 sec)

Manfred Weiss:
Writing a macromolecular structure paper with publBio
(17 min 39 sec)

Sameer Velankar:
Deposition and validation of macromolecular structures
(25 min 09 sec)

IV. Towards ever better science

Colin Groom:
Data quality and the value of structural databases
(20 min 05 sec)

Peter Murray-Rust:
Towards the semantic web of science
(18 min 24 sec)

Nick Spadaccini:
Into the future with CIF
(20 min 59 sec)


A brief history of CIF


The acronym CIF is used both for the Crystallographic Information File, the standard data exchange file format of Hall, Allen & Brown (1991), and for the Crystallographic Information Framework, a broader system of exchange protocols based on data dictionaries and relational rules that can be expressed in different machine-readable forms, including, but not restricted to, the Crystallographic Information File and XML.

CIF was developed by the IUCr Working Party on Crystallographic Information in an effort sponsored by the IUCr Commission on Crystallographic Data and the IUCr Commission on Journals. CIF was adopted in 1990 as a standard file structure for the archiving and distribution of crystallographic information. It is now well established and is in regular use for reporting crystal structure determinations to Acta Crystallographica and other journals. It is often cited as a model example of integrating data and textual information for data-centric scientific communication.

As a granular, structured format, CIF was well suited to the telegraphic style of structure reports required by Acta Crystallographica Section C: Crystal Structure Communications, and the IUCr journals immediately adopted it as a submission medium for that journal. It soon became the mandatory submission format for Acta C and the mandatory format for supplementary structural data, and subsequently for structure factors, in all IUCr journals.

Automated procedures could then be developed for checking the submitted structural data. This allowed routine technical assessment of all submitted structures (although manual validation was already carried out by many conscientious Co-editors), and the number of erroneous space-group determinations and other technical errors in structure determinations declined significantly. The validation procedures were subsequently developed into the web-based checkCIF service, supported and used by other publishers and available as a general resource to the community for independent validation and assessment of the technical quality of a structure determination.

Extensions of CIF were rapidly developed to describe powder diffraction, modulated structures and electron density studies. Other extensions have followed over the years, including, recently, a description of crystallographic restraints and constraints, and, currently under review, an extension for crystallographic twinning.

An ambitious project to extend CIF to the complete description of protein structure experiments and models resulted in major enhancements to the underlying data model. The resulting mmCIF format formed the basis for the PDB database of protein structures as refactored by the Research Collaboratory for Structural Biology (RCSB) in 1999. Deposition of structure factors to the PDB became possible with the adoption of an mmCIF-based format for experimental data. The PDBx exchange format continues to build on the original mmCIF dictionary and interfaces with extensions for non-crystallographic methods and procedures in protein structure determination and characterisation (Westbrook et al., 2005). An XML format based on the same underlying data model is routinely used to keep the data synchronised among the international partners of the Worldwide Protein Data Bank.

Another important development, beginning in the late 1990s, has been the specification of imgCIF and its corresponding binary format CBF (the 'Crystallographic Binary File') for capturing and exchanging image data (Bernstein & Hammersley, 2005). This provides a common format across the diversity of detector manufacturers, and is currently under close consideration for its possible role in promoting strategies for the routine deposition of primary diffraction image data.

In 2006 the importance of CIF and the value of checkCIF were recognised by the Award for Publishing Innovation of the Association of Learned and Professional Society Publishers (ALPSP). In their report, the judges 'were impressed with the way in which CIF and checkCIF are easily accessible and have served to make critical crystallographic data more consistently reliable and accessible at all stages of the information chain, from authors, reviewers and editors through to readers and researchers. In doing so, the system takes away the donkeywork from ensuring that the results of scientific research are trustworthy without detracting from the value of human judgement in the research and publication process'.

Research has been under way in recent years to develop a new formalism within the CIF framework for specifying data definitions with greater precision, and with machine-readable methods for expressing relations between distinct data items (Spadaccini & Hall, 2012). This will allow automated generation of derivable data that are absent from a particular file, provided all the relevant parent data are present. This new formalism is not intended in the short term to replace the existing CIF format in routine practice, but it does have the potential to provide a unifying computational framework for applications requiring CIF input from different subject areas.
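As an illustration of what "automated generation of derivable data" means in practice, the short Python sketch below derives a missing cell volume from the six unit-cell parameters using the standard formula V = abc·sqrt(1 - cos²α - cos²β - cos²γ + 2 cosα cosβ cosγ). It is not DDLm and not any existing CIF library: the function names (`read_simple_items`, `numeric`, `derive_cell_volume`) are hypothetical, and the reader is a deliberately minimal assumption that handles only simple one-line tag-value items in the traditional core CIF naming (`_cell_length_a` etc.), not loops, quoted strings or multi-line text fields.

```python
import math


def read_simple_items(text):
    """Collect simple '_tag value' pairs from CIF text (hypothetical helper).

    Illustrative only: loops, quoted values, comments and multi-line
    text fields are NOT handled.
    """
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) == 2:
                items[parts[0].lower()] = parts[1].strip()
    return items


def numeric(value):
    """Convert a CIF number such as '10.123(4)' to float, dropping the s.u."""
    return float(value.split("(")[0])


def derive_cell_volume(items):
    """Return _cell_volume, deriving it from its parent items if absent.

    Returns None when the volume is absent and any parent item is missing,
    mirroring the rule that derivation needs all relevant parent data.
    """
    if "_cell_volume" in items:
        return numeric(items["_cell_volume"])
    parents = ["_cell_length_a", "_cell_length_b", "_cell_length_c",
               "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma"]
    if any(p not in items for p in parents):
        return None
    a, b, c = (numeric(items[p]) for p in parents[:3])
    ca, cb, cg = (math.cos(math.radians(numeric(items[p]))) for p in parents[3:])
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


# A monoclinic example with no _cell_volume item: the volume is derived.
example = """data_example
_cell_length_a    5.0
_cell_length_b    6.0
_cell_length_c    7.0
_cell_angle_alpha 90
_cell_angle_beta  100
_cell_angle_gamma 90
"""
print(round(derive_cell_volume(read_simple_items(example)), 2))
```

In a DDLm dictionary such relations are recorded once, alongside the data definitions themselves, so that any conforming application can perform the derivation; the sketch simply hard-codes one such relation to show the principle.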

References

Bernstein, H. J. & Hammersley, A. P. (2005). Specification of the Crystallographic Binary File (CBF/imgCIF). In International Tables for Crystallography, Volume G: Definition and exchange of crystallographic data, edited by S. R. Hall & B. McMahon, pp. 37-43. Dordrecht: Springer.

Hall, S. R., Allen, F. H. & Brown, I. D. (1991). The Crystallographic Information File (CIF): a New Standard Archive File for Crystallography. Acta Cryst. A47, 655-685.

Spadaccini, N. & Hall, S. R. (2012). DDLm: a new dictionary definition language. J. Chem. Inf. Model. 52, 1907-1916.

Westbrook, J., Yang, H., Feng, Z. & Berman, H. M. (2005). The use of mmCIF architecture for PDB data management. In International Tables for Crystallography, Volume G: Definition and exchange of crystallographic data, edited by S. R. Hall & B. McMahon, pp. 539-543. Dordrecht: Springer.

Date and venue

Sunday 25 August 2013 09:00-17:00

Room S0.21 (Social Sciences Building)
University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK. 


Registration

There is no charge for attending the Symposium. However, places are limited and must be pre-booked. Please register using the ECM 28 online registration form.

 

Travel

The following is based on information from the University of Warwick website, which has further details.

Air

Nearest international airports are:
  • Birmingham International Airport (BHX): to get to the University from Birmingham Airport, either (cheapest) take the train to Coventry (£3.50 return, 10 min) and then a local bus (£1.60 single, 25 min; the exact fare must be paid to the driver) or a taxi (about £10, 15 min) to the University of Warwick; or (possibly slightly quicker, but expensive at about £30 for the half-hour ride) take a taxi from the airport direct to the University of Warwick.
  • London Heathrow (LHR) has many more flights and is about 2 hours from Coventry, either by bus (National Express) or by rail (via London). The bus option is simpler and cheaper. Departures from Heathrow Airport on 21 August at 07:15, 08:30, 09:45, 11:00, 12:15, 13:30.
  • Manchester (MAN): rail connections from Manchester Airport to Coventry (usually 2 or 3 per hour). Journey time 2 h 15 min to 2 h 45 min; cost about £16-£50.

Rail

The nearest main-line railway station is Coventry, served directly from London Euston, Birmingham New Street (on average every 10 minutes during the day), Birmingham International (serving the airport, on average every 10 minutes during the day), Oxford and Leicester.

Coach

If travelling by coach, the most convenient major centre is Coventry, which is well served by national links. From Coventry, there are frequent local bus services to the University. Passengers may be able to alight from a coach at the University central campus bus stop; however, this must be booked 24 hours in advance of travel, and not all services offer this option. Details of coach travel can be obtained from . Megabus also offers a coach service to Coventry from a selection of locations around the United Kingdom, including London. A Megabus coach stop is located at Cannon Park Shopping Centre, a 5-10 minute walk from central campus.

Local bus

There are regular bus services to the University campus from Coventry City Centre and Coventry Rail Station with the journey taking approximately 30 minutes.

  • National Express Coventry service number 12 runs from the city centre bus station (Pool Meadow) via Coventry Rail Station to the University Central Campus. Service number 11 follows the same route but continues on from campus to Leamington Spa, via Kenilworth. Number 12 & 11 timetable  
  • The Travel De Courcey 360a/360c route runs in a loop from and back to the University Hospital via Arena Tesco, Eastern Green, University of Warwick, Cannon Park, Cheylesmore, Jaguar Land Rover, Willenhall and Clifford Bridge Road. Number 360a/360c timetable 
  • Stagecoach also provides a service that passes the University. The Stagecoach service X17 runs from the city centre, passing the junction of Gibbet Hill Road and Kenilworth Road. The entrance to the Gibbet Hill campus is about 5 minutes' walk from this bus stop. After reaching campus this service goes on to Kenilworth, Leamington Spa, Warwick Hospital and Warwick Town Centre. X17 timetable
  • If you are travelling from Leamington Spa, Stagecoach provides a dedicated frequent bus service, the Unibus (U1), that passes through campus. A second version of this route, named U2, passes through Kenilworth. For information on this service, please visit the Stagecoach website. This service also covers Coventry City Centre and Coventry Rail Station on Sundays and public bank holidays. Unibus (U1 & U2) timetable
  • The Travel Coventry service number 11 also connects Leamington Spa and Kenilworth with the university campus and Coventry. Number 11 timetable 

Bus Travel tips

  • Travel Coventry buses do not usually give change so make sure you have the correct money available before making your journey. You can find information about fares on the National Express Coventry website but it is best to check with the driver when you board the bus.
  • Stagecoach buses do give change on the majority of their services.
  • From Coventry Rail Station, visitors should follow the signs to Warwick Road (a 2-minute walk) and from there catch the Travel Coventry services 11, 12 or U17 (Sundays only), which travel on to the main campus (see above).

Road

  • From the North: From the M69/M6 interchange (M6 Jct 2) take A46 towards Warwick and Coventry S & E. After approximately 3.5 miles you will reach Tollbar End roundabout (junction with A45). At the roundabout, follow signs for A45 Birmingham. After approximately 3 miles you will cross the A429 (Kenilworth Road); half a mile after this junction take the left-hand turn signposted 'University of Warwick'. Follow signs for University of Warwick (and Warwick Arts Centre) across two roundabouts. You are now approaching the University of Warwick from Kirby Corner Road.
  • From the South East: From M45 Jct 1 take A45 towards Coventry. After approximately 7 miles you will reach Tollbar End roundabout (junction with A46); follow signs for A45 Birmingham. Now follow directions as for arriving from the North.
  • From the South: From M40 Jct 15 take A46 towards Coventry. After approximately 8 miles leave A46 at junction signposted 'University of Warwick and Stoneleigh'. After a further 1.5 miles you will cross the A429 (Kenilworth Road). You are now approaching the University of Warwick from Gibbet Hill Road.
  • From the West: From M42 Jct 6 take A45 towards Coventry. After approximately 9 miles you will pass a large Sainsbury's store on your left. At the next roundabout (Fire Station on right) take the right-hand exit, signposted 'University and Canley'. Follow signs for University of Warwick (and Warwick Arts Centre) across two roundabouts. You are now approaching the University of Warwick from Kirby Corner Road.

Campus Maps