Crystallographic Information and Data Management
A Satellite Symposium to the 28th European Crystallographic Meeting
University of Warwick, Coventry, England
Sunday, August 25, 2013
An important one-day Symposium to celebrate and develop the role of the CIF information exchange standard in crystallography: from diffraction images to structure solution, refinement, validation, publication and archiving. Speakers will explain how crystallography, particularly in its application to structure determination, is a field with supremely well developed practices in the collection, analysis, interpretation, publication and archiving of raw and processed data, and of the structural information derived from diffraction experiments. These practices are so well ingrained that many crystallographers are unaware of how well we do these things compared with other scientific fields. The Symposium will remind the crystallographic community of its great track record in data handling, and also anticipate further advances in data management, integration, validation and curation. Much of the community's success in information management arises from the Crystallographic Information Framework (CIF), a standard now entering its third decade. Originally developed for small-molecule structural modelling, it is now used directly in many areas of crystallography and informs the development of formal data definitions in others. New developments in the CIF standard will maintain our position at the leading edge of scientific information characterisation and exchange.
Programme
I. Standard information exchange formalisms
09:05 A coherent information flow in crystallography - B. McMahon
Brian McMahon is the Research and Development Officer at the International Union of Crystallography's offices in Chester, UK, and a former IUCr Representative to CODATA, the ICSU Committee on Scientific Data. He is Coordinating Secretary of COMCIFS and a Co-editor of International Tables for Crystallography Volume G: Definition and exchange of crystallographic data.
09:30 mmCIF and structural bioinformatics - J. Westbrook
John Westbrook is a Project Team Leader of the RCSB Protein Data Bank, and is based at Rutgers University. He has played key roles in creating and maintaining the PDB database schema, in developing many of the software tools that underpin PDB operation, and in developing formal ontologies with other structural biology communities. He is a member of COMCIFS.
09:55 pdCIF and the messy world of real data - B. H. Toby
Brian Toby has had extensive professional experience in the chemical industry, in academia and in the government sector, the latter including both synchrotron and research reactor facilities. His research interests are in understanding how the arrangements of atoms in solids determine how a material functions, chemically or physically, and in developing and teaching techniques for those studies. To do this, he works on software and instrument development, conducts measurements and analyzes the results. He has collaborated with researchers in academia, industry and government to produce 120 papers that have been cited nearly 6000 times. He is a former member of COMCIFS and was the leader of the powder CIF (pdCIF) dictionary development effort.
II. Improving the management of experimental data
10:40 The data explosion and the need to manage diverse data sources in scientific research - S.J. Coles
Simon J. Coles is Head of the UK National Crystallography Service and a member of staff of the Department of Chemistry at Southampton University.
11:05 Deposition and use of raw diffraction images - J.R. Helliwell
John R. Helliwell trained in physics and molecular biophysics and is now Emeritus Professor of Structural Chemistry at the University of Manchester. He is former Editor-in-Chief of the journals of the International Union of Crystallography and Past President of the European Crystallographic Association. His research involves crystallography methods developments applied to structural chemistry and biology. He is currently IUCr representative to CODATA (the ICSU Committee for Scientific Data) and ICSTI (the International Council for Scientific and Technical Information), and chairs the IUCr Diffraction Data Deposition Working Group. He is also a member of the CODATA/VAMAS Working Group on the description of nanomaterials.
11:30 Managing research data for diverse scientific experiments - E. Yang
Erica Yang is a senior computer scientist at the STFC Rutherford Appleton Laboratory, where she works in the Scientific Computing Department (SCD). She is also the national labs services liaison officer, responsible for developing a sustainable long-term data services research and development strategy and roadmap for STFC's national laboratories. She has worked with the UK National Crystallography Service and the University of Cambridge to develop cross-organisation data infrastructure technologies for the UK structural science communities. She also manages and directs projects involving STFC's facilities and large-scale data and HPC infrastructures. In the EU FP7 project "PaNData-ODI", she works closely with international facilities (e.g. ESRF, ILL, DESY) to define and develop a fully integrated, cross-facility data management roadmap and services for EU photon and neutron communities.
11:55 Managing crystallographic data in facilities using integrated CIF, HDF5 and NeXus - H.J. Bernstein
Herbert J. Bernstein is Professor of Mathematics and Computer Science at Dowling College, Oakdale, NY. He is a member of COMCIFS, Chair of the imgCIF dictionary working group, and lead developer of CIFtbx, a Fortran library for handling CIF data. He is also a member of the NeXus International Advisory Committee (NIAC).
12:20 Research data management and UK funding policies - S. Hodson
Simon Hodson is Executive Director of CODATA (http://www.codata.org), an organisation whose mission is to strengthen international science for the benefit of society by promoting improved scientific and technical data management and use. He also sits on the Board of Directors of the Dryad data repository (http://datadryad.org), a not-for-profit initiative to make the data underlying scientific publications discoverable, freely reusable, and citable. From 2009 to 2013, as Programme Manager, he led two successive phases of Jisc's innovative Managing Research Data programme (http://researchdata.jiscinvolve.org/wp/).
III. The integrity of published information
13:45 Publication of small-unit-cell structures in Acta Crystallographica - M.A. Hoyland
Michael A. Hoyland is a Systems Developer at the International Union of Crystallography, Chester, UK. He maintains the checkCIF web service for validating small-unit-cell structures, and is lead developer of the submission and review system for authors of IUCr journals.
14:10 Validating a small-unit-cell structure; understanding checkCIF reports - A. Linden
For this to work well, validation protocols have to keep up with advances in structure determination methodologies. Consequently, it is important that CIF tools and definitions are both practical and extended and revised regularly. The purpose and output of validation also have to be understood easily by users, reviewers and journal editors, some of whom may not be expert crystallographers.
Anthony Linden is a Research Group Leader at the X-ray Crystallography Facility, Institute of Organic Chemistry, University of Zurich, and Section Editor of Acta Crystallographica Section C.
14:35 Writing a macromolecular structure paper with publBio - M. Weiss
Manfred Weiss works at the Institute for Soft Matter and Functional Materials of the Helmholtz Zentrum Berlin, and is a Section Editor of Acta Crystallographica Section F. He has been a Co-editor of Acta Crystallographica Section D since 2002, is a member of the IUCr Commission on Crystallographic Teaching and a Consultant for the IUCr Commission on Biological Macromolecules.
15:00 Deposition and validation of macromolecular structures - S. Velankar
Sameer Velankar is a team leader at the EBI, responsible for content and integration of the Protein Data Bank in Europe (PDBe) resource. He earned his PhD in Structural Biology from the Indian Institute of Science in Bangalore, India in 1997, working on protein crystallographic studies of thymidylate synthase and triose phosphate isomerase. He then joined Dale Wigley's group in Oxford for post-doctoral research on the elucidation of the mechanism of DNA helicase. He has worked with and contributed to all parts of the PDBe team's operations, from annotation of newly deposited structures to the development of advanced PDBe services.
IV. Towards ever better science
15:45 Data quality and the value of structural databases - C. Groom
Colin Groom is the Executive Director of the Cambridge Crystallographic Data Centre. After a career in academia in the UK and in New Zealand, Dr Groom joined Pfizer where he established the protein crystallography group in the UK. He subsequently held various computational and informatics roles in the UK and US. Following this he joined Celltech/UCB, leading investigational chemistry and computer-assisted drug design groups.
16:10 Towards the semantic web of science - P. Murray-Rust
Peter Murray-Rust was until 2012 Reader in Molecular Informatics at the Unilever Centre for Molecular Informatics, University of Cambridge. He is co-developer with Henry Rzepa of Chemical Markup Language (CML) and a Consultant to COMCIFS.
16:35 Into the future with CIF - N. Spadaccini
Nick Spadaccini is one of the main authors of the STAR File format, the STAR methods dictionary development language (DDLm) and the dictionary methods evaluator software dREL.
Recorded presentations (running times in parentheses)
I. Standard information exchange formalisms
- Brian McMahon: A coherent information flow in crystallography (19 min 29 sec)
- John Westbrook: mmCIF and structural bioinformatics (24 min 30 sec)
- Brian Toby: pdCIF and the messy world of real data (22 min 31 sec)
II. Improving the management of experimental data
- Simon Coles: The data explosion and the need to manage diverse data sources in scientific research (21 min 58 sec)
- John Helliwell: Deposition and use of raw diffraction images (22 min 31 sec)
- Erica Yang: Managing research data for diverse scientific experiments (24 min 39 sec)
- Herbert Bernstein: Managing crystallographic data in facilities using integrated CIF, HDF5 and NeXus (24 min 14 sec)
- Simon Hodson: Research data management and UK funding policies (26 min 54 sec)
III. The integrity of published information
- Mike Hoyland: Publication of small-unit-cell structures in Acta Crystallographica (18 min 38 sec)
- Tony Linden: Validating a small-unit-cell structure; understanding checkCIF reports (23 min 53 sec)
- Manfred Weiss: Writing a macromolecular structure paper with publBio (17 min 39 sec)
- Sameer Velankar: Deposition and validation of macromolecular structures (25 min 09 sec)
IV. Towards ever better science
- Colin Groom: Data quality and the value of structural databases (20 min 05 sec)
- Peter Murray-Rust: Towards the semantic web of science (18 min 24 sec)
- Nick Spadaccini: Into the future with CIF (20 min 59 sec)
A brief history of CIF
The acronym CIF is used both for the Crystallographic Information File, the data exchange standard file format of Hall, Allen & Brown (1991), and for the Crystallographic Information Framework, a broader system of exchange protocols based on data dictionaries and relational rules expressible in different machine-readable manifestations, including, but not restricted to, the Crystallographic Information File format and XML.
CIF was developed by the IUCr Working Party on Crystallographic Information in an effort sponsored by the IUCr Commission on Crystallographic Data and the IUCr Commission on Journals. CIF was adopted in 1990 as a standard file structure for the archiving and distribution of crystallographic information. It is now well established and is in regular use for reporting crystal structure determinations to Acta Crystallographica and other journals. It is often cited as a model example of integrating data and textual information for data-centric scientific communication.
As a granular, structured format, CIF was well suited to the telegraphic style of structure reports required by Acta Crystallographica Section C: Crystal Structure Communications, and was immediately adopted by IUCr journals as a submission medium for this journal. It soon became the mandatory submission format to Acta C, and the mandatory format for supplementary structural data, and subsequently for structure factors, for all IUCr journals.
Automated procedures could then be developed for checking the submitted structural data. This allowed routine technical assessment of all submitted structures (although manual validation was already carried out by many conscientious Co-editors), and the number of erroneous space-group determinations and other technical errors in structure determinations declined significantly. The validation procedures were subsequently developed into the web-based checkCIF service, supported and used by other publishers and available as a general resource to the community for independent validation and assessment of the technical quality of a structure determination.
Extensions of CIF were rapidly developed to describe powder diffraction, modulated structures and electron density studies. Other extensions have followed over the years, including, recently, a description of crystallographic restraints and constraints, and, currently under review, an extension for crystallographic twinning.
An ambitious project to extend CIF to the complete description of protein structure experiments and models resulted in major enhancements to the underlying data model. The resulting mmCIF format formed the basis for the PDB database of protein structures as refactored by the Research Collaboratory for Structural Bioinformatics (RCSB) in 1999. Deposition of structure factors with the PDB became possible with the adoption of an mmCIF-based format for experimental data. The PDBx exchange format continues to build on the original mmCIF dictionary and interfaces with extensions for non-crystallographic methods and procedures in protein structure determination and characterisation (Westbrook et al., 2005). An XML format based on the same underlying data model is routinely used to maintain synchronicity between the international partners of the Worldwide Protein Data Bank.
Another important development, beginning in the late 1990s, has been the specification of imgCIF and its corresponding binary format CBF (the 'Crystallographic Binary File') for capturing and exchanging image data (Bernstein & Hammersley, 2005). This provides a common format across the diversity of detector manufacturers, and is currently under close consideration for its possible role in promoting strategies for the routine deposition of primary diffraction image data.
In 2006 the importance of CIF and the value of checkCIF were recognised by the Award for Publishing Innovation of the Association of Learned and Professional Society Publishers (ALPSP). In their report, the judges 'were impressed with the way in which CIF and checkCIF are easily accessible and have served to make critical crystallographic data more consistently reliable and accessible at all stages of the information chain, from authors, reviewers and editors through to readers and researchers. In doing so, the system takes away the donkeywork from ensuring that the results of scientific research are trustworthy without detracting from the value of human judgement in the research and publication process'.
Research has been under way in recent years to develop a new formalism within the CIF framework for specifying data definitions with greater precision, and with machine-readable methods for expressing relations between distinct data items (Spadaccini & Hall, 2012). This will allow automated generation of derivable data that are absent from a particular file, provided all the relevant parent data are present. This new formalism is not intended in the short term to replace the existing CIF format in routine practice, but it does have the potential to provide a unifying computational framework for applications requiring CIF input from different subject areas.
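The idea of generating a derivable item from its parent data can be illustrated with a minimal Python sketch. This is not dREL and not any COMCIFS software; the function name derive_cell_volume is hypothetical, and the input is assumed to be a dictionary of already-parsed numeric values keyed by core CIF data names. It fills in _cell_volume from the six cell parameters only when the volume is absent and every parent item is present.

```python
import math

# Parent items needed to derive the unit-cell volume (core CIF data names).
CELL_LENGTHS = ("_cell_length_a", "_cell_length_b", "_cell_length_c")
CELL_ANGLES = ("_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")


def derive_cell_volume(items):
    """Return a copy of `items` with _cell_volume added when it can be derived.

    Hypothetical helper for illustration: `items` maps data names to floats
    (lengths in angstroms, angles in degrees). An existing _cell_volume is
    never overwritten, and nothing is added if any parent item is missing.
    """
    result = dict(items)
    if "_cell_volume" in result:
        return result
    if not all(name in result for name in CELL_LENGTHS + CELL_ANGLES):
        return result

    a, b, c = (result[name] for name in CELL_LENGTHS)
    cos_a, cos_b, cos_g = (
        math.cos(math.radians(result[name])) for name in CELL_ANGLES
    )
    # General triclinic cell volume formula.
    result["_cell_volume"] = a * b * c * math.sqrt(
        1.0 - cos_a**2 - cos_b**2 - cos_g**2 + 2.0 * cos_a * cos_b * cos_g
    )
    return result


if __name__ == "__main__":
    cubic = {name: 5.0 for name in CELL_LENGTHS}
    cubic.update({name: 90.0 for name in CELL_ANGLES})
    print(derive_cell_volume(cubic)["_cell_volume"])
```

In the formalism described above, such a relation would be stored with the data definition itself in a machine-readable dictionary rather than hard-coded in application software, which is what allows one evaluator to serve many subject areas.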
References
Bernstein, H. J. & Hammersley, A. P. (2005). Specification of the Crystallographic Binary File (CBF/imgCIF). In International Tables for Crystallography, Volume G: Definition and exchange of crystallographic data, edited by S. R. Hall & B. McMahon, pp. 37-43. Dordrecht, The Netherlands: Springer.
Hall, S. R., Allen, F. H. & Brown, I. D. (1991). The Crystallographic Information File (CIF): a New Standard Archive File for Crystallography. Acta Cryst. A47, 655-685.
Spadaccini, N. & Hall, S. R. (2012). DDLm: a new dictionary definition language. J. Chem. Inf. Model. 52, 1907-1916.
Westbrook, J., Yang, H., Feng, Z. & Berman, H. M. (2005). The use of mmCIF architecture for PDB data management. In International Tables for Crystallography, Volume G: Definition and exchange of crystallographic data, edited by S. R. Hall & B. McMahon, pp. 539-543. Dordrecht, The Netherlands: Springer.
Date and venue
Sunday 25 August 2013 09:00-17:00
Room S0.21 (Social Sciences Building)
University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK.


Registration
There is no charge for attending the Symposium. However, places are limited and must be pre-booked. Please register using the ECM 28 online registration form.
Travel
The directions below are based on information from the University of Warwick website, which has further details.
Air
The nearest international airports are:
- Birmingham International Airport (BHX): to get to the University from Birmingham airport, either (cheapest) take the train to Coventry (£3.50 return, 10 mins) and then a local bus (£1.60 single, 25 mins; the exact fare must be paid to the driver) or a taxi (about £10, 15 mins) to the University of Warwick; or (possibly slightly quicker, but expensive: about £30 for the half-hour ride) take a taxi from the airport direct to the University of Warwick.
- London Heathrow (LHR) has many more flights, and is about 2 hours away from Coventry either by bus (National Express) or by rail (via London). The bus option is simpler, and cheaper. Departures from Heathrow Airport on 21 August at 07:15, 08:30, 09:45, 11:00, 12:15, 13:30.
- Manchester (MAN): rail connections from Manchester Airport to Coventry (usually 2 or 3 per hour). Journey time 2h 15m - 2h 45m; cost about £16-£50.
Rail
The nearest main-line railway station is Coventry, served directly from London Euston, Birmingham New Street (on average every 10 minutes during the day), Birmingham International (serving the airport, on average every 10 minutes during the day), Oxford and Leicester.
Coach
If travelling by coach, the most convenient major centre is Coventry which is well served by national links. From Coventry, there are frequent local bus services to the University. Passengers may be able to alight a coach at the University central campus bus stop; however this needs to be booked 24 hours in advance of travel and not all services offer this option. Details of coach travel can be obtained from . Megabus also offer a coach service to Coventry from a selection of locations around the United Kingdom, including London. A Megabus coach stop is located at Cannon Park Shopping Centre, a 5-10 minute walk from central campus.
Local bus
There are regular bus services to the University campus from Coventry City Centre and Coventry Rail Station with the journey taking approximately 30 minutes.
- National Express Coventry service number 12 runs from the city centre bus station (Pool Meadow) via Coventry Rail Station to the University Central Campus. Service number 11 follows the same route but continues on from campus to Leamington Spa, via Kenilworth.
- The Travel De Courcey 360a/360c route runs in a loop from and back to the University Hospital via Arena Tesco, Eastern Green, University of Warwick, Cannon Park, Cheylesmore, Jaguar Land Rover, Willenhall and Clifford Bridge Road.
- Stagecoach also provides a service that passes the University. The Stagecoach service X17 runs from the city centre, passing the junction of Gibbet Hill Road and Kenilworth Road. The entrance to the Gibbet Hill campus is about 5 minutes' walk from this bus stop. After reaching campus this service goes on to Kenilworth, Leamington Spa, Warwick Hospital and Warwick Town Centre.
- If you are travelling from Leamington Spa, Stagecoach provides a dedicated frequent bus service, the Unibus (U1), that passes through campus. There is a second version of this route that passes through Kenilworth and is named the U2. For information on this service, please visit the Stagecoach website. This service also covers Coventry City Centre and Coventry Rail Station on Sundays and Public Bank Holidays.
- The Travel Coventry service number 11 also connects Leamington Spa and Kenilworth with the university campus and Coventry.
Bus Travel tips
- Travel Coventry buses do not usually give change so make sure you have the correct money available before making your journey. You can find information about fares on the National Express Coventry website but it is best to check with the driver when you board the bus.
- Stagecoach buses do give change on the majority of their services.
- From Coventry Rail Station visitors should follow the signs to Warwick Road (a 2-minute walk) and from there catch the Travel Coventry services 11, 12 or U17 (Sundays only), which travel on to the main campus (see above).
Road
- From the North: From the M69/M6 interchange (M6 Jct 2) take A46 towards Warwick and Coventry S & E. After approximately 3.5 miles you will reach Tollbar End roundabout (junction with A45). At the roundabout, follow signs for A45 Birmingham. After approximately 3 miles you will cross the A429 (Kenilworth Road); half a mile after this junction take the left-hand turn signposted 'University of Warwick'. Follow signs for University of Warwick (and Warwick Arts Centre) across two roundabouts. You are now approaching the University of Warwick from Kirby Corner Road.
- From the South East: From M45 Jct 1 take A45 towards Coventry. After approximately 7 miles you will reach Tollbar End roundabout (junction with A46); follow signs for A45 Birmingham. Now follow directions as for arriving from the North.
- From the South: From M40 Jct 15 take A46 towards Coventry. After approximately 8 miles leave A46 at junction signposted 'University of Warwick and Stoneleigh'. After a further 1.5 miles you will cross the A429 (Kenilworth Road). You are now approaching the University of Warwick from Gibbet Hill Road.
- From the West: From M42 Jct 6 take A45 towards Coventry. After approximately 9 miles you will pass a large Sainsbury store on your left. At the next roundabout (Fire Station on right) take the right-hand exit, signposted 'University and Canley'. Follow signs for University of Warwick (and Warwick Arts Centre) across two roundabouts. You are now approaching the University of Warwick from Kirby Corner Road.