Re: Items for the Agenda of the COMCIFS closed meeting
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIFStandard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Items for the Agenda of the COMCIFS closed meeting
- From: Brian McMahon <bm@iucr.org>
- Date: Tue, 22 Mar 2005 15:22:47 +0000
- In-Reply-To: <423F2BCB.5090700@mcmaster.ca>
- References: <423F2BCB.5090700@mcmaster.ca>
I have four comments arising from David's items for discussion.

(1) Increasingly, large IT-driven projects such as those at the European Bioinformatics Institute are handling data transfer through application programming interfaces (APIs) rather than specific formats. That is, one provides programming handles into abstract data structures ("return(Molecule), get3DPosition(Molecule->Atoms)") and relies on a set of library functions or filters to map these data structures onto the format or formats of the day. In practice the data structures are initially built against a particular software implementation (probably the program most used within the domain, or the one that happens to be best known by the programmers). My suspicion is that there is no guarantee that these data structures will map onto any arbitrary format (but the current generation of software engineers is pretty good at forcing the best possible fit).

In this context the CIF dictionaries remain essential in providing a reference data model for crystallography. I find CIF itself a good format (or language) for developing new data structures, because it is syntactically very simple and has restricted internal structures (no nested loops, for instance). Even so, the complexities of parent-child relationships are stretching our ad hoc dictionary development procedures, and we should explore carefully whether the dictionaries should continue to be developed in CIF (with appropriate development of tools to ensure their self-consistency), or whether environments such as RDF or UML can be helpful in maintaining or at least checking the consistency of the model. To my mind, a strong argument in favour of continuing development based on CIF/STAR is that the CIF dictionaries already offer more properties of an object that can be validated against a dictionary than are found in most (if not all) XML environments. Syd's proposed DDL3/dREL will take this even further.
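As a rough illustration of the API-style approach described in (1), here is a minimal Python sketch. It assumes a hypothetical abstract structure (`Molecule`, `Atom`), an accessor (`get_3d_positions`) and two "filters" that map the same structure onto different formats. All class and function names, and the XML element names, are invented for illustration and do not correspond to any existing library or to the mmCIF/XML mapping; only the `_atom_site_*` data names are taken from the core CIF dictionary.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

Position = Tuple[float, float, float]


@dataclass
class Atom:
    label: str
    position: Position  # fractional coordinates (x, y, z)


@dataclass
class Molecule:
    name: str
    atoms: List[Atom] = field(default_factory=list)


def get_3d_positions(molecule: Molecule) -> List[Position]:
    """API handle: the caller never sees the underlying file format."""
    return [atom.position for atom in molecule.atoms]


def to_cif_loop(molecule: Molecule) -> str:
    """Filter mapping the abstract structure onto a CIF atom-site loop."""
    lines = [
        f"data_{molecule.name}",
        "loop_",
        "_atom_site_label",
        "_atom_site_fract_x",
        "_atom_site_fract_y",
        "_atom_site_fract_z",
    ]
    for atom in molecule.atoms:
        x, y, z = atom.position
        lines.append(f"{atom.label} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines)


def to_xml(molecule: Molecule) -> str:
    """Filter mapping the same structure onto an ad hoc (hypothetical) XML layout."""
    root = ET.Element("molecule", name=molecule.name)
    for atom in molecule.atoms:
        x, y, z = atom.position
        ET.SubElement(
            root, "atom",
            label=atom.label, x=f"{x:.4f}", y=f"{y:.4f}", z=f"{z:.4f}",
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    water = Molecule("water", [Atom("O1", (0.0, 0.0, 0.1)),
                               Atom("H1", (0.1, 0.2, 0.3))])
    print(get_3d_positions(water))
    print(to_cif_loop(water))
    print(to_xml(water))
```

The point of the sketch is only that the calling code depends on the abstract structure, while the choice of output format is confined to the filters; whether every such structure can be mapped cleanly onto an arbitrary format is exactly the open question raised above.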
(2) Whatever the chosen ontology development route, we will at some level need to be able to interface with XML to achieve cross-disciplinary interoperability. I believe COMCIFS should develop a canonical mapping of CIF data files to an XML data structure. The Bilbao symmetry database group has produced XML-based applications from their existing collection of symmetry data sets in CIF format, and is keen to develop more. The RCSB's mmCIF/XML mapping is already used by the worldwide Protein Data Bank, and to my mind has a strong claim as a de facto standard since it's already tested in use. Is it suitable for smaller-scale applications like the Bilbao symmetry server? We should explore whether this is a suitable mapping for universal application: it is surely desirable to avoid a proliferation of ad hoc mappings to XML.

(3) I am rather against trying to develop the DDL1 model further - it accommodates pdCIF (just), and it's simple enough to be understood by programmers in languages such as Fortran whose only interest in CIF is to feed in the numbers required for crunching. Is there a case for freezing the current DDL1 dictionaries at the current revision and going over to DDL2 for any new content in the core dictionary? (Or, what may amount to the same thing, freezing the "core dictionary" at this revision, except for trivial additions like _publ_author_email, and developing new content such as molecular descriptions and extended diffraction density in DDL2 dictionaries?)

(4) As matters stand, the existence of the DDL2 version of the core dictionary within mmCIF saddles the mmCIF maintainers with a heavy burden of maintenance. I am sure the RCSB would be happy to see the IUCr adopt DDL2 for small-molecule CIFs and COMCIFS take over responsibility for maintaining the DDL2 core dictionary. However, I can't see a case for the IUCr to drop existing support for DDL1-based CIFs (tools such as enCIFer are heavily DDL1/core-based, pdCIF may simply not work in DDL2). But we may be able to provide support for small-molecule structures in DDL2 as well, though it may take some effort to persuade the small-molecule community to migrate to DDL2.

Best wishes
Brian
- References:
- Items for the Agenda of the COMCIFS closed meeting (David Brown)