Distributed dictionaries
- Subject: Distributed dictionaries
- From: Brian McMahon <bm@xxxxxxxx>
- Date: Thu, 20 Jul 2000 15:33:49 +0100 (BST)
As I mentioned when announcing this list,

> Although each CIF application uses the same basic file syntax, content is
> determined by external files (known as data dictionaries) that specify the
> relevant data model and carry information about data typing, validity and
> interrelationships. While customised applications can be written to utilise
> specific sets of subsets of CIF data, the most powerful applications are
> dictionary-driven, so that they can use or be easily modified to use new
> sets of data items as they are released.

In working towards the ideal of dictionary-driven applications, and in recognition of the diversification of subject-specific dictionaries, COMCIFS some time ago identified a need to establish a protocol that would allow dictionaries to be identified, located and layered. The goal really is to have a single distributed "virtual dictionary" that can be contributed to by separate dictionary author groups, and even (in effect) modified locally.

I append below a protocol for dictionary maintenance that has been formally approved by COMCIFS. Before posting this publicly on the general CIF pages, I invite comment and criticism from this more specialised group of CIF software developers. I am particularly interested in any enthusiastic volunteer efforts to construct the standalone tools described in the report as reference implementations of the ideas suggested here - it's something I would like to tackle, but haven't time at the moment.

Regards
Brian

------------------------------------------------------------------------------

COMCIFS Working Group on Dictionary Maintenance
===============================================

Brian McMahon, John Westbrook, Herbert Bernstein

Final report 15 January 2000

Introduction
------------

As the number of formal and informal uses of STAR and CIF grows, it becomes increasingly important to have a clear understanding of what defines a valid CIF data set.
Particularly as cross-disciplinary investigations become more common, it becomes essential to understand how to merge CIFs from multiple disciplines, and to ensure efficient and reliable transitions among versions of dictionaries. Here we propose some mechanisms to ensure coherent interactions among evolving definitions of CIFs for various domains of science.

Whether being done by an individual author or by the IUCr through COMCIFS, the ground rules for extending the language are the same:

    Once a definition has been given for a tag, then all CIFs written using that tag must have the original meaning preserved forever.

This does not mean that the definition of a tag may not be changed. Errors can and should be corrected, and definitions extended, but all of this must be done in an upwards compatible way - once CIFs have been created using a tag defined in a certain way, it would cause endless confusion if years later, when those CIFs were validated against a later dictionary, the meaning of what had been written in the past were to be changed. If a new concept is to be expressed, don't recycle an old tag, create a new one.

If new tags are to be defined, they must be presented in a dictionary conforming to the rules of the DDL appropriate to the domain involved.

Problems arise when multiple groups wish to make extensions with similar tags. COMCIFS will resolve conflicts involving official dictionaries, but a standard protocol is needed to minimize conflicts among local extensions made in different laboratories.

We give rules for concatenation, inclusion or overlay of one or more dictionary files or, possibly, fragments of data files to build a notional "virtual dictionary". The dictionary structures built in memory by a validation application may not be exact images of this notional dictionary file, but it is a convenient model for discussing how to assemble dictionary fragments.

Six components of dictionary maintenance are identified and discussed below.

    1. Identify dictionaries
    2. Locate dictionaries
    3. Reserve namespaces
    4. Add to official dictionaries
    5. Merge dictionaries
    6. Versioning tools

A summary of the major recommendations arising from the discussion is given in section 7.

1. Identification of dictionaries
---------------------------------

If one is to validate the data names appearing in a CIF data file, one must know what dictionary it is appropriate to validate against. The process is bidirectional; the CIF must identify the dictionary or dictionaries against which it is claimed to be compliant; and when those dictionaries have been retrieved, there must be internal confirmation that they are the files required. These issues are addressed in reverse order below.

Note that an application seeking to validate a data file should not consider the file invalid if a data name is found that has no definition in the dictionaries referenced. The CIF standard permits the incorporation of local and standard names in any data file. Nevertheless, it is recommended as good practice that all data names in a CIF should be able to be validated against dictionary files, including locally constructed dictionaries.

(a) internal identification
---------------------------

For DDL1 dictionaries, the dictionaries are identified internally by three fields; thus for the current working version of the core:

    _dictionary_name            cif_core.dic
    _dictionary_version         2.1
    _dictionary_update          1999-03-24

Each unique edition of a dictionary should have a unique triad of values for these names. Each release version should have a different value of _dictionary_version (minor revisions and bug fixes will take the form 2.1, 2.1.1 etc.).

The structure of _dictionary_name is

    <star-type>_<application>.dic

This convention will be enforced (by COMCIFS) on new dictionaries.
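
The naming convention and the uniqueness rule for the identification triad can be checked mechanically. The following Python fragment is only a sketch: the regular expression is one assumed reading of <star-type>_<application>.dic (the application part is allowed to contain further underscores, as in cif_local_foo.dic), and the function names are hypothetical, not part of any CIF library.

```python
import re

# Assumed reading of "<star-type>_<application>.dic": the star-type is the
# text before the first underscore; the application part may itself contain
# underscores (e.g. cif_local_foo.dic).
NAME_PATTERN = re.compile(
    r"^(?P<star_type>[A-Za-z0-9]+)_(?P<application>[A-Za-z0-9_]+)\.dic$"
)


def split_dictionary_name(name):
    """Return (star_type, application), or None if the name breaks the convention."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group("star_type"), match.group("application")


def find_duplicate_editions(editions):
    """Return each (name, version, update) triad that occurs more than once."""
    seen = set()
    duplicates = []
    for triad in editions:
        if triad in seen and triad not in duplicates:
            duplicates.append(triad)
        seen.add(triad)
    return duplicates


print(split_dictionary_name("cif_core.dic"))   # ('cif', 'core')
print(split_dictionary_name("cifdic.C91"))     # None
```

A file name such as cifdic.C91 fails the check, which is consistent with the convention being enforced only on new dictionaries.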

The original published core has instead:

    _compliance                 'CIF Dictionary (Core 1991)'

The implicit values for this file are

    _dictionary_name            cif_core.dic
    _dictionary_version         1.0
    _dictionary_update          1991-09-20

For DDL2, the equivalent identifiers are _dictionary.title, _dictionary.version; the revision history is itemised with the data names _dictionary_history.version, _dictionary_history.update, and _dictionary_history.revision.

As matters stand at present in DDL2 dictionaries (e.g. mmCIF), the dictionary identification and history items are set in the single data block that has scope over the entire file (this differs from practice in DDL1 dictionaries, where each definition is in a separate data block). It is proposed that the dictionary information be stored in its own save frame in future revisions of DDL2 dictionaries, so that dictionary naming and versioning can be handled at an application level when dictionaries are merged (as discussed later).

(b) compliance certification
----------------------------

Data files should certify the dictionaries against which they may be validated with data names from the AUDIT_CONFORM category, e.g.

    _audit_conform_dict_name         cif_core.dic
    _audit_conform_dict_version      2.0

[for DDL2 files, _audit_conform.dict_name and _audit_conform.dict_version]. These data names may be looped when there are multiple dictionaries.

Ideally, the various _audit_ items (in categories such as AUDIT, AUDIT_CONFORM, AUDIT_LINK, AUDIT_AUTHOR and AUDIT_CONTACT_AUTHOR) should relate only to the creation and update of the current data block. An attempt to encourage this has been made in the wording of the definitions in the current core and mmCIF dictionaries. We should enforce the rule that _audit_ items refer only to the current data block, so that a file containing multiple data blocks must repeat its _audit_ lists in each data block.

2. Location of dictionaries
---------------------------

The core data name _audit_conform_dict_location [_audit_conform.dict_location] permits a data file to specify a filename or URL identifying the location of the dictionary file referred to by _audit_conform_dict_name and *_version. However, it is recommended that this data name be used only to locate local dictionaries. To guard against changing file names or URLs, and to grant access to the most current version of any dictionary, public dictionaries are best located using a public registry. It is proposed that COMCIFS maintain such a registry.

An example of the contents of such a registry (expressed of course as a STAR File) is given in Appendix 1. Note how the example permits dictionaries local to a particular application or laboratory to be made publicly available if so required.

It is possible to reference both public and local dictionaries: a value of '.' for _audit_conform_dict_location implies that the location should be obtained from the public registry information; e.g.

    loop_
    _audit_conform_dict_name
    _audit_conform_dict_version
    _audit_conform_dict_location
      cif_core.dic    2.1    .
      cif_pd.dic      1.0    .
      cif_my.dic      1.0    /usr/local/dics/my_local_dictionary

(In this example the dictionary file location is given as an absolute path name, and hence is valid on only one machine or network.)

In detail, the recommendations for locating and loading dictionaries are as follows.

(a) COMCIFS shall construct and maintain a registry of known dictionaries. The values that must have a unique value for each entry are the combination of _dic.name and _dic.version. Local dictionaries appropriate to local data names with a reserved prefix "foo" may be assigned the identifier cif_local_foo.dic, and their locations may be registered as publicly accessible URLs if their maintainers are willing to allow them to be visible to external validation software. This file will be maintained by COMCIFS at a public and stable URL.

(b) Dictionary validation software source shall be distributed with a copy of the most recent version of the above file, and with the URL of the master copy hard coded. Library utilities should be provided that permit local caching of the registry file, and the ability to download and replace the cached register at periodic intervals. Individual dictionary files located through the use of the registry should also be cached locally.

(c) Each data file should contain a reference to one or more dictionary files against which the file should be validated. This will comprise minimally _audit_conform_dict_name [_audit_conform.dict_name for DDL2 files] (N), and optionally *_version (V) and *_location (L). In the event that no dictionaries are specified, the default validation dictionary should be that identified as having N="cif_core.dic" and V="." (i.e. the most recent version of the core dictionary). Since dictionaries are intended always to be extended, it normally suffices to specify only the name (and possibly location).

(d) A dictionary validation application should try to load the referenced validation dictionaries according to the following protocol.

If N, V and L are all given, try to load the file found at location L, or a locally cached copy of the file thus referenced. If this fails, raise a warning. Then search the dictionary registry for entries matching the given N and V. If successful, try to load the file at the location given by the matching entry (or a locally cached copy). If this fails, try to load files identified from the registry with the same N but progressively older versions V (version numbering takes the form n.m.l..., where n, m, l, ... are integers referring to progressively less significant revision levels). Version '.' should be accessed prior to any other numbered version. If this attempt fails, raise a warning indicating that the desired dictionary could not be located.

If N and V are given, try to load locally cached or master copies of the files with location given in the registry file, in the order stated above, viz:

    - the version number V specified
    - the version with version number indicated as '.'
    - progressively older versions

Success in other than the first instance should be accompanied by a warning message specifying the revision actually loaded.

If only N is given, try to load files identified in the registry by

    - the version with version number indicated as '.'
    - progressively older versions

If all efforts to load a referenced dictionary fail, the validation application should raise a warning. If all efforts to load all referenced dictionaries fail, the validation application should raise an error.

(e) For any dictionary file successfully loaded according to the protocol stipulated in (d), the validation application must scan the file for internal identifiers (_dictionary_name, _dictionary_version) and ensure that they match the values of N and V (where V is not "."). Failure in matching should raise an error.

3. Reserved namespaces
----------------------

For official CIF dictionaries it may be taken as a responsibility of COMCIFS to check at the approval stage that no data name clashes with any currently registered in the existing official dictionaries.

For local dictionaries (files defining additional data names), a registry of reserved prefixes is maintained by COMCIFS. See http://www.iucr.org/iucr-top/cif/spec/reserved.html for a proposed documentation of the conventions and procedures involved.
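
Returning briefly to the loading protocol of section 2(d) and the identity check of 2(e): the order in which registry entries are tried can be sketched in Python as below. This is an illustration under stated assumptions, not a reference implementation. The function names are hypothetical; "progressively older" is assumed to mean numbered versions older than the requested V (or all numbered versions, newest first, when no V is given); and an absent V is treated like ".".

```python
def version_key(version):
    """Turn 'n.m.l' into a tuple of integers so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def candidate_versions(requested, registered):
    """Order in which registry entries for one dictionary name would be tried.

    requested  -- the version V from the data file, or None / '.' if not given
    registered -- versions listed in the registry for this name ('.' = current)
    """
    numbered = sorted(
        (v for v in set(registered) if v != "."), key=version_key, reverse=True
    )
    order = []
    if requested not in (None, "."):
        if requested in numbered:
            order.append(requested)
        # Assumption: "progressively older" means older than the requested V.
        limit = version_key(requested)
        numbered = [v for v in numbered if version_key(v) < limit]
    if "." in registered:
        order.append(".")
    order.extend(numbered)
    return order


def identity_matches(requested_name, requested_version, found_name, found_version):
    """Step (e): internal identifiers must match N, and V unless V is '.'."""
    if found_name != requested_name:
        return False
    return requested_version in (None, ".") or found_version == requested_version


core_versions = [".", "1.0", "2.0.1", "2.1"]  # cif_core.dic entries in Appendix 1
print(candidate_versions("2.0.1", core_versions))  # ['2.0.1', '.', '1.0']
print(candidate_versions(None, core_versions))     # ['.', '2.1', '2.0.1', '1.0']
```

A real validator would wrap this ordering with the cache lookup, the warnings and the final error described in (d); those are omitted here.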

The reserved prefix is simply an underscore-bounded string within the data name: for DDL1 applications it must be the first such component of the data name; for DDL2 applications the first component of the data name if describing data names in a category not defined in the official dictionaries; or the first component after the period character (category delimiter) if the local data name is an extension to an existing category.

The registry of reserved prefixes maintained as described in http://www.iucr.org/iucr-top/cif/spec/reserved.html should include the additional convention that a local dictionary defining data names with reserved prefix foo will be identified by the dictionary name

    cif_local_foo.dic

The character string "[local]" shall not be used in any COMCIFS-approved dictionary-defined tag, so that tags constructed with "[local]" in any position will never conflict with tags in any official dictionary. This can facilitate development work with tags that might in future be offered as new public ones. It would be prudent for any group planning any interchange of CIFs with such tags to combine the "[local]" construct with a properly registered prefix, such as "_foo" to make "_foo_[local]"; but for purely local work involving only official dictionaries combined with local efforts, only "[local]" somewhere within a tag would be necessary to avoid conflicts.

In order to resolve conflicts among local dictionaries and to facilitate mapping the namespaces of local dictionaries into the namespace of official dictionaries when particular local dictionaries achieve community acceptance, a "cifmap" program is proposed which maps tags in both CIFs and dictionaries to new namespaces, either tag-by-tag or in blocks. In order to preserve the integrity of CIFs using the unmapped dictionary, the remapped dictionary should not be given the same name as the original. The mapped dictionary is not a version of the original dictionary. It is a different dictionary.

The exact syntax of the commands to cifmap should be discussed and made as user-friendly as possible. Since our context is CIF, it seems natural to make the mapping into a loop, e.g.:

    loop_
    _cifmap_map.from
    _cifmap_map.to
    _cifmap_map.regex
    _cifmap_map.comments
    _cifmap_map.quoted_strings
      '_some_original_tag'     '_some_target_tag'       no   yes  yes
      '*_xyz_\[local\]'        '${1}_qrs_\[local\]'     yes  yes  no
      '*[.|_]abc_\[local\]*'   '${1}${2}defghi_${3}'    yes  no   yes

4. Addition to official dictionaries
------------------------------------

As the official dictionaries become larger and more complex, it is useful to have a mechanism allowing additions to be listed, tested and modified in an orderly fashion before full integration with the official dictionary. An administrative mechanism has been established for the mmCIF dictionary, and can serve as a model for other dictionaries.

The mmCIF approach is as follows. An editorial board has been appointed, to which new data items may be submitted. The board members have responsibility for the scientific content of the new items; they work alongside technical editors responsible for ensuring that the new definitions are compliant with the dictionary syntax.

Proposed extensions to the dictionary are submitted to the mmCIF web server in the form of a dictionary entry. To facilitate this, a definition template is available on the mmCIF server via the URL

    http://ndbserver.rutgers.edu/NDB/mmcif/dict-templates/index.html

Proposed definitions receive a preliminary review by the editorial staff and are then sent to the appropriate board member(s) for scientific content review. The reviewers receive full dictionary definitions and return them with corrections and/or modifications. Once definitions have been approved they are put into a provisional extensions dictionary.
If the proposed data item was submitted with a local data name prefix, that prefix is removed in the entry in the extensions dictionary, but an alias to the local dictionary name (with the extension) is retained. A policy for retention times for these aliases is under discussion within the mmCIF community.

The provisional dictionary is made available on the mmCIF server and updated on a continuing basis. Members of the community at large are encouraged to monitor the evolution of the extensions dictionary via the mmCIF web page, and to comment on all new data items via the mmCIF list server.

When the extensions dictionary has been formally approved by COMCIFS, the new data items are incorporated into the parent mmCIF dictionary.

It is recommended that other dictionary maintenance groups study this model and consider offering it within their own communities. A set of definition templates should be constructed for DDL1 dictionaries.

5. Merging dictionaries
-----------------------

Suppose that one wishes to validate a data file against the core dictionary, except that one wishes to modify the enumeration range of one or more data items. (For example, the core dictionary permits _atom_site_attached_hydrogens an enumeration range of 0 to 8; but one might wish to validate well-behaved organic molecules where anything above 4 almost certainly represents a mistake.) It would seem laborious to create an alternative dictionary of the same size as the core simply for this one change.

Here is a suggested protocol for extending, replacing and merging dictionary definition fragments. (The terminology "dictionary fragment" refers here to a physical file which contains one or more data blocks or save frames containing complete or partial sets of attributes associated with data names that are identified in the relevant dictionary data block or save frame through the item "_name" or "_item.name".)

(a) Assemble and load all dictionary fragments against which the current data block will be validated. The order of presentation is important. Dictionary fragments should be assembled in the order cited by a data file. A dictionary validation application may accept a list of additional dictionary fragments to PREPEND to, REPLACE, or APPEND to the list referenced internally.

(b) Define three modes in which name collisions in the aggregate dictionary file may be resolved, called STRICT, REPLACE and OVERLAY.

(c) Scan the aggregate dictionary files in the order of loading. Assemble for each data name a composite definition frame thus, depending on the mode in which the validation application is working:

    STRICT:  If a data name appears to be multiply defined, generate a fatal error

    REPLACE: All attributes previously stored for the conflicting data name are expunged, and only the attributes in the later data block (or save frame) containing the definition are preserved

    OVERLAY: New attributes are added to those already stored for the data name; conflicting attributes replace those already stored

Appendix 2 describes the process in more detail for a DDL1.4 dictionary, and Appendix 3 gives some examples.

This protocol permits the creation of a coherent virtual dictionary from several different dictionary files or fragments. Although it must be used with care, it permits different levels of validation based on dictionary-driven methods, without modifying the original dictionary files themselves.

6. Versioning tools
-------------------

Though we all desire dictionary stability in order to avoid confusion and to keep software from breaking, change is a constant of the progress of science. New dictionaries for new applications of CIF will be needed and new versions of existing dictionaries will be needed. Further, retrieval of a specific previous version may be required to resolve anomalies or difficulties in data file validation.

When there was just a core dictionary changing every few years, almost any mechanism of managing such changes would work. As we enter a period of multiple dictionaries and rapid rates of change, a rigorous approach to revision control will help to ensure a stability of use even while the dictionaries themselves change.

A single application may need to work with a virtual dictionary created from multiple official dictionaries and one or more local dictionaries by layering the definitions from each dictionary onto those of the others.

A general long-term implementation of this approach may require extensions to the DDLs to allow precise specification of dictionary deletions and insertions, including changes to comments and of lines within text fields, and a merging of virtual dictionary management software into a variant of the popular Unix revision control system, rcs, with some improvements to provide a simple CIF-editor interface and better protection against file corruption than rcs provides. The resulting software suite could well be called cifrcs.

7. Recommendations
------------------

(1) A registry of reserved prefixes should be maintained by COMCIFS to permit software developers exclusive use of a reserved namespace. [Comment: this is already in place at the URL published in the main body of the report.]

(2) COMCIFS should maintain a registry of known dictionaries mapping dictionary identifiers against real URLs.

(3) Dictionary files for validation should be located and loaded with reference to the registry.

(4) The data names _audit_conform_dict_name, *_version and *_location (or their DDL2 equivalents) should be included in each data block of a data file.

(5) A virtual dictionary may be constructed by prepending, replacing or appending external dictionary files to the list specified in the _audit_conform_ items of a data block.

(6) The protocol for merging dictionaries as outlined in section 5 of this report should be approved and published.

(7) Mechanisms should be established by dictionary maintenance groups to permit suggestions for new data names to be received from the public, reviewed and incorporated into subsequent revisions of public dictionaries. [Comment: the mmCIF model is commended but may not be appropriate for all communities.]

APPENDIX 1. An example dictionary registry file
-----------------------------------------------

data_validation_dictionaries

loop_
    _dic.name
    _dic.version
    _dic.DDL_compliance
    _dic.reserved_prefix
    _dic.URL
    _dic.description
    _dic.maintainer_name
    _dic.maintainer_email

  cif_core.dic  .  1.4  .
    ftp://ftp.iucr.org/pub/cifdics/cif_core.dic
    'Core CIF Dictionary'
    'B. McMahon'  bm@iucr.org

  cif_core.dic  1.0  1.4  .
    ftp://ftp.iucr.org/pub/cifdics/cifdic.C91
    'Original Core CIF Dictionary'
    'B. McMahon'  bm@iucr.org

  cif_core.dic  2.0.1  1.4  .
    ftp://ftp.iucr.org/pub/cifdics/cif_core_2.0.1.dic
    'Core CIF Dictionary'
    'B. McMahon'  bm@iucr.org

  cif_core.dic  2.1  1.4  .
    ftp://ftp.iucr.org/pub/cifdics/cif_core_2.1.dic
    'Core CIF Dictionary'
    'B. McMahon'  bm@iucr.org

  cif_pd.dic  1.0  1.4  .
    ftp://ftp.iucr.org/pub/cifdics/cif_pd_1.0.dic
    'Powder CIF Dictionary'
    'B.H. Toby'  Brian.Toby@NIST.GOV

  cif_mm.dic  1.0  2.1.2  .
    ftp://ftp.iucr.org/pub/cifdics/cif_mm_1.0.dic
    'Macromolecular CIF Dictionary'
    'P.M.D.F. Fitzgerald'  paula_fitzgerald@merck.com

  cif_local_iucr.dic  1.0  1.4  iucr
    ftp://ftp.iucr.org/pub/cifdics/cifdic.iucr
    'IUCr journal use'
    'B. McMahon'  bm@iucr.org

  cif_local_xtal.dic  1.0  1.4  xtal
    ftp://ftp.crystal.uwa.edu.au/pub/cifdic.xtal
    'Xtal program system'
    'S.R. Hall'  syd@crystal.uwa.edu.au

  cif_local_shelx.dic  1.0  1.4  shelx
    .
    'SHELX solution and refinement programs'
    'G.M. Sheldrick'  gsheldr@shelx.uni-ac.gwdg.de

  cif_local_gsas.dic  1.0  1.4  gsas
    .
    'GSAS powder refinement system'
    'A. Larson'  .

  cif_local_cgraph.dic  1.0  1.4  cgraph
    ftp://ftp.OxfordCryosystems.co.uk/foo.bar
    'Oxford Cryosystems Crystallographica package'
    'A. Renshaw'  alex@OxfordCryosystems.co.uk

  cif_local_ccdc.dic  1.0  1.4  ccdc
    ftp://ftp.ccdc.cam.ac.uk/foo.bar
    'Cambridge Crystallographic Data Centre'
    'F.H. Allen'  allen@ccdc.cam.ac.uk

APPENDIX 2: Merging or overlay of dictionaries with respect to DDL 1.4
----------------------------------------------------------------------

[Imagine a hypothetical dictionary merge operation

    cifdiccreate -mode OVERLAY a.dic b.dic > c.dic]

(1) a.dic and b.dic should each have at most one datablock containing the data names _dictionary_name and _dictionary_version (with, optionally, _dictionary_update and _dictionary_history). The *_name and *_version together identify uniquely the dictionary file, and should match corresponding entries in the IUCr registry if this is a public dictionary. This information is conventionally stored in a datablock named data_on_this_dictionary.

In DDL1.4, all four of _dictionary_name, *_version, *_update and *_history are scalars, i.e. may not be looped. So a possible protocol for constructing the new dictionary identifier section in the resultant c.dic is the following:

    1. Create a datablock data_on_this_dictionary at the top of c.dic.

    2. If a dictionary name is supplied (via command-line switch "-dname" for example) write this as the value of _dictionary_name; otherwise generate a pseudo-unique string (e.g. concatenate the hostid, pid and current date string on a Unix system).

    3. If a dictionary version number is supplied (via command-line switch "-dversion" for example) write this as the value of _dictionary_version; otherwise supply the value "1.0".

    4. Supply the current date in the format yyyy-mm-dd as the value of _dictionary_update.

    5. Create a composite _dictionary_history by concatenation of the individual _dictionary_history fragments.

(2) COMCIFS has currently undertaken not to use STAR global_ constructs in CIF data dictionaries. However, there is a global_ section in the DDL1.4 dictionary, and we should perhaps plan for future instances.
For our purposes, we presume that each datablock within the primitive file following a global_ statement implements the global assignments according to the standard STAR rules. That is, in the case of DDL1.4 itself, the global_ statement "_list no" is implemented in each successive datablock unless that datablock already contains a different "_list" statement. The global assignment does not extend across file boundaries (i.e. in our example, a global_ in a.dic is expanded only to the end of the file a.dic, and not applied to b.dic). All global_'s are expanded in the merge process and no global_ statement is written to c.dic.

(3) There is no deep significance to the ordering of datablocks (containing definitions) in dictionaries, though they are conventionally sorted alphabetically. For convenience, datablocks should be written out in the order in which they are encountered in the input primitive dictionary files, except that definitions modified by subsequent entries remain in their initial location.

(4) We propose the following procedure. Load a datablock from the first dictionary file. Locate the '_name' tag. (Because _name may be looped, there may be more than one value. For now, we assume there's a single value of _name.) Search the next dictionary file for a datablock containing the same value of '_name'. Load the contents of that datablock.

    (a) If the new datablock contains only data items that do not appear in the first datablock, they are simply concatenated with those already present.

    (b) If the new datablock contains a scalar data item already present in the first datablock (i.e. with "_list no") discard the stored attributes.

    (c) If the new datablock contains data items that may be looped and that occur in the first datablock, build a new composite table of values in this way:

        (i) construct a valid loop header if necessary
        (ii) do not repeat identical sets of values (i.e. collapse identical table rows)
        (iii) if it is possible to identify the category key, then issue a fatal warning and die if there are identical instances of a key value (after the normalisation of step (ii) has occurred)
        (iv) else append new rows to the table

When the new composite datablock has been built according to these principles, search the next dictionary file specified and repeat.

APPENDIX 3: Examples
--------------------

We consider how a hypothetical validation program "dictcheck" might validate a data file against a range of local validation dictionaries.

Here is an example: a data file test.cif includes the fragment

    _audit_conform_dict_name    'official'
    _dummy                      1234.5

The entry for "_dummy" in the dictionary "official" looks like this:

    data_dummy
        _name                 '_dummy'
        _type                 numb
        _enumeration_range    0:         # i.e. any positive number

A local validation dictionary dict_A has entry

    data_dummy_modified
        _name                 '_dummy'
        _enumeration_range    0:1000

A local validation dictionary dict_B has entry

    data_dummy
        _name                 '_dummy'
        _type_extended        integer

A local validation dictionary dict_C has entry

    data_dummy
        _name                 '_dummy'
        _type                 char

Here is an analysis of some runs of the notional dictcheck application.

    dictcheck test.cif

The data item is valid (assuming dictcheck was able to locate and load the dictionary "official").

    dictcheck -mode STRICT -A "dict_A" test.cif

An attempt is made to define the data name "_dummy" in two dictionaries; the validation process fails.

    dictcheck -mode OVERLAY -A "dict_A" test.cif

The value of _dummy is invalid, because the latest enumeration range restricts its value from 0 to 1000.

    dictcheck -mode OVERLAY -P "dict_A" test.cif

The value of _dummy is valid, because dict_A is prepended to the list of dictionary fragments scanned, so that the last enumeration range stored is 0:

    dictcheck -mode OVERLAY -A "dict_B" test.cif

An additional attribute (_type_extended, a local attribute presumed to be understood by the validation software) is overlaid on the properties of _dummy, and its value is now invalid because it is not an integer.

    dictcheck -mode REPLACE -A "dict_B" test.cif

The _type_extended attribute is now present, but the original _type attribute has been lost (mode REPLACE expunges any stored information). I suggest that whether or not this is an error is application dependent.

    dictcheck -mode REPLACE -A "dict_C" test.cif

The value of "_dummy" is invalid, because it is not of type char.

    dictcheck -mode OVERLAY -A "dict_C" test.cif

Now there is an inconsistency in the "virtual dictionary" - the definition block for "_dummy" has effectively two attributes

    _type                 char
    _enumeration_range    0:

which are incompatible, because an attempt has been made to impose a numeric range on a character value.

An OVERLAY example: a.dic has the following entry:

data_cell_volume
    _name                  '_cell_volume'
    _category              cell
    _type                  numb
    _type_conditions       esd
    _enumeration_range     0.0:
    _units                 A^3^
    _units_detail          'cubic angstroms'
    _definition
;              Cell volume V in angstroms cubed.
;

b.dic has this datablock:

data_cell_volume_additional
    _name                  '_cell_volume'
    _type_construct        '[+-]?[1-9][0-9]*\.?[0-9]*\(([1-9]?[0-9]*)\)?'
    _example               123.4

These are merged to create the legitimate datablock

data_cell_volume
    _name                  '_cell_volume'
    _category              cell
    _type                  numb
    _type_conditions       esd
    _enumeration_range     0.0:
    _units                 A^3^
    _units_detail          'cubic angstroms'
    _definition
;              Cell volume V in angstroms cubed.
;
    _type_construct        '[+-]?[1-9][0-9]*\.?[0-9]*\(([1-9]?[0-9]*)\)?'
    _example               123.4

Now suppose we merge an additional datablock from another file,

data_cell_volume_additional   # same name, but OK - in a different file
    _name                  '_cell_volume'
    _example               4567.8
    _example_detail        'large cell'

The resultant looks like this:

data_cell_volume
    _name                  '_cell_volume'
    _category              cell
    _type                  numb
    _type_conditions       esd
    _enumeration_range     0.0:
    _units                 A^3^
    _units_detail          'cubic angstroms'
    _definition
;              Cell volume V in angstroms cubed.
;
    _type_construct        '[+-]?[1-9][0-9]*\.?[0-9]*\(([1-9]?[0-9]*)\)?'
    loop_
    _example
    _example_detail
        123.4     .
        4567.8    'large cell'

Now try to merge in another datablock,

data_cell_volume_more
    _name                  '_cell_volume'
    loop_
    _example
    _example_detail
        4567.8    'large cell'
        123.4     'small cell'
    _units                 A^3^

This now causes a fatal error. Note first that the first row in the "_example" table duplicates the second row in the preceding example, but this is not an error; the second occurrence of " 4567.8 'large cell'" is simply discarded. However, the next row conflicts with an existing row containing an identical key value, and this IS an error.

------------------------------------------------------------------------------
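
The three collision modes of section 5(c) can be illustrated with a short Python sketch that reproduces the "_dummy" runs of Appendix 3. It rests on several assumptions: each dictionary fragment is modelled as a plain mapping from data name to a dict of scalar attributes, the function name merge_definitions is hypothetical, and the looped-attribute rules of Appendix 2 step (4)(c) are deliberately left out.

```python
MODES = ("STRICT", "REPLACE", "OVERLAY")


def merge_definitions(fragments, mode="STRICT"):
    """Build a composite definition per data name from fragments in load order.

    fragments -- list of {data_name: {attribute: value}} mappings (scalars only)
    mode      -- how a repeated data name is resolved (section 5(c))
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    merged = {}
    for fragment in fragments:
        for name, attributes in fragment.items():
            if name not in merged:
                merged[name] = dict(attributes)
            elif mode == "STRICT":
                # Multiply defined data name: fatal error.
                raise ValueError(f"{name} is multiply defined")
            elif mode == "REPLACE":
                # Expunge everything stored so far; keep only the later block.
                merged[name] = dict(attributes)
            else:
                # OVERLAY: add new attributes; conflicting ones are replaced.
                merged[name].update(attributes)
    return merged


official = {"_dummy": {"_type": "numb", "_enumeration_range": "0:"}}
dict_a = {"_dummy": {"_enumeration_range": "0:1000"}}
dict_b = {"_dummy": {"_type_extended": "integer"}}

# dictcheck -mode OVERLAY -A "dict_A": the appended range wins.
print(merge_definitions([official, dict_a], "OVERLAY")["_dummy"])
# dictcheck -mode OVERLAY -P "dict_A": the official range is stored last.
print(merge_definitions([dict_a, official], "OVERLAY")["_dummy"])
# dictcheck -mode REPLACE -A "dict_B": the original _type is lost.
print(merge_definitions([official, dict_b], "REPLACE")["_dummy"])
```

The -A and -P switches of the notional dictcheck correspond here simply to the position of the extra fragment in the list; checking a value against the merged attributes is a separate step not shown.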