Re: Accent escape sequences
- To: Discussion list of the "IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Accent escape sequences
- From: James Hester <jrh@anbf2.kek.jp>
- Date: Tue, 06 Mar 2007 15:18:32 +0900
- In-Reply-To: <20070305160044.GB13871@emerald.iucr.org>
- References: <45E72969.1090100@niehs.nih.gov><20070302101147.GA26353@emerald.iucr.org><Pine.BSF.4.58.0703020830490.46806@epsilon.pair.com><45EA0C29.5060604@niehs.nih.gov><a06230900c20fde7910a9@[192.168.2.101]><45EC3846.5070001@niehs.nih.gov><20070305160044.GB13871@emerald.iucr.org>
Thinking about the mechanics of implementing these suggestions, it would make sense to define different types of text field using the _item_type_list.code DDL2 attribute. Currently mmCIF appears to have only 'text' for multiline data, and imgCIF has 'binary' in addition to this. A new type (e.g. 'mime') could specify a regex that matches a mime header, something like what is done for the imgCIF 'binary' type.

A variation on this would be to define a larger number of _item_type_list.codes corresponding to the various text formats of interest, for example 'ascii_markup','tex','html','mathml'. This would mean that the format of a given data item would be determined at dictionary writing time if a single type code is given in the dictionary. While this might work and be quite useful when writing dictionaries, it is probably too onerous when producing data files. So the data dictionary would specify a list of possible text type codes, and a magic number or mime header would be useful in the data item text field in order to disambiguate.

Regarding the suggestion that there be several representations of the same text using a mime multipart approach, I think caution is warranted insofar as this might relate to dictionary data items (as opposed to data file data items), in that all of the parts should be kept synchronised, entailing more work, and work which involves specialised knowledge.

On Mon, 2007-03-05 at 16:00 +0000, Brian McMahon wrote:
> > The advantage of a simple escape mechanism, like the current scheme, is
> > that it is fairly easy to read directly. The disadvantage is that it has
> > limited abilities. With MIME, the multipart/alternative could be used,
> > where simple ASCII escapes are combined with a more accurate version
> > that is not directly readable. This give the advantages of both forms.
>
> In principle, this is a great idea. Consider the CIF dictionaries,
> where the pure-text _definition field sometimes carries inventive
> representations of maths (e.g.
> http://www.iucr.org/iucr-top/cif/cifdic_html/1/cif_core.dic/Irefine_ls_restrained_S_gt.html )
> that have to be reverse-engineered into something more useful (e.g. TeX)
> when typesetting these for International Tables. It would make it
> easier to keep these representations in sync if they were both
> transported as multipart/alternative content in the same text field.
>
> But ... this does come at the expense of significantly more
> complexity in applications that need to do something with the
> content of text fields. Most scientific CIF applications (the
> ones that work on the data) won't be affected - they just skip
> over text fields. The others will need to have the ability to
> parse and extract MIME content (not too difficult), but also
> to *write* proper multipart content, and that's not necessarily
> so easy if you're to provide tools that ingest content from
> different input streams (TeX-savvy editors, html editors,
> clipboards...). In practice the Acta office doesn't see a
> critical mass of content provision to justify this complexity
> at this stage (it's still really only Acta C and E that use
> CIF text fields extensively, and they're catered for through
> publCIF). Having said which, there's no harm in working through
> the details of how such a system could operate.
>
> Going back to Joe's original wishes to rationalise and perhaps
> extend the existing CIF markup, it's important also to remember
> that some data items will also occasionally require markup for
> simple string fields - e.g. how to markup the "alpha" Wyckoff
> position in the symmetry CIF dictionary? The use of
> the '\a' digraph in
> http://www.iucr.org/iucr-top/cif/cifdic_html/2/cif_sym.dic/Ispace_group_Wyckoff.letter.html
> clearly derives from the "usual" CIF markup for alpha, but that is
> nowhere made formally clear. It looks like we need unambiguous
> markup rules in these cases too.
>
> (I'm hoping to see our publCIF developer later this week so that
> we can discuss the specifics of the proposal Joe posted recently.)
>
> Brian
> _______________________________________________
> comcifs mailing list
> comcifs@iucr.org
> http://scripts.iucr.org/mailman/listinfo/comcifs

--
_______________________________________________________________________
James Hester, ANBF
KEK                     e-mail: jrh@anbf2.kek.jp
Oho 1-1                 Phone: +81 298 64 7959
Tsukuba, Ibaraki 305    Fax: +81 298 64 7967
Japan
________________________________________________________________________
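
The disambiguation idea in the message above (the dictionary lists the permitted text type codes, and a mime header at the start of the field selects one of them) might look roughly like the sketch below. Everything named in it is an assumption made for illustration: the function `classify_text_field`, the header regex, the mapping from content types to the example type codes, the `text/x-tex` content type, and the choice of 'ascii_markup' as the fallback when no header is present. None of this is defined by DDL2, mmCIF or imgCIF.

```python
import re

# Hypothetical mapping from MIME content types to the example type codes
# mentioned in the message. "text/x-tex" is an assumed, unregistered type.
MIME_TO_CODE = {
    "text/plain": "ascii_markup",
    "text/x-tex": "tex",
    "text/html": "html",
    "application/mathml+xml": "mathml",
}

# Assumed header form: one "Content-Type:" line, then a blank line,
# then the body of the text field.
HEADER_RE = re.compile(
    r"\A\s*Content-Type:[ \t]*(?P<mime>[\w.+-]+/[\w.+-]+)[^\n]*\n[ \t]*\n",
    re.IGNORECASE,
)


def classify_text_field(value, allowed_codes, default="ascii_markup"):
    """Return (type_code, body) for one text field value.

    allowed_codes is the set of type codes the dictionary permits for
    this data item. A field without a header falls back to `default`.
    """
    match = HEADER_RE.match(value)
    if match is None:
        code, body = default, value
    else:
        mime = match.group("mime").lower()
        code = MIME_TO_CODE.get(mime)
        if code is None:
            raise ValueError(f"unrecognised content type: {mime}")
        body = value[match.end():]
    if code not in allowed_codes:
        raise ValueError(f"type code {code!r} not permitted for this item")
    return code, body


if __name__ == "__main__":
    permitted = {"ascii_markup", "tex"}
    tagged = "Content-Type: text/x-tex\n\n$\\alpha$-quartz"
    print(classify_text_field(tagged, permitted))
    print(classify_text_field("\\a-quartz", permitted))
```

A data item whose dictionary entry permits only a single type code would not need the header at all, which corresponds to the case where the format is fixed at dictionary writing time.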
- Follow-Ups:
- Re: Accent escape sequences (Joe Krahn)
- References:
- Accent escape sequences (Joe Krahn)
- Re: Accent escape sequences (Brian McMahon)
- Re: Accent escape sequences (Herbert J. Bernstein)
- Re: Accent escape sequences (Joe Krahn)
- Re: Accent escape sequences (Herbert J. Bernstein)
- Re: Accent escape sequences (Joe Krahn)
- Re: Accent escape sequences (Brian McMahon)