Re: Accent escape sequences
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Accent escape sequences
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Fri, 2 Mar 2007 08:43:06 -0500 (EST)
- In-Reply-To: <20070302101147.GA26353@emerald.iucr.org>
- References: <45E72969.1090100@niehs.nih.gov><20070302101147.GA26353@emerald.iucr.org>
Dear Brian,

While it may look cumbersome, I would highly recommend the use of MIME
headers in a manner consistent with imgCIF, rather than an array of more
magic numbers. I just had to add some proposed new information to the
MIME headers for imgCIF to provide hooks for use by XDS, and it proved
rather easy to add to the existing framework in CBFlib, whereas more
magic numbers would require a new and different code structure.

Coding convenience aside, I would strongly suggest support of XML, XHTML
and HTML conventions, since they are well documented, and explicit
support for TeX, since it is essential to the publication process. I do
reviews for Math Reviews, and they have nice compromises on handling
snippets of TeX for reviews. You may wish to consult with them on good
externals. Their system works.

I hope we can avoid RTF -- it is lots and lots of undocumented gotchas.
I have to use it for old-style RasMol Windows help files, and there is a
long learning curve for my students.

In any case, I think this would be a very good subject to discuss. Will
people be at ACA 2007 or BSR 2007?

Regards,
Herbert

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Fri, 2 Mar 2007, Brian McMahon wrote:

> Dear Joe
>
> We have recently exchanged a few messages off-list, and it is
> clear that you have an interest in, and perhaps some time for,
> working on CIF-based applications. It would be great if you would
> introduce yourself to the list with a brief indication of your
> current interests.
>
> Regarding the untidy typographic markup conventions in CIF text
> fields, what we currently have arises from the pragmatic
> requirements of our early 1991 (prehistoric!) CIF-handling
> procedures in Acta Cryst.
> We used TeX as a formatter, so
> the markup (initially) was somewhat TeX-like; but there was
> pressure on us not to rely on TeX, especially as many of our
> authors would have no experience of it. Thus a minimal set
> of markup was devised, requiring very little learning from
> authors, that covered most markup that in practice we came
> across in Acta C papers (which have rather little
> mathematical content). Very few additional codes were
> introduced; and, for example, the relatively recent <i> and
> <b> markup for italic and bold was chosen because
> non-specialist authors were beginning to become familiar
> with such codes in HTML markup.
>
> The current arrangement is, in my opinion, very inelegant,
> but it is supported by publCIF, the IUCr's own CIF editor,
> and is workable within that tool's reasonably user-friendly
> interface.
>
> To provide better formatting abilities, I think it would be
> preferable to allow text fields to contain markup in various
> different standard formats, suitably identified, and to
> pass the fields to appropriate handlers. The simplest way to
> do so would be to have a 'magic number' introducing each text
> field. There's an undocumented example of this inasmuch as
> ciftex, the old cif->TeX translator, passes through unchanged
> any text field beginning
> ;%T (i.e. it treats it as containing pure TeX markup).
> The 'magic number' might be a simple character sequence
> (%T for TeX, %L for LaTeX, %H html, %R RTF, %U Unicode...)
> or could be a more general, but more verbose, signature
> involving MIME headers:
> ;
> Content-Type: application/tex
> (this mimics the approach for embedding binary data in imgCIF files).
>
> There's nothing fundamentally wrong with extending the existing
> special character sequences, and I'm happy to consider a
> specific proposal in terms of whether we could easily provide
> publCIF support for it.
> The problem is that the more one offers
> to the author, the more the author will want to do, and the more
> unwieldy an ad-hoc markup will become. (And recall that even
> TeX, which is unparalleled for mathematics, does not offer as
> primitives anywhere near all the symbols that our authors do
> use.)
>
> I should be interested in hearing other COMCIFS members' thoughts
> on this.
>
> Brian
>
> On Thu, Mar 01, 2007 at 02:28:41PM -0500, Joe Krahn wrote:
> > It seems that there is no way to escape a single quote followed by a
> > space. I was looking at the accent escape sequences and realize that it
> > would be useful if these trigraphs were allowed to use a space as the
> > 'letter' being modified. For example:
> >
> > "\' " becomes "'"
> > "\~ " becomes "~"
> > "\^ " becomes "^"
> > "\% " becomes the degree symbol
> >
> > Currently, there is no caret escape to avoid superscripts, and the
> > current tilde escape is only listed as "accepted by convention".
> >
> > If you generalize the sequence <backslash><non-alphabetic><character> to
> > function like an old double-strike sequence, you can get other useful
> > combinations as well, for example "\/=" becomes not-equals.
> >
> > I suspect that these trigraphs have not become better defined because
> > most people would rather just switch to some other modern encoding. But,
> > as an archival format, we are somewhat stuck with the current scheme,
> > and it probably makes sense to keep things in plain ASCII, and
> > human-readable. Also, I found another set of similar trigraph
> > definitions that are much more extensive at the bottom of the following
> > page:
> >
> > http://abc.sourceforge.net/standard/abc2-draft.html
> >
> > It is probably good to define a complete list of allowed trigraphs and
> > other codes, and do away with "accepted by convention" as a separate
> > list. I also think that it is worth extending the trigraphs to a more
> > complete set.
> >
> > I am willing to try to make such a list if it is deemed useful, but
> > there are some things I already don't understand from the current set:
> >
> > What is the purpose of \\rangle and \\langle; are these different from
> > "<" and ">"?
> >
> > Why not use a more symbolic form for some items, like "\<-" instead of
> > "\\leftarrow"
> >
> > Why do double and triple bond codes have names, and single bond is just
> > "---"?
> >
> > Joe Krahn
> > _______________________________________________
> > comcifs mailing list
> > comcifs@iucr.org
> > http://scripts.iucr.org/mailman/listinfo/comcifs
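
Editorial illustration: the two identification schemes discussed in this thread (a short 'magic number' such as %T, or a MIME-style Content-Type header at the start of a text field) could be recognised along the following lines. This is only a minimal sketch under stated assumptions, not code from CBFlib, ciftex or publCIF. The function name identify_markup, the labels in MAGIC_CODES and the fallback label are hypothetical, the two-character codes are the ones floated in the quoted message rather than an adopted convention, and the input is assumed to be the text of one field with the opening and closing semicolons already removed.

```python
from email.parser import Parser

# Two-character codes suggested in the quoted message (not an adopted standard).
# The labels on the right are hypothetical names chosen for this sketch.
MAGIC_CODES = {
    "%T": "TeX",
    "%L": "LaTeX",
    "%H": "HTML",
    "%R": "RTF",
    "%U": "Unicode",
}


def identify_markup(field: str) -> tuple[str, str]:
    """Return (markup_label, body) for the content of one CIF text field.

    `field` is assumed to be the text that follows the opening semicolon,
    with the closing semicolon line already stripped.
    """
    # Short form: the code sits immediately after the opening semicolon.
    code = field[:2]
    if code in MAGIC_CODES:
        return MAGIC_CODES[code], field[2:].lstrip("\n")

    # Verbose form: MIME-style headers, then a blank line, then the body.
    candidate = field.lstrip("\n")
    if candidate.lower().startswith("content-type:"):
        message = Parser().parsestr(candidate)
        return message.get_content_type(), message.get_payload()

    # Neither signature present: fall back to the existing CIF conventions.
    return "CIF default markup", field


if __name__ == "__main__":
    print(identify_markup("%T\n$\\alpha$\n"))
    print(identify_markup("\nContent-Type: application/tex\n\n$\\alpha$\n"))
    print(identify_markup("\nOrdinary text with \\'e accents\n"))
```

One point in favour of the header form is visible even in this sketch: further headers can be added later without changing the dispatch logic, whereas each new two-character code needs a new table entry and risks colliding with ordinary text that happens to begin with the same characters.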
- Follow-Ups:
- Re: Accent escape sequences (Joe Krahn)
- References:
- Accent escape sequences (Joe Krahn)
- Re: Accent escape sequences (Brian McMahon)