Re: Accent escape sequences
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Accent escape sequences
- From: Brian McMahon <bm@iucr.org>
- Date: Fri, 2 Mar 2007 10:11:47 +0000
- In-Reply-To: <45E72969.1090100@niehs.nih.gov>
- References: <45E72969.1090100@niehs.nih.gov>
Dear Joe

We have recently exchanged a few messages off-list, and it is clear that you have an interest in, and perhaps some time for, working on CIF-based applications. It would be great if you would introduce yourself to the list with a brief indication of your current interests.

Regarding the untidy typographic markup conventions in CIF text fields, what we currently have arises from the pragmatic requirements of our early 1991 (prehistoric!) CIF-handling procedures in Acta Cryst. We used TeX as a formatter, so the markup (initially) was somewhat TeX-like; but there was pressure on us not to rely on TeX, especially as many of our authors would have no experience of it. Thus a minimal set of markup was devised, requiring very little learning from authors, that covered most of the markup we came across in practice in Acta C papers (which have rather little mathematical content). Very few additional codes were introduced; and, for example, the relatively recent <i> and <b> markup for italic and bold was chosen because non-specialist authors were beginning to become familiar with such codes in HTML markup.

The current arrangement is, in my opinion, very inelegant, but it is supported by publCIF, the IUCr's own CIF editor, and is workable within that tool's reasonably user-friendly interface.

To provide better formatting abilities, I think it would be preferable to allow text fields to contain markup in various different standard formats, suitably identified, and to pass the fields to appropriate handlers. The simplest way to do so would be to have a 'magic number' introducing each text field. There's an undocumented example of this inasmuch as ciftex, the old cif->TeX translator, passes through unchanged any text field beginning ;%T (i.e. it treats it as containing pure TeX markup).

The 'magic number' might be a simple character sequence (%T for TeX, %L for LaTeX, %H HTML, %R RTF, %U Unicode...) or could be a more general, but more verbose, signature involving MIME headers:

    ; Content-Type: application/tex

(this mimics the approach for embedding binary data in imgCIF files).

There's nothing fundamentally wrong with extending the existing special character sequences, and I'm happy to consider a specific proposal in terms of whether we could easily provide publCIF support for it. The problem is that the more one offers to the author, the more the author will want to do, and the more unwieldy an ad-hoc markup will become. (And recall that even TeX, which is unparalleled for mathematics, does not offer as primitives anywhere near all the symbols that our authors do use.)

I should be interested in hearing other COMCIFS members' thoughts on this.

Brian

On Thu, Mar 01, 2007 at 02:28:41PM -0500, Joe Krahn wrote:
> It seems that there is no way to escape a single quote followed by a
> space. I was looking at the accent escape sequences and realize that it
> would be useful if these trigraphs were allowed to use a space as the
> 'letter' being modified. For example:
>
> "\' " becomes "'"
> "\~ " becomes "~"
> "\^ " becomes "^"
> "\% " becomes the degree symbol
>
> Currently, there is no caret escape to avoid superscripts, and the
> current tilde escape is only listed as "accepted by convention".
>
> If you generalize the sequence <backslash><non-alphabetic><character> to
> function like an old double-strike sequence, you can get other useful
> combinations as well, for example "\/=" becomes not-equals.
>
> I suspect that these trigraphs have not become better defined because
> most people would rather just switch to some other modern encoding. But,
> as an archival format, we are somewhat stuck with the current scheme,
> and it probably makes sense to keep things in plain ASCII, and
> human-readable. Also, I found another set of similar trigraph
> definitions that are much more extensive at the bottom of the following
> page:
>
> http://abc.sourceforge.net/standard/abc2-draft.html
>
> It is probably good to define a complete list of allowed trigraphs and
> other codes, and do away with "accepted by convention" as a separate
> list. I also think that it is worth extending the trigraphs to a more
> complete set.
>
> I am willing to try to make such a list if it is deemed useful, but
> there are some things I already don't understand from the current set:
>
> What is the purpose of \\rangle and \\langle; are these different from
> "<" and ">"?
>
> Why not use a more symbolic form for some items, like "\<-" instead of
> "\\leftarrow"
>
> Why do double and triple bond codes have names, and single bond is just
> "---"?
>
> Joe Krahn
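
For illustration only, the 'magic number' idea suggested in the reply above (a short signature at the start of a text field that selects a handler) might be sketched in Python as below. This is not how ciftex or publCIF is implemented: the function name `classify_text_field`, the `MAGIC_FORMATS` table and the "cif" fallback label are all hypothetical, and only the two-character signatures themselves are taken from the message. The sketch assumes the field body has already been extracted, i.e. it is the text that follows the opening semicolon.

```python
# Hypothetical sketch of 'magic number' dispatch for CIF text fields.
# Only the signatures (%T, %L, %H, %R, %U) come from the message above;
# every name below is invented for illustration.

MAGIC_FORMATS = {
    "%T": "tex",
    "%L": "latex",
    "%H": "html",
    "%R": "rtf",
    "%U": "unicode",
}


def classify_text_field(body):
    """Return (format_name, content) for the body of a CIF text field.

    `body` is the text following the opening semicolon. If it starts
    with a known two-character signature, the signature and any line
    break directly after it are removed. Otherwise the field is
    labelled "cif" (the existing CIF markup conventions) and returned
    unchanged.
    """
    fmt = MAGIC_FORMATS.get(body[:2])
    if fmt is None:
        return "cif", body
    return fmt, body[2:].lstrip("\r\n")


if __name__ == "__main__":
    print(classify_text_field("%T\n$\\alpha$ phase"))
    print(classify_text_field("The \\'etude was refined"))
```

A caller would then pass `content` to whichever formatter is registered for `format_name`. The more verbose MIME-style signature (`Content-Type: application/tex`) is not handled in this sketch.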