With the advent of XML and its adoption of Unicode as the required
character set for all documents, most of the problems previously
associated with representing the diverse languages and
writing systems of the world have been greatly reduced. For those
working with standard forms of the European languages in
particular, almost no special action is needed: any XML editor
should enable you to input accented letters or other 'non-ASCII'
characters directly, and they should be stored in the resulting
file in a way which is transferable directly between different
systems, whether as Unicode characters or as character entity references.
For compatibility with other older systems, however, the TEI Lite
DTD includes declarations for a number of the most widely used
character entities, so that such characters may be entered and
saved as character mnemonics.
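As a small illustration of why the stored form does not matter to the receiving system, the following Python sketch parses the same word written with a literal character and with numeric character references. It uses only the standard library `xml.etree.ElementTree` module; the sample word and the `<p>` wrapper are assumptions chosen for the example, not part of the TEI Lite DTD discussion above.

```python
import xml.etree.ElementTree as ET

# The same word written three ways: a literal character, a hexadecimal
# character reference, and a decimal character reference.
variants = ["<p>caf\u00e9</p>", "<p>caf&#xE9;</p>", "<p>caf&#233;</p>"]
texts = [ET.fromstring(v).text for v in variants]

# All three parse to the identical Unicode string.
print(texts)

# Serializing to an ASCII-only encoding makes the serializer fall back to
# numeric character references for anything outside ASCII.
ascii_bytes = ET.tostring(ET.fromstring(variants[0]), encoding="us-ascii")
print(ascii_bytes)
```

Whichever form a file uses, a conforming parser should hand the application the same characters, so the choice can be left to the editing tool.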
You may use your own entity names in TEI-conformant files if
you wish, provided that you supply entity declarations mapping
each name to the appropriate Unicode value. The standard names
(though long-winded) have the advantage
of clarity; the characters intended are reasonably clear to any
speaker of English who recognizes that a character is being named,
often even without recourse to any list. This is not true of many
older schemes for representing accented characters.
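A minimal sketch of such declarations, assuming they are placed in the internal DTD subset of the document: each `<!ENTITY ...>` line maps a name to a Unicode value by way of a numeric character reference. The name `myyogh` is a hypothetical, user-chosen name introduced only for this example, and Python's standard `xml.etree.ElementTree` is used simply to show that a parser expands both names.

```python
import xml.etree.ElementTree as ET

# A document whose internal DTD subset declares one standard name (eacute)
# and one hypothetical, user-chosen name (myyogh, for U+021D).
DOC = """<!DOCTYPE p [
  <!ENTITY eacute "&#xE9;">
  <!ENTITY myyogh "&#x21D;">
]>
<p>caf&eacute; and &myyogh;ogh</p>"""

# The parser replaces each entity reference with the declared character.
p = ET.fromstring(DOC)
print(p.text)
```

A reference to a name that has no declaration would be reported as an error by the parser, which is why the declarations have to travel with the document or its DTD.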
When the character you need does not appear in the public entity
sets, you may wish to generate a name following the naming
conventions of the ISO public entity sets, as described here:
- digraphs
- Form entity names for digraphs by appending the string
lig to the letters forming the digraph. If a
capitalized form is required, both letters are given in upper case
(remember that case is usually significant in entity names). E.g.:
aelig (æ), AElig
(Æ), szlig (ß).
- diacritics and accents
- Form entity names for accented letters in most Western European
languages by appending one of the following strings to the letter
bearing the accent, which may be in upper or lower case.
- umlaut
- use uml for umlaut or trema: e.g.
auml (ä), Auml (Ä),
euml (ë), iuml
(sic: ï), ouml (ö),
Ouml (Ö), uuml (ü),
Uuml (Ü).
- acute
- use acute for acute or stressed
accent: e.g. aacute (á),
eacute (é), Eacute
(É),
iacute (í), oacute
(ó),
uacute (ú).
- grave
- use grave for grave accent: e.g.
agrave (à), egrave
(è), igrave (ì),
ograve (ò), ugrave
(ù).
- circumflex
- use circ for circumflex: e.g.
acirc (â), ecirc
(ê), Ecirc (Ê), icirc
(î), ocirc (ô), ucirc
(û).
- tilde
- use tilde for tilde: e.g.
atilde (ã), Atilde
(Ã),
ntilde (ñ), Ntilde
(Ñ),
otilde (õ), Otilde
(Õ).
- consonants
- The following are recommended entity names for some special
consonants found in Western European languages: ccedil
(ç), Ccedil (Ç), eth
(lowercase eth or Anglo-Saxon/Icelandic crossed d, ð),
ETH (uppercase eth, Ð),
thorn (lowercase thorn, þ),
THORN (uppercase thorn, Þ), szlig
(German s-z ligature or Eszett, ß).
- punctuation marks
- The following are recommended entity names for some commonly
found punctuation marks:
ldquo (left double quotation mark, in shape of
superscript 66), rdquo (right double quotation
mark, superscript 99), mdash (one-em dash),
hellip (horizontal ellipsis, three closely
spaced dots), rsquo (right single quote, in
shape of superscript 9).
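The accent conventions listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `entity_name` and `declaration` are hypothetical, the suffix table covers just the accents named in this section plus the cedilla, and the standard `unicodedata` module is used to split a precomposed letter into its base letter and combining mark. The standard `html.entities` table, which carries the familiar Latin-1 names, serves as a cross-check.

```python
import unicodedata
from html.entities import name2codepoint

# Combining mark (code point) -> suffix used in the naming convention.
SUFFIX = {
    0x0308: "uml",    # diaeresis (umlaut or trema)
    0x0301: "acute",
    0x0300: "grave",
    0x0302: "circ",
    0x0303: "tilde",
    0x0327: "cedil",
}

def entity_name(ch):
    """Return a conventional name for a single accented letter, or None."""
    parts = unicodedata.decomposition(ch).split()
    # Only handle canonical "base letter + one combining mark" decompositions.
    if len(parts) != 2 or parts[0].startswith("<"):
        return None
    base, mark = (int(p, 16) for p in parts)
    suffix = SUFFIX.get(mark)
    if suffix is None:
        return None
    # The base letter keeps its case, so U+00DC gives "Uuml", not "uuml".
    return chr(base) + suffix

def declaration(ch):
    """Build an entity declaration mapping the generated name to ch."""
    name = entity_name(ch)
    if name is None:
        raise ValueError("no conventional name for %r" % ch)
    return '<!ENTITY %s "&#x%X;">' % (name, ord(ch))

# Cross-check the generated names against the well-known Latin-1 names.
sample = "\u00e4\u00c4\u00eb\u00ef\u00e9\u00c9\u00e0\u00ea\u00f1\u00d1\u00f5\u00e7\u00c7"
agrees = all(name2codepoint[entity_name(c)] == ord(c) for c in sample)

# A letter outside Latin-1: w with circumflex (U+0175).
print(declaration("\u0175"))
```

Digraphs, eth, thorn and the punctuation marks have no such decomposition, so `entity_name` returns None for them; their names would have to come from a hand-written table.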
Date:
(revised October 2004) Author: Lou Burnard
(revised SPQR).
Copyright TEI 1995