|
- The Unicode Consortium and the W3C jointly authored a
document, "Unicode in XML and other Markup Languages", that
outlines issues to be aware of when using Unicode in markup
languages (Unicode Technical Report #20; W3C Note 15 December 2000)
- This document makes important recommendations on how to
use Unicode in a markup context. The main issues
are:
- Linear versus structured documents
- Conflict between markup constructs and control
structures in the character encoding (e.g. line breaks,
paragraph breaks)
- The document contains a list of characters that are
unsuitable for use in markup for one or more of the
following reasons:
- They are deprecated in the Unicode Standard (e.g.
characters introduced for compatibility with existing
standards, which should not be used in newly created documents)
- They are unsupportable without additional
data (e.g. Object Replacement Character, U+FFFC)
- They are difficult to handle because they are
stateful (e.g. bidirectional markers)
- They are better handled by
markup (e.g. language tags)
- They are undesirable because they conflict with
equivalent markup (e.g. fractions, superscript and
subscript characters)
- UTR #20 gives a very detailed account of these
problems, and its recommendations should
be implemented when using Unicode in TEI.
- TEI should develop clear recommendations on how to
deal with dual presentation of information (i.e. the same
information expressed both in the character encoding and in
markup) and how to avoid it.
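
The categories of unsuitable characters listed above could be checked
mechanically before text is placed into markup. The following is a
minimal sketch in Python, not an implementation taken from UTR #20 or
from TEI. The code point ranges are an illustrative selection chosen for
this example (the authoritative list is in the report itself), and the
names UNSUITABLE_RANGES, MARKUP_LIKE_TAGS and find_unsuitable are
hypothetical. Fractions and super/subscript characters are detected
through their compatibility decomposition tags in the standard
unicodedata module.

```python
import unicodedata

# Illustrative selection only (an assumption of this sketch); consult
# UTR #20 for the authoritative list of characters and recommendations.
UNSUITABLE_RANGES = [
    (0x2028, 0x2029, "line/paragraph separator (conflicts with markup)"),
    (0x202A, 0x202E, "bidirectional embedding control (stateful)"),
    (0x206A, 0x206F, "deprecated format character"),
    (0xFFF9, 0xFFFB, "interlinear annotation (better handled by markup)"),
    (0xFFFC, 0xFFFC, "object replacement character (needs extra data)"),
    (0xE0000, 0xE007F, "tag character (better handled by markup)"),
]

# Compatibility decomposition tags that have an obvious markup equivalent.
MARKUP_LIKE_TAGS = {"<super>", "<sub>", "<fraction>"}


def find_unsuitable(text):
    """Return (index, code point, reason) for each flagged character."""
    findings = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        reason = None
        for low, high, why in UNSUITABLE_RANGES:
            if low <= cp <= high:
                reason = why
                break
        if reason is None:
            # decomposition() returns e.g. "<super> 0032" or "" if none.
            decomp = unicodedata.decomposition(ch)
            tag = decomp.split(" ", 1)[0] if decomp else ""
            if tag in MARKUP_LIKE_TAGS:
                reason = "conflicts with equivalent markup " + tag
        if reason is not None:
            findings.append((index, "U+%04X" % cp, reason))
    return findings


if __name__ == "__main__":
    sample = "x\u00b2 and \u00bd, then \ufffc"
    for index, code_point, reason in find_unsuitable(sample):
        print(index, code_point, reason)
```

A check of this kind only reports candidates. Deciding whether to
replace a flagged character with markup (for example, a superscript
digit with a superscript element) is the dual-presentation question
that the TEI recommendations would still need to settle.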
|