TEI Character Encoding Workgroup
The TEI Character Encoding Workgroup, chaired by Christian Wittern, began its work in 2003 and completed it in 2005.
Draft Documents for P5
- CE01: Terms of Reference for the TEI Workgroup on Character Encoding
- CE W 01: [DRAFT] Chapter 4: Languages and Character sets
- CE W 02: XSLT-based proof of concept for solutions discussed at Tuebingen meeting
- CE W 03: A collection of use cases for extensions to the basic character set of a document
- CE W 04: Language and script identification (Additional comments from PD)
- CE W 05: Semantics for characters and linguistic features
- CE W 06: Extending the document character set
- CE W 07: Private use characters in XML
- CE W 08: An analysis of topics in P4 chapter 4 and CE W 01
- CE W 09: Language identification: draft for inclusion in P5/CH
- CE W 12: Report from Sanskrit Workgroup
Meetings and Reports
Background Documents and Links
- Design Of An Electronic Method For Describing Writing Systems (Eric S. Albright's thesis)
- The Text in the Age of Digital Reproduction (Draft paper by Christian Wittern)
- (TEI-C) P4: The XML Version of the TEI Guidelines
- (W3C) Character Model for the World Wide Web 1.0
- (W3C, Unicode Consortium) Unicode in XML and other Markup Languages
- Jukka Korpela: A tutorial on character code issues
Some use cases
- Typographic Regularization in the WWP Textbase: a proposal for ACH/ALLC 2001 by Jacqueline H. Russom and Sydney D. Bauman (Scholarly Technology Group, Brown University)
How to refer to characters/glyphs not in the document character set
- The SVG Specification uses an element <altGlyph> to refer to variant glyphs.
- MathML uses an element <mglyph> for "presentation glyphs".
- Unicode has specific and generic Variation Selectors (U+FE00 to U+FE0F); see (Unicode Consortium) Standardized Variants. Their usage is also discussed in the document Unicode in XML and other Markup Languages mentioned above.
- Unicode defines character semantics in the Unicode Character Database (UCD), available at UnicodeData.txt. Its contents are explained in Unicode Data File Format; see also (Unicode Consortium, UTR Draft) Unicode Technical Report #23 Character Properties
- (Unicode Consortium, TUS Annex 21) Case Mappings
- (Unicode Consortium, UTR Draft) Unicode Technical Report #30 Character Foldings
- (Unicode Consortium, TUS Annex 15) Unicode Normalization Forms
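The Unicode mechanisms listed above (variation selectors, UCD character properties, case mappings, normalization forms) can be tried out with a minimal sketch, assuming Python's standard `unicodedata` module, which exposes only a subset of the UCD. The helper names `is_variation_selector` and `describe` are hypothetical and come from none of the linked documents.

```python
import unicodedata


def is_variation_selector(ch: str) -> bool:
    """Return True if ch lies in the Variation Selectors block U+FE00..U+FE0F."""
    return 0xFE00 <= ord(ch) <= 0xFE0F


def describe(ch: str) -> dict:
    """Look up a few UCD properties for a single character (illustrative helper)."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
    }


# The same abstract character in precomposed and decomposed form.
composed = "\u00E9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Normalization Forms (Annex 15): NFC composes, NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

# Case Mappings (Annex 21): a full mapping may change the string length.
upper_sharp_s = "\u00DF".upper()

if __name__ == "__main__":
    print(describe(composed))
    print(describe("\uFE00"), is_variation_selector("\uFE00"))
    print(nfc == composed, nfd == decomposed, upper_sharp_s)
```

Comparing strings only after normalizing both to the same form is the usual precaution when encoded text may mix precomposed and decomposed sequences.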