TEI Character Encoding Workgroup
The TEI Character Encoding Workgroup, chaired by Christian Wittern, began its work in 2003 and completed it in 2005.
Resources
Draft Documents for P5
Draft Papers
- CE01: Terms of Reference for the TEI Workgroup on Character Encoding
- CE W 01: [DRAFT] Chapter 4: Languages and Character sets
- CE W 02: XSLT-based proof of concept for solutions discussed at Tuebingen meeting
- CE W 03: A collection of use cases for extensions to the basic character set of a document
- CE W 04: Language and script identification (Additional comments from PD)
- CE W 05: Semantics for characters and linguistic features
- CE W 06: Extending the document character set
- CE W 07: Private use characters in XML
- CE W 08: An analysis of topics in P4 chapter 4 and CE W 01
- CE W 09: Language identification: draft for inclusion in P5/CH
- CE W 12: Report from Sanskrit Workgroup
Meetings and Reports
Background Documents and Links
- Design Of An Electronic Method For Describing Writing Systems (Eric S. Albright's thesis)
- The Text in the Age of Digital Reproduction (Draft paper by Christian Wittern)
- (TEI-C) P4: The XML Version of the TEI Guidelines
- (W3C) Character Model for the World Wide Web 1.0
- (W3C, Unicode Consortium) Unicode in XML and other Markup Languages
- Jukka Korpela: A tutorial on character code issues
Some use cases
- Typographic Regularization in the WWP Textbase: a proposal for ACH/ALLC 2001 by Jacqueline H. Russom and Sydney D. Bauman (Scholarly Technology Group, Brown University)
How to refer to characters/glyphs not in the document character set
- The SVG Specification uses an element <altGlyph> to refer to variant glyphs.
- MathML uses an element <mglyph> for "presentation glyphs".
- Unicode has specific and generic Variation Selectors (U+FE00 to U+FE0F); see (Unicode Consortium) Standardized Variants. Their usage is also discussed in the document Unicode in XML and other Markup Languages mentioned above.
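As a rough illustration of the variation selector approach, the following Python sketch pairs each base character with a variation selector that follows it. The function names are hypothetical and chosen only for this example, and the range check covers just the U+FE00 to U+FE0F block named above; it is a sketch under those assumptions, not a full implementation of the Unicode rules for variation sequences.

```python
# Range of the variation selectors mentioned above (U+FE00 to U+FE0F).
VS_FIRST, VS_LAST = 0xFE00, 0xFE0F


def is_variation_selector(ch: str) -> bool:
    """Return True if ch lies in the U+FE00..U+FE0F block."""
    return VS_FIRST <= ord(ch) <= VS_LAST


def split_variation_sequences(text: str) -> list:
    """Pair each character with the variation selector following it, if any.

    Returns a list of (base, selector) tuples; selector is None when the
    base character is not followed by a variation selector.
    """
    result = []
    for ch in text:
        if is_variation_selector(ch) and result and result[-1][1] is None:
            base, _ = result[-1]
            result[-1] = (base, ch)   # attach the selector to the preceding base
        else:
            result.append((ch, None))
    return result


def strip_variation_selectors(text: str) -> str:
    """Drop the selectors, leaving only the base characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))
```

For example, `split_variation_sequences("\u2268\uFE00a")` pairs U+2268 with U+FE00 and leaves `a` without a selector, while `strip_variation_selectors` reduces the same string to the two base characters.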
Character semantics
- Unicode defines character semantics in the Unicode Character Database (UCD, available at UnicodeData.txt); for an explanation of its contents, see Unicode Data File Format. See also: (Unicode Consortium, UTR Draft) Unicode Technical Report #23 Character Properties
- (Unicode Consortium, TUS Annex 21) Case Mappings
- (Unicode Consortium, UTR Draft) Unicode Technical Report #30 Character Foldings
- (Unicode Consortium, TUS Annex 15) Unicode Normalization Forms
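The character properties, case mappings, and normalization forms listed above can be explored with Python's standard `unicodedata` module, which exposes data derived from the Unicode Character Database. The helper names below are hypothetical, and the caseless comparison is a simplified approximation rather than the complete algorithm defined by the Unicode Standard.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Look up a few UCD properties for a single character."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),            # general category, e.g. "Ll"
        "combining": unicodedata.combining(ch),          # canonical combining class
        "decomposition": unicodedata.decomposition(ch),  # decomposition mapping, if any
    }


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def caseless_match(a: str, b: str) -> bool:
    """Approximate caseless comparison: normalize, case-fold, normalize again."""
    def fold(s: str) -> str:
        decomposed = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", decomposed.casefold())
    return fold(a) == fold(b)
```

For instance, `same_text("\u00e9", "e\u0301")` treats the precomposed and decomposed forms of "é" as equal, and `caseless_match("Straße", "STRASSE")` holds because case folding maps "ß" to "ss".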
Last recorded change to this page: 2007-09-16