TEI meets Unicode

Encoding forms of Unicode

  • Assigned Unicode code points can be serialized in 7 different encoding schemes, built on 3 encoding forms:
    • 8 bit variable length encoding form:
      • UTF-8
    • 16 bit variable length encoding forms:
      • UTF-16, UTF-16BE, UTF-16LE
    • 32 bit fixed length encoding forms:
      • UTF-32, UTF-32BE, UTF-32LE
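  • Note: the 16 and 32 bit forms depend on byte order, either big-endian (BE) or little-endian (LE). Unlabelled UTF-16 and UTF-32 data may begin with a byte order mark (U+FEFF) indicating which order is used; the explicitly labelled BE and LE variants do not use one.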
  • For transmission on the Web, UTF-8 is generally the preferred encoding form.
  • In XML and (X)HTML, it is also possible to use numeric character references (e.g. &#x101; or &#257; for the character ā).
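
The points above can be tried out with a short Python sketch. It is only an illustration, not part of the original slides: the helper names `serialize` and `numeric_reference` are made up here, the scheme names follow the spelling of Python's codec registry, and the euro sign is an extra sample character, chosen because the single UTF-16 code unit of ā (0x0101) reads the same in both byte orders.

```python
import codecs
import html

# The seven encoding schemes, spelled as Python's codec registry names them.
SCHEMES = ["utf-8", "utf-16", "utf-16-be", "utf-16-le",
           "utf-32", "utf-32-be", "utf-32-le"]


def serialize(text):
    """Return a dict mapping each scheme name to the encoded bytes."""
    return {name: text.encode(name) for name in SCHEMES}


def numeric_reference(char, hexadecimal=True):
    """Build an XML/(X)HTML numeric character reference for one character."""
    code_point = ord(char)
    return f"&#x{code_point:X};" if hexadecimal else f"&#{code_point};"


a_macron = "\u0101"  # the character ā from the last bullet
euro = "\u20ac"      # extra sample (assumption): its UTF-16 bytes differ

# Show the byte sequence each scheme produces for the euro sign.
for name, data in serialize(euro).items():
    print(f"{name:<10} {data.hex()}")

# The unlabelled "utf-16" codec prepends a byte order mark; Python writes
# it in the byte order of the machine running the script.
boms_16 = (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
print(serialize(euro)["utf-16"][:2] in boms_16)

# Numeric character references and the character they resolve to.
print(numeric_reference(a_macron))         # hexadecimal form
print(numeric_reference(a_macron, False))  # decimal form
print(html.unescape("&#x101;") == a_macron)
```

Because the unlabelled UTF-16 and UTF-32 output depends on the platform's byte order, the sketch checks only that a byte order mark is present and does not compare against fixed bytes.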