|
- Assigned Unicode code points can be serialized in 7
different forms:
- 8-bit variable-length encoding form:
- UTF-8
- 16-bit variable-length encoding forms:
- UTF-16, UTF-16BE, UTF-16LE
- 32-bit fixed-length encoding forms:
- UTF-32, UTF-32BE, UTF-32LE
- Note: UTF-16 and UTF-32 data can start with a byte order mark (BOM)
indicating either big-endian (BE) or little-endian (LE) byte order;
the BE and LE variants fix the byte order in the name and carry no BOM.
- For transmission on the Web, UTF-8 is generally the preferred
encoding form.
- In XML and (X)HTML, characters can also be written as
numeric character references (e.g. &#257; or &#x101; for the
character ā)
|
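The seven forms listed above can be compared side by side with a short Python sketch. It is only an illustration and rests on a few assumptions that are not in the original text: the codec names are Python's own spellings, the euro sign (U+20AC) is used as the sample character because its two UTF-16 bytes differ and so make the byte order visible, and the variable names are hypothetical.

```python
import codecs
import html

# Sample character chosen for this sketch: U+20AC EURO SIGN.
# (For U+0101, both UTF-16 bytes are 0x01, which would hide the byte order.)
SAMPLE = "\u20ac"

# Python codec names for the seven serialization forms.
CODECS = [
    "utf-8",
    "utf-16", "utf-16-be", "utf-16-le",
    "utf-32", "utf-32-be", "utf-32-le",
]

# Map each codec name to the bytes it produces for the sample character.
encoded = {name: SAMPLE.encode(name) for name in CODECS}

for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")

# Python's plain "utf-16" encoder prepends a BOM in the platform's
# native byte order, so the first two bytes are one of the two BOMs.
has_bom = encoded["utf-16"][:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# Numeric character references: the "xmlcharrefreplace" error handler
# writes non-ASCII characters as decimal references.
ncr = "\u0101".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ncr)

# html.unescape resolves both the decimal and the hexadecimal spelling.
decoded_dec = html.unescape("&#257;")
decoded_hex = html.unescape("&#x101;")
```

The explicitly labeled BE and LE codecs emit no BOM, while the unlabeled `utf-16` and `utf-32` codecs emit one, so their output is two and four bytes longer respectively.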