2 The TEI Header

This chapter addresses the problems of describing an encoded work so that the text itself, its source, its encoding, and its revisions are all thoroughly documented. Such documentation is equally necessary for scholars using the texts, for software processing them, and for cataloguers in libraries and archives. Together these descriptions and declarations provide an electronic analogue to the title page attached to a printed work. They also constitute an equivalent for the content of the code books or introductory manuals customarily accompanying electronic data sets.

Every TEI-conformant text must carry such a set of descriptions, prefixed to it and encoded as described in this chapter. The set is known as the TEI header, tagged teiHeader, and has four major parts:
  1. a file description, tagged fileDesc, containing a full bibliographical description of the computer file itself, from which a user of the text could derive a proper bibliographic citation, or which a librarian or archivist could use in creating a catalogue entry recording its presence within a library or archive. The term computer file here is to be understood as referring to the whole entity or document described by the header, even when this is stored in several distinct operating system files. The file description also includes information about the source or sources from which the electronic document was derived. The TEI elements used to encode the file description are described in section 2.2 The File Description below.
  2. an encoding description, tagged encodingDesc, which describes the relationship between an electronic text and its source or sources. It allows for detailed description of whether (or how) the text was normalized during transcription, how the encoder resolved ambiguities in the source, what levels of encoding or analysis were applied, and similar matters. The TEI elements used to encode the encoding description are described in section 2.3 The Encoding Description below.
  3. a text profile, tagged profileDesc, containing classificatory and contextual information about the text, such as its subject matter, the situation in which it was produced, the individuals described by or participating in producing it, and so forth. Such a text profile is of particular use in highly structured composite texts such as corpora or language collections, where it is often highly desirable to enforce a controlled descriptive vocabulary or to perform retrievals from a body of text in terms of text type or origin. The text profile may however be of use in any form of automatic text processing. The TEI elements used to encode the profile description are described in section 2.4 The Profile Description below.
  4. a revision history, tagged revisionDesc, which allows the encoder to provide a history of changes made during the development of the electronic text. The revision history is important for version control and for resolving questions about the history of a file. The TEI elements used to encode the revision description are described in section 2.5 The Revision Description below.

A TEI header can be a very large and complex object, or it may be a very simple one. Some application areas (for example, the construction of language corpora and the transcription of spoken texts) may require more specialized and detailed information than others. The present proposals therefore define both a core set of elements (all of which may be used without formality in any TEI header) and some additional elements which become available within the header as the result of including additional specialized modules within the schema. When the module for language corpora (described in chapter 15 Language Corpora) is in use, for example, several additional elements are available, as further detailed in that chapter.

The next section of the present chapter briefly introduces the overall structure of the header and the kinds of data it may contain. This is followed by a detailed description of all the constituent elements which may be used in the core header. Section 2.6 Minimal and Recommended Headers, at the end of the present chapter, discusses the recommended content of a minimal TEI header and its relation to standard library cataloguing practices.

2.1 Organization of the TEI Header

2.1.1 The TEI Header and its Components

The teiHeader element should be clearly distinguished from the front matter of the text itself (for which see section 4.5 Front Matter). A composite text, such as a corpus or collection, may contain several headers, as further discussed below. In the general case, however, a TEI-conformant text will contain a single teiHeader element, followed by a single text or facsimile element, or both.

The header element has the following description:
  • teiHeader (TEI Header) supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text.
    type specifies the kind of document to which the header is attached, for example whether it is a corpus or individual text.
As discussed above, the teiHeader element has four principal components:
  • fileDesc (file description) contains a full bibliographic description of an electronic file.
  • encodingDesc (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
  • profileDesc (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.
  • revisionDesc (revision description) summarizes the revision history for a file.
Of these, only the fileDesc element is required in all TEI headers; the others are optional. The fileDesc itself has some mandatory components, as further discussed in 2.2 The File Description below. The smallest possible valid TEI Header thus looks like this:
<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>
<!-- title of the resource ... -->
   </title>
  </titleStmt>
  <publicationStmt>
   <p>(Information about distribution of the
       resource)</p>
  </publicationStmt>
  <sourceDesc>
   <p>(Information about source from which the resource derives)</p>
  </sourceDesc>
 </fileDesc>
</teiHeader>
The content of the elements making up a TEI Header may be given in any language, not necessarily that of the text to which the header applies, and not necessarily English. As elsewhere, the xml:lang attribute should be used at an appropriate level to specify the language. For example, in the following schematic example, an English text has been given a French header:
<TEI>
 <teiHeader xml:lang="fr">
<!-- ... -->
 </teiHeader>
 <text xml:lang="en">
<!-- ... -->
 </text>
</TEI>
In the case of language corpora or collections, it may be desirable to record header information either at the level of the individual components in the corpus or collection, or at the level of the corpus or collection itself. More details concerning the tagging of composite texts are given in chapter 15 Language Corpora, which should be read in conjunction with the current chapter. The type attribute may be used to indicate whether the header applies to a corpus or a single text. A corpus may thus take the form:
<teiCorpus>
 <teiHeader type="corpus">
<!-- corpus-level metadata here -->
 </teiHeader>
 <TEI>
  <teiHeader type="text">
<!-- metadata specific to this text here -->
  </teiHeader>
  <text>
<!-- ... -->
  </text>
 </TEI>
 <TEI>
  <teiHeader type="text">
<!-- metadata specific to this text here -->
  </teiHeader>
  <text>
<!-- ... -->
  </text>
 </TEI>
</teiCorpus>

2.1.2 Types of Content in the TEI Header

The elements occurring within the TEI header may contain several types of content; the following list indicates how these types of content are described in the following sections:
free prose
Most elements contain simple running prose at some level. Many elements may contain either prose (possibly organized into paragraphs) or more specific elements, which themselves contain prose. In this chapter's descriptions of element content, the phrase prose description should be understood to imply a series of paragraphs, each marked as a p element. The word phrase, by contrast, should be understood to imply character data, interspersed as need be with phrase-level elements, but not organized into paragraphs. For more information on paragraphs, highlighted phrases, lists, etc., see section 3.1 Paragraphs.
grouping elements
Elements whose names end with the suffix Stmt (e.g. editionStmt, titleStmt) usually enclose a group of specialized elements recording some structured information. In the case of the bibliographic elements, the suffix Stmt is used in names of elements corresponding to the ‘areas’ of the International Standard Bibliographic Description.4 In most cases grouping elements may contain prose descriptions as an alternative to the set of specialized elements, thus allowing the encoder to choose whether or not the information concerned should be presented in a structured form or in prose.
declarations
Elements whose names end with the suffix Decl (e.g. tagsDecl, refsDecl) enclose information about specific encoding practices applied in the electronic text; often these practices are described in coded form. Typically, such information takes the form of a series of declarations, identifying a code with some more complex structure or description. A declaration which applies to more than one text or division of a text need not be repeated in the header of each such text or subdivision. Instead, the decls attribute of each text (or subdivision of the text) to which the declaration applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.
descriptions
Elements whose names end with the suffix Desc (e.g. settingDesc, projectDesc) contain a prose description, possibly, but not necessarily, organized under some specific headings by suggested sub-elements.
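As an illustration of the choice offered by grouping elements, the following sketch records the same publication details first as a prose description and then in structured form. The distributor name and the date are invented for the purposes of the example:
<publicationStmt>
 <p>Distributed by the Example Text Archive, 2001.</p>
</publicationStmt>
<publicationStmt>
 <distributor>Example Text Archive</distributor>
 <date>2001</date>
</publicationStmt>
The structured form is easier to process automatically, while the prose form may be more convenient where the information does not divide cleanly into the specialized elements.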

2.1.3 Model Classes in the TEI Header

The TEI Header provides a very rich collection of metadata categories, but makes no claim to be exhaustive. It is certainly the case that individual projects may wish to record specialised metadata which either does not fit within one of the predefined categories identified by the TEI Header or requires a more specialized element structure than is proposed here. To overcome this problem, the encoder may elect to define additional elements using the customization methods discussed in 23.2 Personalization and Customization. The TEI class system makes such customizations simpler to effect and easier to use in interchange.
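A minimal sketch of such a customization, expressed in the TEI ODD language, might look as follows. The schema identifier, the element name fundingRound, and its namespace are hypothetical, and membership of the class model.teiHeaderPart is assumed here as the means of making the new element available within the header; section 23.2 remains the authoritative account of the mechanism.
<schemaSpec ident="myTEI" start="TEI">
 <moduleRef key="tei"/>
 <moduleRef key="core"/>
 <moduleRef key="header"/>
 <moduleRef key="textstructure"/>
<!-- hypothetical project-specific element, declared in a non-TEI namespace -->
 <elementSpec ident="fundingRound" ns="http://www.example.org/ns/project" mode="add">
  <desc>records a project-specific funding round</desc>
  <classes>
   <memberOf key="model.teiHeaderPart"/>
  </classes>
  <content>
   <textNode/>
  </content>
 </elementSpec>
</schemaSpec>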

These classes are specific to parts of the header:

2.2 The File Description

This section describes the fileDesc element, which is the first component of the teiHeader element.

The bibliographic description of a machine-readable or digital text resembles in structure that of a book, an article, or any other kind of textual object. The file description element of the TEI header has therefore been closely modelled on existing standards in library cataloguing; it should thus provide enough information to allow users to give standard bibliographic references to the electronic text, and to allow cataloguers to catalogue it. Bibliographic citations occurring elsewhere in the header, and also in the text itself, are derived from the same model (on bibliographic citations in general, see further section 3.11 Bibliographic Citations and References). See further section 2.7 Note for Library Cataloguers.

The bibliographic description of an electronic text should be supplied by the mandatory fileDesc element:
  • fileDesc (file description) contains a full bibliographic description of an electronic file.
The fileDesc element contains three mandatory elements and four optional elements, each of which is described in more detail in sections 2.2.1 The Title Statement to 2.2.6 The Notes Statement below. These elements are listed below in the order in which they must be given within the fileDesc element.
  • titleStmt (title statement) groups information about the title of a work and those responsible for its intellectual content.
  • editionStmt (edition statement) groups information relating to one edition of a text.
  • extent describes the approximate size of a text as stored on some carrier medium, whether digital or non-digital, specified in any convenient units.
  • publicationStmt (publication statement) groups information concerning the publication or distribution of an electronic or other text.
  • seriesStmt (series statement) groups information about the series, if any, to which a publication belongs.
  • notesStmt (notes statement) collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description.
  • sourceDesc (source description) describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence.
A complete file description containing all possible sub-elements might look like this:
<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>
<!-- title of the resource -->
   </title>
  </titleStmt>
  <editionStmt>
   <p>
<!-- information about the edition of the resource -->
   </p>
  </editionStmt>
  <extent>
<!-- description of the size of the resource -->
  </extent>
  <publicationStmt>
   <p>
<!-- information about the distribution of the resource -->
   </p>
  </publicationStmt>
  <seriesStmt>
   <p>
<!-- information about any series to which the resource belongs -->
   </p>
  </seriesStmt>
  <notesStmt>
   <note>
<!-- notes on other aspects of the resource -->
   </note>
  </notesStmt>
  <sourceDesc>
   <p>
<!-- information about the source from which the resource was derived -->
   </p>
  </sourceDesc>
 </fileDesc>
</teiHeader>
Of these elements, only the titleStmt, publicationStmt, and sourceDesc are required; the others may be omitted unless considered useful.

2.2.1 The Title Statement

The titleStmt element is the first component of the fileDesc element, and is mandatory:
  • titleStmt (title statement) groups information about the title of a work and those responsible for its intellectual content.
It contains the title given to the electronic work, together with one or more optional statements of responsibility which identify the encoder, editor, author, compiler, or other parties responsible for it:
  • title contains a title for any kind of work.
  • author in a bibliographic reference, contains the name(s) of the author(s), personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority.
  • editor secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization, (or of several such) acting as editor, compiler, translator, etc.
  • sponsor specifies the name of a sponsoring organization or institution.
  • funder (funding body) specifies the name of an individual, institution, or organization responsible for the funding of a project or text.
  • principal (principal researcher) supplies the name of the principal researcher responsible for the creation of an electronic text.
  • respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
  • resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility.
  • name (name, proper noun) contains a proper noun or noun phrase.

The title element contains the chief name of the electronic work, including any alternative title or subtitles it may have. It may be repeated, if the work has more than one title (perhaps in different languages) and takes whatever form is considered appropriate by its creator. Where the electronic work is derived from an existing source text, it is strongly recommended that the title for the former should be derived from the latter, but clearly distinguishable from it, for example by the addition of a phrase such as ‘: an electronic transcription’ or ‘a digital edition’. This will distinguish the electronic work from the source text in citations and in catalogues which contain descriptions of both types of material.

The electronic work will also have an external name (its ‘filename’ or ‘data set name’) or reference number on the computer system where it resides at any time. This name is likely to change frequently, as new copies of the file are made on the computer system. Its form is entirely dependent on the particular computer system in use and thus cannot always easily be transferred from one system to another. Moreover, a given work may be composed of many files. For these reasons, these Guidelines strongly recommend that such names should not be used as the title for any electronic work.
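By way of illustration, a title statement following these recommendations might resemble the sketch below. Both the descriptive title and the filename shown in the comment are invented for the example:
<titleStmt>
<!-- not recommended: <title>jex_v2.xml</title> -->
 <title>The Letters of Jane Example: an electronic transcription</title>
</titleStmt>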

Helpful guidance on the formulation of useful descriptive titles in difficult cases may be found in the Anglo-American Cataloguing Rules (Gorman and Winkler, 1978, chapter 25) or in equivalent national-level bibliographical documentation.

The elements author, editor, sponsor, funder, and principal are specializations of the more general respStmt element. These elements are used to provide the statements of responsibility which identify the person(s) responsible for the intellectual or artistic content of an item and any corporate bodies from which it emanates.

Any number of such statements may occur within the title statement. At a minimum, identify the author of the text and (where appropriate) the creator of the file. If the bibliographic description is for a corpus, identify the creator of the corpus. Optionally include also names of others involved in the transcription or elaboration of the text, sponsors, and funding agencies. The name of the person responsible for physical data input need not normally be recorded, unless that person is also intellectually responsible for some aspect of the creation of the file.

Where the person whose responsibility is to be documented is not an author, sponsor, funding body, or principal researcher, the respStmt element should be used. This has two subcomponents: a name element identifying a responsible individual or organization, and a resp element indicating the nature of the responsibility. No specific recommendations are made at this time as to appropriate content for the resp: it should make clear the nature of the responsibility concerned, as in the examples below.

Names given may be personal names or corporate names. Give all names in the form in which the persons or bodies wish to be publicly cited. This would usually be the fullest form of the name, including first names.5

Examples:
<titleStmt>
 <title>Capgrave's Life of St. John Norbert: a
   machine-readable transcription</title>
 <respStmt>
  <resp>compiled by</resp>
  <name>P.J. Lucas</name>
 </respStmt>
</titleStmt>
<titleStmt>
 <title>Two stories by Edgar Allan Poe: electronic version</title>
 <author>Poe, Edgar Allan (1809-1849)</author>
 <respStmt>
  <resp>compiled by</resp>
  <name>James D. Benson</name>
 </respStmt>
</titleStmt>
<titleStmt>
 <title>Yogadarśanam (arthāt
   yogasūtrapūṭhaḥ):
   a digital edition.</title>
 <title>The Yogasūtras of Patañjali:
   a digital edition.</title>
 <funder>Wellcome Institute for the History of Medicine</funder>
 <principal>Dominik Wujastyk</principal>
 <respStmt>
  <name>Wieslaw Mical</name>
  <resp>data entry and proof correction</resp>
 </respStmt>
 <respStmt>
  <name>Jan Hajic</name>
  <resp>conversion to TEI-conformant markup</resp>
 </respStmt>
</titleStmt>

2.2.2 The Edition Statement

The editionStmt element is the second component of the fileDesc element. It is optional but recommended.
  • editionStmt (edition statement) groups information relating to one edition of a text.
It contains either phrases or more specialized elements identifying the edition and those responsible for it:
  • edition (edition) describes the particularities of one edition of a text.
  • respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
  • name (name, proper noun) contains a proper noun or noun phrase.
  • resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility.

For printed texts, the word edition applies to the set of all the identical copies of an item produced from one master copy and issued by a particular publishing agency or a group of such agencies. A change in the identity of the distributing body or bodies does not normally constitute a change of edition, while a change in the master copy does.

For electronic texts, the notion of a ‘master copy’ is not entirely appropriate, since they are far more easily copied and modified than printed ones; nonetheless the term edition may be used for a particular state of a machine-readable text at which substantive changes are made and fixed. Synonymous terms used in these Guidelines are version, level, and release. The words revision and update, by contrast, are used for minor changes to a file which do not amount to a new edition.

No simple rule can specify how ‘substantive’ changes have to be before they are regarded as producing a new edition, rather than a simple update. The general principle proposed here is that the production of a new edition entails a significant change in the intellectual content of the file, rather than its encoding or appearance. The addition of analytic coding to a text would thus constitute a new edition, while automatic conversion from one coded representation to another would not. Changes relating to the character code or physical storage details, corrections of misspellings, simple changes in the arrangement of the contents and changes in the output format do not normally constitute a new edition, whereas the addition of new information (e.g. a linguistic analysis expressed in part-of-speech tagging, sound or graphics, referential links to external data sets) almost always does.

Clearly, there will always be borderline cases and the matter is somewhat arbitrary. The simplest rule is: if you think that your file is a new edition, then call it such. An edition statement is optional for the first release of a computer file; it is mandatory for each later release, though this requirement cannot be enforced by the parser.

Note that all changes in a file, whether or not they are regarded as constituting a new edition or simply a new revision, should be independently noted in the revision description section of the file header (see section 2.5 The Revision Description).
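The relationship between the two might be sketched as follows: the new edition is identified in the edition statement, while every change, substantive or not, is also listed in the revision description. The title, dates, and descriptions are invented, and the use of the when attribute and of the change element follows the practice described elsewhere in these Guidelines rather than anything specified in the present section:
<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>A hypothetical text: a digital edition</title>
  </titleStmt>
  <editionStmt>
   <edition n="2">Second edition, <date when="2001-03-01">March 2001</date>
   </edition>
  </editionStmt>
  <publicationStmt>
   <p>Unpublished working file.</p>
  </publicationStmt>
  <sourceDesc>
   <p>Born digital.</p>
  </sourceDesc>
 </fileDesc>
 <revisionDesc>
<!-- new analytic coding: treated here as a new edition -->
  <change when="2001-03-01">Added part-of-speech tagging; released as second edition.</change>
<!-- minor correction: a revision only, with no new edition statement -->
  <change when="2000-11-15">Corrected misspellings in chapter 2.</change>
 </revisionDesc>
</teiHeader>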

The edition element should contain phrases describing the edition or version, including the word edition, version, or equivalent, together with a number or date, or terms indicating difference from other editions such as new edition, revised edition, etc. Any dates that occur within the edition statement should be marked with the date element. The n attribute of the edition element may be used as elsewhere to supply any formal identification (such as a version number) for the edition.

One or more respStmt elements may also be used to supply statements of responsibility for the edition in question. These may refer to individuals or corporate bodies and can indicate functions such as that of a reviser, or can name the person or body responsible for the provision of supplementary matter, of appendices, etc., in a new edition. For further detail on the respStmt element, see section 3.11 Bibliographic Citations and References.

Some examples follow:
<editionStmt>
 <edition n="P2">Second draft, substantially
   extended, revised, and corrected.</edition>
</editionStmt>
<editionStmt>
 <edition>Student's edition, <date>June 1987</date>
 </edition>
 <respStmt>
  <resp>New annotations by</resp>
  <name>George Brown</name>
 </respStmt>
</editionStmt>

2.2.3 Type and Extent of File

The extent element is the third component of the fileDesc element. It is optional.
  • extent describes the approximate size of a text as stored on some carrier medium, whether digital or non-digital, specified in any convenient units.

For printed books, information about the carrier, such as the kind of medium used and its size, is of great importance in cataloguing procedures. The print-oriented rules for bibliographic description of an item's medium and extent need some re-interpretation when applied to electronic media. An electronic file exists as a distinct entity quite independently of its carrier and remains the same intellectual object whether it is stored on a magnetic tape, a CD-ROM, a set of floppy disks, or as a file on a mainframe computer. Since, moreover, these Guidelines are specifically aimed at facilitating transparent document storage and interchange, any purely machine-dependent information should be irrelevant as far as the file header is concerned.

This is particularly true of information about file type, although library-oriented rules for cataloguing often distinguish two types of computer file: ‘data’ and ‘programs’. This distinction is quite difficult to draw in some cases, for example, hypermedia or texts with built-in search and retrieval software.

Although it is equally system-dependent, some measure of the size of the computer file may be of use for cataloguing and other practical purposes. Because the measurement and expression of file size is fraught with difficulties, only very general recommendations are possible; the element extent is provided for this purpose. It contains a phrase indicating the size or approximate size of the computer file in one of the following ways:
  • in bytes of a specified length (e.g. ‘4000 16-bit bytes’)
  • as falling within a range of categories, for example:
    • less than 1 Mb
    • between 1 Mb and 5 Mb
    • between 6 Mb and 10 Mb
    • over 10 Mb
  • in terms of any convenient logical units (for example, words or sentences, citations, paragraphs)
  • in terms of any convenient physical units (for example, blocks, disks, tapes)

The use of standard abbreviations for units of quantity is recommended where applicable, here as elsewhere (see http://physics.nist.gov/cuu/Units/binary.html).

Examples:
<extent>between 1 MB and 2 MB</extent>
<extent>4.2 MiB</extent>
<extent>4532 bytes</extent>
<extent>3200 sentences</extent>
<extent>Five 90 mm High Density Diskettes</extent>

2.2.4 Publication, Distribution, etc.

The publicationStmt element is the fourth component of the fileDesc element and is mandatory.
  • publicationStmt (publication statement) groups information concerning the publication or distribution of an electronic or other text.
It may contain either a simple prose description organized as one or more paragraphs, or one or more elements from the model.publicationStmt class. This class groups a number of elements which are discussed in order below.
  • publisher provides the name of the organization responsible for the publication or distribution of a bibliographic item.
  • distributor supplies the name of a person or other agency responsible for the distribution of a text.
  • authority (release authority) supplies the name of a person or other agency responsible for making an electronic file available, other than a publisher or distributor.

The publisher is the person or institution by whose authority a given edition of the file is made public. The distributor is the person or institution from whom copies of the text may be obtained. Where a text is not considered formally published, but is nevertheless made available for circulation by some individual or organization, this person or institution is termed the release authority.

At least one of the above three elements must be present, unless the entire publication statement is given as prose. Each may be followed by one or more of the following elements, in the following order:6
  • pubPlace (publication place) contains the name of the place where a bibliographic item was published.
  • address contains a postal address, for example of a publisher, an organization, or an individual.
  • idno (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way.
    type categorizes the identifier, for example as an ISBN, Social Security number, etc.
  • availability supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, etc.
    status supplies a code identifying the current availability of the text.
  • date contains a date in any format.

Note that the dates, places, etc., given in the publication statement relate to the publisher, distributor, or release authority most recently mentioned. If the text was created at some date other than its date of publication, its date of creation should be given within the profileDesc element, not in the publication statement. Give any other useful dates (e.g., dates of collection of data) in a note.
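The distribution of dates across the header might be sketched as follows. All names and dates are invented, and the creation element within profileDesc is assumed from section 2.4 The Profile Description rather than defined in the present section:
<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>A hypothetical interview collection: an electronic transcription</title>
  </titleStmt>
  <publicationStmt>
   <distributor>Example Text Archive</distributor>
<!-- date of distribution by the agency named above -->
   <date>1995</date>
  </publicationStmt>
  <notesStmt>
<!-- other useful dates are given in a note -->
   <note>Interviews recorded between <date>1988</date> and <date>1990</date>.</note>
  </notesStmt>
  <sourceDesc>
   <p>Transcribed from audio recordings.</p>
  </sourceDesc>
 </fileDesc>
 <profileDesc>
  <creation>
<!-- date of creation of the text, distinct from its date of publication -->
   <date>1991</date>
  </creation>
 </profileDesc>
</teiHeader>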

Additional detailed elements may be used for the encoding of names, dates, and addresses, as further described in section 3.5 Names, Numbers, Dates, Abbreviations, and Addresses when the module described in chapter 13 Names, Dates, People, and Places is included in a schema.

Examples:
<publicationStmt>
 <publisher>Oxford University Press</publisher>
 <pubPlace>Oxford</pubPlace>
 <date>1989</date>
 <idno type="ISBN">0-19-254705-4</idno>
 <availability>
  <p>Copyright 1989, Oxford University Press</p>
 </availability>
</publicationStmt>
<publicationStmt>
 <authority>James D. Benson</authority>
 <pubPlace>London</pubPlace>
 <date>1984</date>
</publicationStmt>
<publicationStmt>
 <publisher>Sigma Press</publisher>
 <address>
  <addrLine>21 High Street,</addrLine>
  <addrLine>Wilmslow,</addrLine>
  <addrLine>Cheshire M24 3DF</addrLine>
 </address>
 <date>1991</date>
 <distributor>Oxford Text Archive</distributor>
 <idno type="ota">1256</idno>
 <availability>
  <p>Available with prior consent of depositor for
     purposes of academic research and teaching only.</p>
 </availability>
</publicationStmt>

2.2.5 The Series Statement

The seriesStmt element is the fifth component of the fileDesc element and is optional.
  • seriesStmt (series statement) groups information about the series, if any, to which a publication belongs.
In bibliographic parlance, a series may be defined in one of the following ways:
  • A group of separate items related to one another by the fact that each item bears, in addition to its own title proper, a collective title applying to the group as a whole. The individual items may or may not be numbered.
  • Each of two or more volumes of essays, lectures, articles, or other items, similar in character and issued in sequence.
  • A separately numbered sequence of volumes within a series or serial.
The seriesStmt element may contain a prose description or one or more of the following more specific elements:
  • title contains a title for any kind of work.
  • idno (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way.
  • respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
  • resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility.
  • name (name, proper noun) contains a proper noun or noun phrase.

The idno may be used to supply any identifying number associated with the item, including both standard numbers such as an ISSN and particular issue numbers. (Arabic numerals separated by punctuation are recommended for this purpose: 6.19.33, for example, rather than VI/xix:33.) Its type attribute is used to categorize the number further, taking for example the value ISSN for an ISSN.

Example:
<seriesStmt>
 <title level="s">Machine-Readable Texts for the Study of
   Indian Literature</title>
 <respStmt>
  <resp>ed. by</resp>
  <name>Jan Gonda</name>
 </respStmt>
 <idno type="vol">1.2</idno>
 <idno type="ISSN">0 345 6789</idno>
</seriesStmt>

2.2.6 The Notes Statement

The notesStmt element is the sixth component of the fileDesc element and is optional. If used, it contains one or more note elements, each containing a single piece of descriptive information of the kind treated as ‘general notes’ in traditional bibliographic descriptions.
  • notesStmt (notes statement) collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description.
  • note contains a note or annotation.
Some information found in the notes area in conventional bibliography has been assigned specific elements in these Guidelines; in particular the following items should be tagged as indicated, rather than as general notes:
  • the nature, scope, artistic form, or purpose of the file; also the genre or other intellectual category to which it may belong: e.g. ‘Text types: newspaper editorials and reportage, science fiction, westerns, and detective stories’. These should be formally described within the profileDesc element (section 2.4 The Profile Description).
  • summary description providing a factual, non-evaluative account of the subject content of the file: e.g. ‘Transcribes interviews on general topics with native speakers of English in 17 cities during the spring and summer of 1963.’ These should also be formally described within the profileDesc element (section 2.4 The Profile Description).
  • bibliographic details relating to the source or sources of an electronic text: e.g. ‘Transcribed from the Norton facsimile of the 1623 Folio’. These should be formally described in the sourceDesc element (section 2.2.7 The Source Description).
  • further information relating to publication, distribution, or release of the text, including sources from which the text may be obtained, any restrictions on its use or formal terms on its availability. These should be placed in the appropriate division of the publicationStmt element (section 2.2.4 Publication, Distribution, etc.).
  • publicly documented numbers associated with the file: e.g. ‘ICPSR study number 1803’ or ‘Oxford Text Archive text number 1243’. These should be placed in an idno element within the appropriate division of the publicationStmt element. International Standard Serial Numbers (ISSN), International Standard Book Numbers (ISBN), and other internationally agreed upon standard numbers that uniquely identify an item, should be treated in the same way, rather than as specialized bibliographic notes.
Nevertheless, the notesStmt element may be used to record potentially significant details about the file and its features, e.g.:
  • dates, when they are relevant to the content or condition of the computer file: e.g. ‘manual dated 1983’, ‘Interview wave I: Apr. 1989; wave II: Jan. 1990’
  • names of persons or bodies connected with the technical production, administration, or consulting functions of the effort which produced the file, if these are not named in statements of responsibility in the title or edition statements of the file description: e.g. ‘Historical commentary provided by Mark Cohen’
  • availability of the file in an additional medium or information not already recorded about the availability of documentation: e.g. ‘User manual is loose-leaf in eleven paginated sections’
  • language of work and abstract, if not encoded in the langUsage element, e.g. ‘Text in English with summaries in French and German’
  • The unique name assigned to a serial by the International Serials Data System (ISDS), if not encoded in an idno
  • lists of related publications, either describing the source itself, or concerned with the creation or use of the electronic work, e.g. ‘Texts used in Burrows (1987)’
Each such item of information may be tagged using the general-purpose note element, which is described in section 3.8 Notes, Annotation, and Indexing. Groups of notes are contained within the notesStmt element, as in the following example:
<notesStmt>
 <note>Historical commentary provided by Mark Cohen.</note>
 <note>OCR scanning done at University of Toronto.</note>
</notesStmt>
There are advantages, however, to encoding such information with more precise elements elsewhere in the TEI header, when such elements are available. For example, the notes above might be encoded as follows:
<titleStmt>
 <title></title>
 <respStmt>
  <persName>Mark Cohen</persName>
  <resp>historical commentary</resp>
 </respStmt>
 <respStmt>
  <orgName>University of Toronto</orgName>
  <resp>OCR scanning</resp>
 </respStmt>
</titleStmt>

2.2.7 The Source Description

The sourceDesc element is the seventh and final component of the fileDesc element. It is a mandatory element and is used to record details of the source or sources from which a computer file is derived. This might be a printed text or manuscript, another computer file, an audio or video recording of some kind, or a combination of these. An electronic file may also have no source, if what is being catalogued is an original text created in electronic form.
  • sourceDesc (source description) describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence.
The sourceDesc element may contain little more than a simple prose description, or a brief note stating that the document has no source:
<sourceDesc>
 <p>Born digital.</p>
</sourceDesc>
Alternatively, it may contain elements drawn from the following three classes: model.biblLike, model.sourceDescPart, and model.listLike.
These classes make available by default a range of ways of providing bibliographic citations which specify the provenance of the text. For written or printed sources, the source may be described in the same way as any other bibliographic citation, using one of the following elements:
  • bibl (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.
  • biblStruct (structured bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order.
  • listBibl (citation list) contains a list of bibliographic citations of any kind.
These elements are described in more detail in section 3.11 Bibliographic Citations and References. Using them, a source might be described in very simple terms:
<sourceDesc>
 <bibl>The first folio of Shakespeare, prepared by
   Charlton Hinman (The Norton Facsimile, 1968)</bibl>
</sourceDesc>
or with more elaboration:
<sourceDesc>
 <biblStruct xml:lang="fr">
  <monogr>
   <author>Eugène Sue</author>
   <title>Martin, l'enfant trouvé</title>
   <title type="sub">Mémoires d'un valet de chambre</title>
   <imprint>
    <pubPlace>Bruxelles et Leipzig</pubPlace>
    <publisher>C. Muquardt</publisher>
    <date when="1846">1846</date>
   </imprint>
  </monogr>
 </biblStruct>
</sourceDesc>
When the header describes a text derived from some pre-existing TEI-conformant or other digital document, it may be simpler to use the following element:
  • biblFull (fully-structured bibliographic citation) contains a fully-structured bibliographic citation, in which all components of the TEI file description are present.
since this is designed specifically for documents derived from texts which were ‘born digital’, as further discussed in section 2.2.8 Computer Files Derived from Other Computer Files.
When the module for manuscript description is included in a schema, the model.biblLike class also makes available the following element:
  • msDesc (manuscript description) contains a description of a single identifiable manuscript or other text-bearing object.
which enables the encoder to record very detailed information about one or more manuscript or analogous sources, as further discussed in 10 Manuscript Description.
The model.sourceDescPart class also makes available additional elements when additional modules are included. For example, when the spoken module is included, the sourceDesc element may also include the following special-purpose elements, intended for cases where an electronic text is derived from a spoken text rather than a written one:
  • scriptStmt (script statement) contains a citation giving details of the script used for a spoken text.
  • recordingStmt (recording statement) describes a set of recordings used as the basis for transcription of a spoken text.
Full descriptions of these elements and their contents are given in section 8.2 Documenting the Source of Transcribed Speech.

A single electronic text may be derived from multiple source documents, in whole or in part. The sourceDesc may therefore contain a listBibl element grouping together bibl, biblStruct, or msDesc elements for each of the sources concerned. It is also possible to repeat the sourceDesc element in such a case. The decls attribute described in section 15.3 Associating Contextual Information with a Text may be used to associate parts of the encoded text with the bibliographic element from which it derives in either case.
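The first of these approaches might be sketched as follows. The source descriptions, the identifiers srcA and srcB, and the division contents are hypothetical, chosen only for illustration; the decls attribute is assumed to be used as described in section 15.3.

```xml
<sourceDesc>
 <listBibl>
  <!-- one bibl per source; the xml:id values are illustrative -->
  <bibl xml:id="srcA">Hypothetical first source (printed edition)</bibl>
  <bibl xml:id="srcB">Hypothetical second source (manuscript copy)</bibl>
 </listBibl>
</sourceDesc>
<!-- elsewhere in the document -->
<div decls="#srcA">
 <p>Text transcribed from the first source ...</p>
</div>
<div decls="#srcB">
 <p>Text transcribed from the second source ...</p>
</div>
```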

The source description may also include lists of names, persons, places, etc. when these are considered to form part of the source for an encoded document. When such information is recorded using the specialized elements discussed in the namesdates module (13 Names, Dates, People, and Places), the class model.listLike makes available the following elements to hold such information:
  • listNym (list of canonical names) contains a list of nyms, that is, standardized names for any thing.
  • listOrg (list of organizations) contains a list of elements, each of which provides information about an identifiable organization.
  • listPerson (list of persons) contains a list of descriptions, each of which provides information about an identifiable person or a group of people, for example the participants in a language interaction, or the people referred to in a historical source.
  • listPlace (list of places) contains a list of places, optionally followed by a list of relationships (other than containment) defined amongst them.
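As a minimal sketch, assuming that the namesdates module is included in the schema, a source description might combine a bibliographic citation with a list of the persons who form part of the source. The citation, names, and identifiers below are placeholders rather than real data.

```xml
<sourceDesc>
 <bibl>Hypothetical collection of transcribed letters</bibl>
 <listPerson>
  <!-- each person is given an identifier so that the text can point to it -->
  <person xml:id="corr1">
   <persName>Name of first correspondent</persName>
  </person>
  <person xml:id="corr2">
   <persName>Name of second correspondent</persName>
  </person>
 </listPerson>
</sourceDesc>
```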

2.2.8 Computer Files Derived from Other Computer Files

If a computer file (call it B) is derived not from a printed source but from another computer file (call it A) which includes a TEI header, then the source text of B is the computer file A. The four sections of A's header will need to be incorporated into the new header for B in slightly differing ways, as listed below:
fileDesc
A's file description should be copied into the sourceDesc section of B's file description, enclosed within a biblFull element.
profileDesc
A's profileDesc should be copied into B's, in principle unchanged; it may however be expanded by project-specific information relating to B.
encodingDesc
A's encoding practice may or (more likely) may not be the same as B's. Since the object of the encoding description is to define the relationship between the current file and its source, in principle only changes in encoding practice between A and B need be documented in B. The relationship between A and its source(s) is then only recoverable from the original header of A. In practice it may be more convenient to create a new complete encodingDesc for B based on A's.
revisionDesc
B is a new computer file, and should therefore have a new revision description. If, however, it is felt useful to include some information from A's revisionDesc, for example dates of major updates or versions, such information must be clearly marked as relating to A rather than to B.
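The treatment of the file description might be sketched as follows. All titles and statements are hypothetical placeholders; the point is only that the components of A's fileDesc reappear inside a biblFull element within B's sourceDesc.

```xml
<fileDesc>
 <titleStmt>
  <title>Hypothetical text B: a revised electronic edition</title>
 </titleStmt>
 <publicationStmt>
  <p>Unpublished working file.</p>
 </publicationStmt>
 <sourceDesc>
  <biblFull>
   <!-- the following components are copied from the fileDesc of file A -->
   <titleStmt>
    <title>Hypothetical text A: an electronic transcription</title>
   </titleStmt>
   <publicationStmt>
    <distributor>Name of A's distributor</distributor>
   </publicationStmt>
   <sourceDesc>
    <bibl>Printed source from which A was transcribed</bibl>
   </sourceDesc>
  </biblFull>
 </sourceDesc>
</fileDesc>
```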
This concludes the discussion of the fileDesc element and its contents.

2.3 The Encoding Description

The encodingDesc element is the second major subdivision of the TEI header. It specifies the methods and editorial principles which governed the transcription or encoding of the text in hand and may also include sets of coded definitions used by other components of the header. Though not formally required, its use is highly recommended.
  • encodingDesc (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
The encoding description may contain any combination of paragraphs of text, marked up using the p element, along with more specialised elements taken from the model.encodingDescPart class. By default, this class makes available the following elements:
  • projectDesc (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.
  • samplingDecl (sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.
  • editorialDecl (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text.
  • tagsDecl (tagging declaration) provides detailed information about the tagging applied to a document.
  • refsDecl (references declaration) specifies how canonical references are constructed for this text.
  • classDecl (classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text.
  • appInfo (application information) records information about an application which has edited the TEI file.
Each of these elements is further described in the appropriate section below. Other modules have the ability to extend this class; examples are noted in section 2.3.8 Module-Specific Declarations.

2.3.1 The Project Description

The projectDesc element may be used to describe, in prose, the purpose for which a digital resource was created, together with any other relevant information concerning the process by which it was assembled or collected. This is of particular importance for corpora or miscellaneous collections, but may be of use for any text, for example to explain why one kind of encoding practice has been followed rather than another.
  • projectDesc (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.
For example:
<encodingDesc>
 <projectDesc>
  <p>Texts collected for use in the
     Claremont Shakespeare Clinic, June 1990.</p>
 </projectDesc>
</encodingDesc>

2.3.2 The Sampling Declaration

The samplingDecl element may be used to describe, in prose, the rationale and methods used in selecting texts, or parts of text, for inclusion in the resource.
  • samplingDecl (sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.
It should include information about such matters as
  • the size of individual samples
  • the method or methods by which they were selected
  • the underlying population being sampled
  • the object of the sampling procedure used
but is not restricted to these.
<samplingDecl>
 <p>Samples of 2000 words taken from the beginning of the text.</p>
</samplingDecl>
It may also include a simple description of any parts of the source text included or excluded.
<samplingDecl>
 <p>Text of stories only has been transcribed. Pull quotes, captions,
   and advertisements have been silently omitted. Any mathematical
   expressions requiring symbols not present in the ISOnum or ISOpub
   entity sets have been omitted, and their place marked with a GAP
   element.</p>
</samplingDecl>

A sampling declaration which applies to more than one text or division of a text need not be repeated in the header of each such text. Instead, the decls attribute of each text (or subdivision of the text) to which the sampling declaration applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.
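A minimal sketch of this mechanism follows, assuming two sampling declarations distinguished by the hypothetical identifiers sampStart and sampFull; the prose of the second declaration is likewise invented for illustration.

```xml
<encodingDesc>
 <samplingDecl xml:id="sampStart">
  <p>Samples of 2000 words taken from the beginning of the text.</p>
 </samplingDecl>
 <samplingDecl xml:id="sampFull">
  <p>Complete text transcribed.</p>
 </samplingDecl>
</encodingDesc>
<!-- elsewhere in the document -->
<text decls="#sampStart">
 <body>
  <p>... sampled text ...</p>
 </body>
</text>
```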

2.3.3 The Editorial Practices Declaration

The editorialDecl element is used to provide details of the editorial practices applied during the encoding of a text.
  • editorialDecl (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text.
It may contain a prose description only, or one or more of a set of specialized elements, members of the TEI model.editorialDeclPart class. Where an encoder wishes to record an editorial policy not covered by these elements, this may be done by adding a new element to this class, using the mechanisms discussed in chapter 23.2 Personalization and Customization.
Some of these policy elements carry attributes to support automated processing of certain well-defined editorial decisions; all of them contain a prose description of the editorial principles adopted with respect to the particular feature concerned. Examples of the kinds of questions which these descriptions are intended to answer are given in the list below.
correction
  • correction (correction principles) states how and under what circumstances corrections have been made in the text.
    status indicates the degree of correction applied to the text.
    method indicates the method adopted to indicate corrections within the text.

Was the text corrected during or after data capture? If so, were corrections made silently or are they marked using the tags described in section 3.4 Simple Editorial Changes? What principles have been adopted with respect to omissions, truncations, dubious corrections, alternate readings, false starts, repetitions, etc.?

normalization
  • normalization indicates the extent of normalization or regularization of the original source carried out in converting it to electronic form.
    source indicates the authority for any normalization carried out.
    method indicates the method adopted to indicate normalizations within the text.

Was the text normalized, for example by regularizing any non-standard spellings, dialect forms, etc.? If so, were normalizations performed silently or are they marked using the tags described in section 3.4 Simple Editorial Changes? What authority was used for the regularization? Also, what principles were used when normalizing numbers to provide the standard values for the value attribute described in section 3.5.3 Numbers and Measures and what format used for them?

quotation
  • quotation specifies editorial practice adopted with respect to quotation marks in the original.
    marks (quotation marks) indicates whether or not quotation marks have been retained as content within the text.
    form specifies how quotation marks are indicated within the text.

How were quotation marks processed? Are apostrophes and quotation marks distinguished? How? Are quotation marks retained as content in the text or replaced by markup? Are there any special conventions regarding for example the use of single or double quotation marks when nested? Is the file consistent in its practice or has this not been checked?

hyphenation
  • hyphenation summarizes the way in which hyphenation in a source text has been treated in an encoded version of it.
    eol (end-of-line) indicates whether or not end-of-line hyphenation has been retained in a text.

Does the encoding distinguish ‘soft’ and ‘hard’ hyphens? What principle has been adopted with respect to end-of-line hyphenation where source lineation has not been retained? Have soft hyphens been silently removed, and if so what is the effect on lineation and pagination?

segmentation
  • segmentation describes the principles according to which the text has been segmented, for example into sentences, tone-units, graphemic strata, etc.

How is the text segmented? If s or seg segmentation units have been used to divide up the text for analysis, how are they marked and how was the segmentation arrived at?

stdVals
  • stdVals (standard values) specifies the format used when standardized date or number values are supplied.

In most cases, attributes bearing standardized values (such as the when or when-iso attribute on dates) should conform to a defined W3C or ISO datatype. In cases where this is not appropriate, this element may be used to describe the standardization methods underlying the values supplied.

interpretation
  • interpretation describes the scope of any analytic or interpretive information added to the text in addition to the transcription.

Has any analytic or ‘interpretive’ information been provided — that is, information which is felt to be non-obvious, or potentially contentious? If so, how was it generated? How was it encoded? If feature-structure analysis has been used, are fsdDecl elements (section 18.11 Feature System Declaration) present?

Any information about the editorial principles applied not falling under one of the above headings should be recorded in a distinct list of items. Experience shows that a full record should be kept of decisions relating to editorial principles and encoding practice, both for future users of the text and for the project which produced the text in the first instance. Some simple examples follow:
<editorialDecl>
 <segmentation>
  <p>
   <gi>s</gi> elements mark orthographic sentences and
     are numbered sequentially
     within their parent <gi>div</gi> element
  </p>
 </segmentation>
 <interpretation>
  <p>The part of speech analysis applied throughout section 4 was
     added by hand and has not been checked.</p>
 </interpretation>
 <correction>
  <p>Errors in transcription controlled by using the
     WordPerfect spelling checker.</p>
 </correction>
 <normalization source="http://szotar.sztaki.hu/webster/">
  <p>All words converted to Modern American spelling following
     Websters 9th Collegiate dictionary.</p>
 </normalization>
 <quotation marks="all" form="std">
  <p>All opening quotation marks represented by entity reference
  <ident type="ge">odq</ident>; all closing quotation marks
     represented by entity reference <ident type="ge">cdq</ident>.</p>
 </quotation>
</editorialDecl>
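The attributes that support automated processing might be used as in the following sketch. The prose is invented for illustration, and the attribute values shown (medium, silent, markup, some) are assumed to be among those permitted by the schema in use; the element specifications should be consulted for the full lists.

```xml
<editorialDecl>
 <!-- status: how thoroughly the text was checked; method: how corrections are signalled -->
 <correction status="medium" method="silent">
  <p>The text has been proofread once. Obvious typesetting errors
     have been corrected without comment.</p>
 </correction>
 <normalization method="markup">
  <p>Regularized spellings are recorded using the tags described
     in section 3.4, alongside the original forms.</p>
 </normalization>
 <hyphenation eol="some">
  <p>End-of-line hyphens have been retained only where the word
     would also be hyphenated in mid-line.</p>
 </hyphenation>
</editorialDecl>
```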

An editorial practices declaration which applies to more than one text or division of a text need not be repeated in the header of each such text. Instead, the decls attribute of each text (or subdivision of the text) to which it applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.

2.3.4 The Tagging Declaration

The tagsDecl element is used to record the following information about the tagging used within a particular text:
  • the namespace to which elements appearing within the transcribed text belong.
  • how often particular elements appear within the text, so that a recipient can validate the integrity of a text during interchange.
  • any comment relating to the usage of particular elements not specified elsewhere in the header.
  • a default rendition applicable to all instances of an element.
This information is conveyed by the following elements:
  • rendition supplies information about the rendition or appearance of one or more elements in the source text.
    scheme identifies the language used to describe the rendition.
  • namespace supplies the formal name of the namespace to which the elements documented by its children belong.
  • tagUsage supplies information about the usage of a specific element within a text.

The tagsDecl element consists of an optional sequence of rendition elements, each of which must bear a unique identifier, followed by an optional sequence of one or more namespace elements, each of which contains a series of tagUsage elements, one for each distinct element from that namespace occurring within the outermost text element of a TEI document. Note that these tagUsage elements must be nested within a namespace element, and cannot appear directly within the tagsDecl element.

2.3.4.1 Rendition
The rendition element allows the encoder to specify how one or more elements are rendered in the original source in any of the following ways:
  • using an informal prose description
  • using a standard stylesheet language such as CSS or XSL-FO
  • using a project-defined formal language
One or more such specifications may be associated with elements of a document in two ways:
  • the render attribute of the appropriate tagUsage element may be used to indicate a default rendition for all occurrences of the named element
  • the global rendition attribute may be used on any element to indicate its rendition, over-riding any supplied default value
The global rend attribute may also be used to supply an informal description of the rendering for an element; if this is supplied in addition to the rendition attribute it takes precedence, just as it also overrides any default specified for that element.
For example, the following schematic shows how an encoder might specify that all p elements are by default to be rendered using one set of specifications identified as style1, while hi elements are to use a different set, identified as style2:
<tagsDecl>
 <rendition xml:id="style1">
   ... description of one default rendition here ...
 </rendition>
 <rendition xml:id="style2">
   ... description of another default rendition here ...
 </rendition>
 <namespace name="http://www.tei-c.org/ns/1.0">
  <tagUsage gi="p" render="#style1"> ... </tagUsage>
  <tagUsage gi="hi" render="#style2"> ... </tagUsage>
 </namespace>
</tagsDecl>
<!-- elsewhere in the document -->
<p>This paragraph, mostly rendered in style1, contains a few words
<hi>rendered in style2</hi>
</p>
<p rendition="#style2">This paragraph is all rendered in style2</p>
<p>This is back to style1</p>
As noted above, the content of the rendition element may describe the appearance of the source material using prose, a project-defined formal language, or either of the existing standard languages: the Cascading Stylesheet Language (Lie and Bos (eds.) (1999)) and the XML vocabulary for specifying formatting semantics which forms a part of the W3C's Extensible Stylesheet Language (Berglund (ed.) (2006)). The scheme attribute indicates which of these applies to a given rendition element, and takes the following values:
free
Informal free text description
css
Cascading Stylesheet Language
xslfo
Extensible Stylesheet Language Formatting Objects
other
A user-defined formal description language
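As a brief sketch, the same appearance might be described either informally or in CSS, with the scheme attribute indicating which is in use. The identifiers and the appearance described are hypothetical.

```xml
<tagsDecl>
 <!-- informal prose description -->
 <rendition xml:id="headFree" scheme="free">Small capitals, centred.</rendition>
 <!-- the same appearance expressed in CSS -->
 <rendition xml:id="headCss" scheme="css">font-variant: small-caps; text-align: center;</rendition>
</tagsDecl>
```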
In the following extended example we consider how to capture the appearance of a typical early 20th century title page, such as that shown in the accompanying figure. Elements for the encoding of the information on a title page are presented in 4.6 Title Pages; here we consider how we might go about encoding some of the visual information as well, using the rendition element and its corresponding attributes.
First we define a rendition element for each aspect of the source page rendition that we wish to retain. Details of CSS are given in Lie and Bos (eds.) (1999); we use it here simply to provide a vocabulary with which to describe such aspects as font size and style, letter and line spacing, colour, etc. Note that the purpose of this encoding is to describe the original, rather than specify how it should be reproduced, although the two are obviously closely linked.
<tagsDecl>
 <rendition xml:id="center" scheme="css">text-align: center;</rendition>
 <rendition xml:id="small" scheme="css">font-size: small;</rendition>
 <rendition xml:id="large" scheme="css">font-size: large;</rendition>
 <rendition xml:id="x-large" scheme="css">font-size: x-large;</rendition>
 <rendition xml:id="xx-large" scheme="css">font-size: xx-large;</rendition>
 <rendition xml:id="expanded" scheme="css">letter-spacing: +3pt;</rendition>
 <rendition xml:id="x-space" scheme="css">line-height: 150%;</rendition>
 <rendition xml:id="xx-space" scheme="css">line-height: 200%;</rendition>
 <rendition xml:id="red" scheme="css">color: red;</rendition>
</tagsDecl>
The global rendition attribute can now be used to specify on any element which of the above rendition features apply to it. For example, a title page might be encoded as follows:
<titlePage>
 <docTitle rendition="#center #x-space">
  <titlePart>
   <lb/>
   <hi rendition="#x-large">THE POEMS</hi>
   <lb/>
   <hi rendition="#small">OF</hi>
   <lb/>
   <hi rendition="#red #xx-large">ALGERNON CHARLES SWINBURNE</hi>
   <lb/>
   <hi rendition="#large #xx-space">IN SIX VOLUMES</hi>
  </titlePart>
  <titlePart rendition="#xx-space">
   <lb/> VOLUME I.
   <lb/>
   <hi rendition="#red #x-large">POEMS AND BALLADS</hi>
   <lb/>
   <hi rendition="#x-space">FIRST SERIES</hi>
  </titlePart>
 </docTitle>
 <docImprint rendition="#center">
  <lb/>
  <pubPlace rendition="#xx-space">LONDON</pubPlace>
  <lb/>
  <publisher rendition="#red #expanded">CHATTO &amp; WINDUS</publisher>
  <lb/>
  <docDate when="1904" rendition="#small">1904</docDate>
 </docImprint>
</titlePage>

When CSS is used as the underlying language, the scope attribute may be used to specify CSS pseudo-elements, which target styling at only a portion of the given text. For example, the first-letter pseudo-element targets the first letter of the element concerned, while the before and after pseudo-elements are often used in conjunction with the "content" property to add characters (Unicode provides quite a few) before or after the element content, where these are useful to document the appearance of the source.

For example, assuming that a text has been encoded using the q element to enclose passages in quotation marks, but the quotation marks themselves have been routinely omitted from the encoding, a set of renditions such as the following:
<rendition xml:id="quoteBefore" scheme="css" scope="before">content:
'“';</rendition>
<rendition xml:id="quoteAfter" scheme="css" scope="after">content:
'”';</rendition>
might be used to predefine pseudo-elements quoteBefore and quoteAfter. Where a q element is actually rendered in the source with initial and final quotation marks, it may then be encoded as follows:
<q rendition="#quoteBefore #quoteAfter">Four score and seven years
ago...</q>
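The first-letter pseudo-element mentioned above might be used in a similar way to record a large initial letter. This is a sketch only: the identifier dropCap, the CSS properties chosen, and the sample paragraph are hypothetical, and the scope value first-letter is assumed from the name of the CSS pseudo-element.

```xml
<rendition xml:id="dropCap" scheme="css" scope="first-letter">font-size: xx-large; float: left;</rendition>
<!-- elsewhere in the document -->
<p rendition="#dropCap">Once upon a time ...</p>
```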
2.3.4.2 Tag usage

As noted above, each namespace element, if present, should contain exactly one occurrence of a tagUsage element for each distinct element from the given namespace that occurs within the outermost text element associated with the teiHeader in which it appears. The tagUsage element is used to supply a count of the number of occurrences of this element within the text, which is given as the value of its occurs attribute. It may also be used to hold any additional usage information, which is supplied as running prose within the element itself.

For example:
<tagUsage gi="hi" occurs="28"> Used only to mark English words italicised in the copy text.
</tagUsage>
This indicates that the hi element appears a total of 28 times in the text element in question, and that the encoder has used it to mark italicised English words only.
The withId attribute may optionally be used to specify how many of the occurrences of the element in question bear a value for the global xml:id attribute, as in the following example:
<tagUsage gi="pb" occurs="321" withId="321"> Marks page breaks in the York (1734) edition only
</tagUsage>
This indicates that the pb element occurs 321 times, and that an identifier is provided on each occurrence.
The content of the tagUsage element is not susceptible of automatic processing. It should not therefore be used to hold information for which provision is already made by other components of the encoding description. A TEI conformant document is not required to provide any tagUsage elements, but if it does, then TEI recommended practice is to provide namespace and tagUsage elements for each distinct element and namespace used in the associated text. If, in addition, counts are specified by the occurs attributes, these must correspond with the number of such elements present in the document.
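A fuller declaration might therefore look like the following sketch, in which the selection of elements and the counts are invented for illustration and would need to agree with the actual content of the associated text:
<tagsDecl>
 <namespace name="http://www.tei-c.org/ns/1.0">
  <tagUsage gi="div" occurs="12" withId="12">Marks chapters only.</tagUsage>
  <tagUsage gi="p" occurs="340"/>
  <tagUsage gi="hi" occurs="28">Used only to mark English words italicised in the copy text.</tagUsage>
 </namespace>
</tagsDecl>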

2.3.5 The Reference System Declaration

The refsDecl element is used to document the way in which any standard referencing scheme built into the encoding works. It may contain either a series of prose paragraphs or the following specialized elements:
  • refsDecl (references declaration) specifies how canonical references are constructed for this text.
  • cRefPattern (canonical reference pattern) specifies an expression and replacement pattern for transforming a canonical reference into a URI.
  • refState/ (reference state) specifies one component of a canonical reference defined by the milestone method.
Note that not all possible referencing schemes are equally easily supported by current software systems. A choice must be made between the convenience of the encoder and the likely efficiency of the particular software applications envisaged, in this context as in many others. For a more detailed discussion of referencing systems supported by these Guidelines, see section 3.10 Reference Systems below.
A referencing scheme may be described in one of three ways using this element:
  • as a prose description
  • as a series of pairs of regular expressions and XPaths
  • as a concatenation of sequentially organized milestones
Each method is described in more detail below. Only one method can be used within a single refsDecl element.

More than one refsDecl element can be included in the header if more than one canonical reference scheme is to be used in the same document, but the current proposals do not check for mutual inconsistency.

2.3.5.1 Prose Method

The referencing scheme may be specified within the refsDecl by a simple prose description. Such a description should indicate which elements carry identifying information, and whether this information is represented as attribute values or as content. Any special rules about how the information is to be interpreted when reading or generating a reference string should also be specified here. Such a prose description cannot be processed automatically, and this method of specifying the structure of a canonical reference system is therefore not recommended for automatic processing.

For example:
<refsDecl>
 <p>The <att>n</att> attribute of each text in this corpus carries a
   unique identifying code for the whole text. The title of the text is
   held as the content of the first <gi>head</gi> element within each
   text. The <att>n</att> attribute on each <gi>div1</gi> and
 <gi>div2</gi> contains the canonical reference for each such
   division, in the form 'XX.yyy', where XX is the book number in Roman
   numerals, and yyy the section number in arabic. Line breaks are
   marked by empty <gi>lb</gi> elements, each of which includes the
   through line number in Casaubon's edition as the value of its
 <att>n</att> attribute.</p>
 <p>The through line number and the text identifier uniquely identify
   any line. A canonical reference may be made up by concatenating the
 <att>n</att> values from the <gi>text</gi>, <gi>div1</gi>, or
 <gi>div2</gi> and calculating the line number within each part.</p>
</refsDecl>
2.3.5.2 Search-and-Replace Method
This method often requires a significant investment of effort initially, but permits extremely flexible addressing. For details, see section 16.2.5 Canonical References.
  • cRefPattern (canonical reference pattern) specifies an expression and replacement pattern for transforming a canonical reference into a URI.
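As a minimal sketch of this method, assume a hypothetical text in which books and chapters are encoded as nested div elements, and verses as ab elements, each bearing an n attribute. A reference such as ‘Matthew 12:34’ might then be mapped to a pointer as follows. The matchPattern attribute holds the regular expression, and the replacementPattern attribute holds the pointer into which the matched groups are substituted; the element structure and type values shown are assumptions, not requirements:
<refsDecl>
 <cRefPattern
   matchPattern="(\w+) (\d+):(\d+)"
   replacementPattern="#xpath(//div[@type='book'][@n='$1']/div[@type='chapter'][@n='$2']/ab[@n='$3'])"/>
</refsDecl>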
2.3.5.3 Milestone Method

This method is appropriate when only ‘milestone’ tags (see section 3.10.3 Milestone Elements) are available to provide the required referencing information. It does not provide any abilities which cannot be mimicked by the search-and-replace referencing method discussed in the previous section, but in the cases where it applies, it provides a somewhat simpler notation.

A reference based on milestone tags concatenates the values specified by one or more such tags. Since each tag marks the point at which a value changes, it may be regarded as specifying the refState of a variable. A reference declaration using this method therefore specifies the individual components of the canonical reference as a sequence of refState elements:
  • refState/ (reference state) specifies one component of a canonical reference defined by the milestone method.
    unit indicates what kind of state is changing at this milestone.
    delim (delimiter) supplies a delimiting string following the reference component.
    length specifies the fixed length of the reference component.

For example, the reference ‘Matthew 12:34’ might be thought of as representing the state of three variables: the book variable is in state ‘Matthew’; the chapter variable is in state ‘12’, and the verse variable is in state ‘34’. If milestone tagging has been used, there should be a tag marking the point in the text at which each of the above ‘variables’ changes its state.8 To find ‘Matthew 12:34’ therefore an application must scan left to right through the text, monitoring changes in the state of each of these three variables as it does so. When all three are simultaneously in the required state, the desired point will have been reached. There may of course be several such points.

The delim and length attributes specify how each component is represented within a canonical reference: delim supplies the string which follows the component, and length gives its fixed length. The other attributes are used to determine which instances of milestone tags in the text are to be checked for state-changes. A state-change is signalled whenever a new milestone tag is found with unit and, optionally, ed attributes identical to those of the refState element in question. The value for the new state may be given explicitly by the n attribute on the milestone element, or it may be implied, if the n attribute is not specified.

For example, for canonical references in the form xx.yyy where the xx represents the page number in the first edition, and yyy the line number within this page, a reference system declaration such as the following would be appropriate:
<refsDecl>
 <refState
   ed="first"
   unit="page"
   length="2"
   delim="."/>

 <refState ed="first" unit="line" length="3"/>
</refsDecl>
This implies that milestone tags of the form
<milestone n="II" ed="first" unit="page"/>
<milestone ed="first" unit="line"/>
will be found throughout the text, marking the positions at which page and line numbers change. Note that no value has been specified for the n attribute on the second milestone tag above; this implies that its value is incremented at each state change. For more detail on the use of milestone tags, see section 3.10.3 Milestone Elements.
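The ‘Matthew 12:34’ reference discussed above could be declared in the same manner. In the following sketch the unit values book, chapter, and verse are assumptions made for illustration:
<refsDecl>
 <refState unit="book" delim=" "/>
 <refState unit="chapter" delim=":"/>
 <refState unit="verse"/>
</refsDecl>
The corresponding milestone tags in the text would then take a form such as:
<milestone unit="book" n="Matthew"/>
<milestone unit="chapter" n="12"/>
<milestone unit="verse" n="34"/>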

The milestone referencing scheme, though conceptually simple, is not supported by a generic SGML or XML parser. Its use places a correspondingly greater burden of verification and accuracy on the encoder.

A reference system declaration which applies to more than one text or division of a text need not be repeated in the header of each such text. Instead, the decls attribute of each text (or subdivision of the text) to which the declaration applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.
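For instance, assuming a reference declaration which has been given the hypothetical identifier refsPage in a corpus header, an individual text might point to it as follows:
<refsDecl xml:id="refsPage">
 <refState ed="first" unit="page" length="2" delim="."/>
 <refState ed="first" unit="line" length="3"/>
</refsDecl>
<!-- elsewhere in the document -->
<text decls="#refsPage">
 <body>
  <p>...</p>
 </body>
</text>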

2.3.6 The Classification Declaration

The classDecl element is used to group together definitions or sources for any descriptive classification schemes used by other parts of the header. Each such scheme is represented by a taxonomy element, which may contain either a simple bibliographic citation, or a definition of the descriptive typology concerned; the following elements are used in defining a descriptive classification scheme:
  • classDecl (classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text.
  • taxonomy defines a typology used to classify texts either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy.
  • category contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy.
  • catDesc (category description) describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the TEI formal textDesc.
The taxonomy element has two slightly different, but related, functions. For well-recognized and documented public classification schemes, such as Dewey or other published descriptive thesauri, it contains simply a bibliographic citation indicating where a full description of a particular taxonomy may be found.
<taxonomy xml:id="DDC12">
 <bibl>
  <title>Dewey Decimal Classification</title>
  <edition>Abridged Edition 12</edition>
 </bibl>
</taxonomy>
For less easily accessible schemes, the taxonomy element contains a description of the taxonomy itself as well as an optional bibliographic citation. The description consists of a number of category elements, each defining a single category within the given typology. The category is defined by the contents of a nested catDesc element, which may contain either a phrase describing the category, or any number of elements from the model.catDescPart class. When the corpus module is included in a schema, this class provides the textDesc element whose components allow the definition of a text type in terms of a set of ‘situational parameters’ (see further section 15.2.1 The Text Description); if the corpus module is not included in a schema, this class is empty and the catDesc element may contain only plain text.
If the category is subdivided, each subdivision is represented by a nested category element, having the same structure. Categories may be nested to an arbitrary depth in order to reflect the hierarchical structure of the taxonomy. Each category element bears a unique xml:id attribute, which is used as the target for catRef elements referring to it.
<taxonomy xml:id="b">
 <bibl>Brown Corpus</bibl>
 <category xml:id="b.a">
  <catDesc>Press Reportage</catDesc>
  <category xml:id="b.a1">
   <catDesc>Daily</catDesc>
  </category>
  <category xml:id="b.a2">
   <catDesc>Sunday</catDesc>
  </category>
  <category xml:id="b.a3">
   <catDesc>National</catDesc>
  </category>
  <category xml:id="b.a4">
   <catDesc>Provincial</catDesc>
  </category>
  <category xml:id="b.a5">
   <catDesc>Political</catDesc>
  </category>
  <category xml:id="b.a6">
   <catDesc>Sports</catDesc>
  </category>
 </category>
 <category xml:id="b.d">
  <catDesc>Religion</catDesc>
  <category xml:id="b.d1">
   <catDesc>Books</catDesc>
  </category>
  <category xml:id="b.d2">
   <catDesc>Periodicals and tracts</catDesc>
  </category>
 </category>
</taxonomy>
Linkage between a particular text and a category within such a taxonomy is made by means of the catRef element within the textClass element, as described in section 2.4.3 The Text Classification. Where the taxonomy permits of classification along more than one dimension, more than one category will be referenced by a particular catRef, as in the following example, which identifies a text with the sub-categories ‘Daily’, ‘National’, and ‘Political’ within the category ‘Press Reportage’ as defined above.
<catRef target="#b.a1 #b.a3 #b.a5"/>
A single category may contain more than one catDesc child, when for example the category is described in more than one language, as in the following example:
<category xml:id="lit">
 <catDesc xml:lang="pl">literatura piękna</catDesc>
 <catDesc xml:lang="en">fiction</catDesc>
 <category xml:id="litProza">
  <catDesc xml:lang="pl">proza</catDesc>
  <catDesc xml:lang="en">prose</catDesc>
 </category>
 <category xml:id="litPoezja">
  <catDesc xml:lang="pl">poezja</catDesc>
  <catDesc xml:lang="en">poetry</catDesc>
 </category>
 <category xml:id="litDramat">
  <catDesc xml:lang="pl">dramat</catDesc>
  <catDesc xml:lang="en">drama</catDesc>
 </category>
</category>
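Taxonomies such as these are grouped within the classDecl element of the encoding description. The following sketch, which abbreviates the Brown Corpus taxonomy given above, is intended only to show the nesting involved:
<encodingDesc>
 <classDecl>
  <taxonomy xml:id="DDC12">
   <bibl>
    <title>Dewey Decimal Classification</title>
    <edition>Abridged Edition 12</edition>
   </bibl>
  </taxonomy>
  <taxonomy xml:id="b">
   <bibl>Brown Corpus</bibl>
   <category xml:id="b.a">
    <catDesc>Press Reportage</catDesc>
   </category>
  </taxonomy>
 </classDecl>
</encodingDesc>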

2.3.7 The Application Information Element

It is sometimes convenient to store information relating to the processing of an encoded resource within its header. Typical uses for such information might be:
  • to allow an application to discover that it has previously opened or edited a file, and what version of itself was used to do that;
  • to show (through a date) which application last edited the file to allow for diagnosis of any problems that might have been caused by that application;
  • to allow users to discover information about an application used to edit the file;
  • to allow the application to declare an interest in elements of the file which it has edited, so that other applications or human editors may be more wary of making changes to those sections of the file.
The class model.applicationLike provides an element, application, which may be used to record such information within the appInfo element.
  • appInfo (application information) records information about an application which has edited the TEI file.
  • application provides information about an application which has acted upon the document.
    ident supplies an identifier for the application, independent of its version number or display name.
    version supplies a version number for the application, independent of its identifier or display name.

Each application element identifies the current state of one software application with regard to the current file. This element is a member of the att.datable class, which provides a variety of attributes for associating this state with a date and time, or a temporal range. The ident and version attributes should be used to uniquely identify the application and its major version number (for example, ImageMarkupTool 1.5). It is not intended that an application should add a new application element each time it touches the file.

The following example shows how these elements might be used to document the fact that version 1.5 of an application called ‘Image Markup Tool’ has an interest in two parts of a document which was last saved on or before June 1 2006 (the date given by the notAfter attribute). The parts concerned are accessible at the URLs given as target for the two ptr elements.
<appInfo>
 <application version="1.5" ident="ImageMarkupTool" notAfter="2006-06-01">
  <label>Image Markup Tool</label>
  <ptr target="#P1"/>
  <ptr target="#P2"/>
 </application>
</appInfo>
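Because the application element is a member of att.datable, a period of activity can be recorded instead of a single date. The following sketch assumes a hypothetical application identified as ExampleCollator; its name, version, dates, and the target P3 are all invented for illustration:
<appInfo>
 <application version="2.0" ident="ExampleCollator" from="2006-05-01" to="2006-06-01">
  <label>Example Collator</label>
  <ptr target="#P3"/>
 </application>
</appInfo>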

2.3.8 Module-Specific Declarations

The elements discussed so far are available to any schema. When the schema in use includes some of the more specialised TEI modules, these make available other more module-specific components of the encoding description. These are discussed fully in the documentation for the module in question, but are also noted briefly here for convenience.

The fsdDecl element is available only when the iso-fs module is included in a schema. Its purpose is to document the feature system declaration (as defined in section 18.11 Feature System Declaration) underlying any analytic feature structures (as defined in chapter 18 Feature Structures) present in the text documented by this header.

The metDecl element is available only when the verse module is included in a schema. Its purpose is to document any metrical notation scheme used in the text, as further discussed in section 6.3 Rhyme and Metrical Analysis. It consists either of a prose description or a series of metSym elements.

The variantEncoding element is available only when the textcrit module is included in a schema. Its purpose is to document the method used to encode textual variants in the text, as discussed in section 12.2 Linking the Apparatus to the Text.
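By way of illustration, an encoding description drawing on two of these modules might include declarations such as the following. The attribute values chosen for variantEncoding and the wording of the prose within metDecl are examples only:
<encodingDesc>
 <metDecl>
  <p>Metrical patterns are recorded on the met attribute, using + for a
    stressed syllable and - for an unstressed syllable.</p>
 </metDecl>
 <variantEncoding method="parallel-segmentation" location="internal"/>
</encodingDesc>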

2.4 The Profile Description

The profileDesc element is the third major subdivision of the TEI Header. It is an optional element, the purpose of which is to enable information characterizing various descriptive aspects of a text or a corpus to be recorded within a single unified framework.
  • profileDesc (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.
In principle, almost any component of the header might be of importance as a means of characterizing a text. The author of a written text, its title or its date of publication, may all be regarded as characterizing it at least as strongly as any of the parameters discussed in this section. The rule of thumb applied has been to exclude from discussion here most of the information which generally forms part of a standard bibliographic style description, if only because such information has already been included elsewhere in the TEI header.
The profileDesc element contains an optional creation element, followed by any number of additional elements taken from the model.profileDesc class. In the simplest case, this means it may contain the following elements:
  • creation contains information about the creation of a text.
  • langUsage (language usage) describes the languages, sublanguages, registers, dialects, etc. represented within a text.
  • textClass (text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc.
These elements are further described in the remainder of this section.
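Taken together, a simple profile description might be organized as in the following sketch. Its content is illustrative only; the category identifier b.a1 is borrowed from the taxonomy example in section 2.3.6 The Classification Declaration:
<profileDesc>
 <creation>
  <date when="1992-08">August 1992</date>
 </creation>
 <langUsage>
  <language ident="en">English</language>
 </langUsage>
 <textClass>
  <catRef target="#b.a1"/>
 </textClass>
</profileDesc>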
When the corpus module described in chapter 15 Language Corpora is included in a schema, three further elements become available within the profileDesc element:
  • textDesc (text description) provides a description of a text in terms of its situational parameters.
  • particDesc (participation description) describes the identifiable speakers, voices, or other participants in any kind of text.
  • settingDesc (setting description) describes the setting or settings within which a language interaction takes place, either as a prose description or as a series of setting elements.
For descriptions of these elements, see section 15.2 Contextual Information.
When the transcr module for the transcription of primary sources described in chapter 11 Representation of Primary Sources is included in a schema, the following element becomes available within the profileDesc element:
  • handNotes contains one or more handNote elements documenting the different hands identified within the source texts.
For a description of this element, see section 11.4.1 Document Hands. Its purpose is to group together a number of handNote elements, each of which describes a different hand or equivalent identified within a manuscript. The handNote element can also appear within a structured manuscript description, when the msdescription module described in chapter 10 Manuscript Description is included in a schema. For this reason, the handNote element is actually declared within the header module, but is only accessible to a schema when one or other of the transcr or msdescription modules is included in a schema. See further the discussion at 11.4.1 Document Hands.

2.4.1 Creation

The creation element contains phrases describing the origin of the text, e.g. the date and place of its composition.
  • creation contains information about the creation of a text.
The date and place of composition are often of particular importance for studies of linguistic variation; since such information cannot be inferred with confidence from the bibliographic description of the copy text, the creation element may be used to provide a consistent location for this information:
<creation>
 <date when="1992-08">August 1992</date>
 <rs type="city">Taos, New Mexico</rs>
</creation>

2.4.2 Language Usage

The langUsage element is used within the profileDesc element to describe the languages, sublanguages, registers, dialects, etc. represented within a text. It contains one or more language elements, each of which provides information about a single language, notably the quantity of that language present in the text. Note that this element should not be used to supply information about any non-standard characters or glyphs used by this language; such information should be recorded in the charDecl element in the encoding description (see further 5 Representation of Non-standard Characters and Glyphs).
  • langUsage (language usage) describes the languages, sublanguages, registers, dialects, etc. represented within a text.
  • language characterizes a single language or sublanguage used within a text.
    usage specifies the approximate percentage (by volume) of the text which uses this language.
    ident (identifier) supplies a language code constructed as defined in BCP 47 which is used to identify the language documented by this element, and which is referenced by the global xml:lang attribute.

A language element may be supplied for each different language used in a document. If used, its ident attribute should specify an appropriate language identifier, as further discussed in section vi.1. Language identification. This is particularly important if extended language identifiers have been used as the value of xml:lang attributes elsewhere in the document.

Here is an example of the use of this element:
<langUsage>
 <language ident="fr-CA" usage="60">Québécois</language>
 <language ident="en-CA" usage="20">Canadian business English</language>
 <language ident="en-GB" usage="20">British English</language>
</langUsage>
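The value of the ident attribute is the same code as that used by xml:lang attributes in the text, so that a passage in one of the declared languages can be related to its declaration. A brief sketch, with invented content:
<p xml:lang="en-CA">The minister replied: <q xml:lang="fr-CA">Je me souviens.</q></p>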

2.4.3 The Text Classification

The third component of the core profileDesc element is the textClass element. This element is used to classify a text according to one or more of the following methods:
  • by reference to a recognized international classification such as the Dewey Decimal Classification, the Universal Decimal Classification, the Colon Classification, the Library of Congress Classification, or any other system widely used in library and documentation work
  • by providing a set of keywords, as provided for example by British Library or Library of Congress Cataloguing in Publication data
  • by referencing any other taxonomy of text categories recognized in the field concerned, or peculiar to the material in hand; this may include one based on recurring sets of values for the situational parameters defined in section 15.2.1 The Text Description, or the demographic elements described in section 15.2.2 The Participant Description
The last of these may be particularly important for dealing with existing corpora or collections, both as a means of avoiding the expense or inconvenience of reclassification and as a means of documenting the organizing principles of such materials.
The following elements are provided for this purpose:
  • keywords contains a list of keywords or phrases identifying the topic or nature of a text.
    scheme identifies the controlled vocabulary within which the set of keywords concerned is defined.
  • classCode (classification code) contains the classification code used for this text in some standard classification system.
    scheme identifies the classification system or taxonomy in use.
  • catRef/ (category reference) specifies one or more defined categories within some taxonomy or text typology.

The keywords element simply categorizes an individual text by supplying a list of keywords which may describe its topic or subject matter, its form, date, etc. In some schemes, the order of items in the list is significant, for example, from major topic to minor; in others, the list has an organized substructure of its own. No recommendations are made here as to which method is to be preferred. Wherever possible, such keywords should be taken from a recognized source, such as the British Library/Library of Congress Cataloguing in Publication data in the case of printed books, or a published thesaurus appropriate to the field.

The scheme attribute should be used to indicate the source of the keywords used. If the keywords are taken from some externally defined authority which is available online, this attribute should point directly to it, as in the following examples:
<keywords scheme="http://classificationweb.net">
 <term>Babbage, Charles</term>
 <term>Mathematicians - Great Britain - Biography</term>
</keywords>
<keywords
  scheme="http://id.loc.gov/authorities/about.html#lcsh">

 <term>English literature -- History and criticism -- Data processing.</term>
 <term>English literature -- History and criticism -- Theory, etc.</term>
 <term>English language -- Style -- Data processing.</term>
 <term>Style, Literary -- Data processing.</term>
</keywords>
If the authority file is not available online, but is generally recognized and commonly cited, a bibliographic description for it should be supplied within the taxonomy element described in section 2.3.6 The Classification Declaration; the scheme attribute may then reference that taxonomy element by means of its identifier in the usual way:
<keywords scheme="#welch">
 <term>ceremonials</term>
 <term>fairs</term>
 <term>street life</term>
</keywords>
<!-- elsewhere in the document -->
<taxonomy xml:id="welch">
 <bibl>
  <title>Notes on London Municipal Literature, and a Suggested
     Scheme for Its Classification</title>
  <author>Charles Welch</author>
  <edition>1895</edition>
 </bibl>
</taxonomy>

Alternatively, if the keyword vocabulary itself is locally defined, the scheme attribute will point to the local definition, which will typically be held in a taxonomy element within the classDecl part of the encoding description (see section 2.3.6 The Classification Declaration).

The classCode element also categorizes an individual text, by supplying a numerical or other code rather than descriptive terms. Such codes constitute a recognized classification scheme, such as the Dewey Decimal Classification. The scheme attribute is used to indicate the source of the classification scheme in the same way as for keywords: this may be a pointer of any kind, either to a TEI element, possibly in the current document, as in the keywords examples above, or to some canonical source for the scheme, as in the following example:
<classCode
  scheme="http://www.udcc.org/udcsummary/php/index.php">
005.756</classCode>

The catRef element categorizes an individual text by pointing to one or more category elements using the target attribute, which it inherits from the att.pointing class. The category element (which is fully described in section 2.3.6 The Classification Declaration) holds information about a particular classification or category within a given taxonomy. Each such category must have a unique identifier, which may be supplied as the value of the target attribute on the catRef element of any text regarded as falling within the category indicated.

A text may, of course, fall into more than one category, in which case more than one identifier will be supplied as the value for the target attribute on the catRef element, as in the following example:
<catRef target="#b.a4 #b.d2"/>
The scheme attribute may be supplied to specify the taxonomy to which the categories identified by the target attribute belong, if this is not adequately conveyed by the resource pointed to. For example,
<catRef
  target="#b.a4 #b.d2"
  scheme="http://www.example.com/browncorpus"/>

<catRef target="http://www.example.com/SUC/#A45"/>
Here the same text has been classified as of categories b.a4 and b.d2 within the Brown classification scheme (presumed to be available from http://www.example.com/browncorpus), and as of category ‘A45’ within the SUC classification scheme documented at the URL given.

The distinction between the catRef and classCode elements is that the values used as identifying codes are exhaustively enumerated, typically within the header, for the former, while the latter may be used to indicate a more open-ended or descriptive classification system.
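These three elements may be combined within a single textClass element. The following sketch reuses values from the examples above; it is intended only to show how the elements sit together, not to suggest that these particular classifications belong to one text:
<textClass>
 <keywords scheme="#welch">
  <term>ceremonials</term>
  <term>fairs</term>
 </keywords>
 <classCode scheme="http://www.udcc.org/udcsummary/php/index.php">005.756</classCode>
 <catRef target="#b.a4 #b.d2"/>
</textClass>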

2.5 The Revision Description

The final sub-element of the TEI header, the revisionDesc element, provides a detailed change log in which each change made to a text may be recorded. Its use is optional but highly recommended. It provides essential information for the administration of large numbers of files which are being updated, corrected, or otherwise modified as well as extremely useful documentation for files being passed from researcher to researcher or system to system. Without change logs, it is easy to confuse different versions of a file, or to remain unaware of small but important changes made in the file by some earlier link in the chain of distribution. No change should be made in any TEI-conformant file without corresponding entries being made in the change log.
  • revisionDesc (revision description) summarizes the revision history for a file.
  • change summarizes a particular change or correction made to a particular version of an electronic text which is shared between several researchers.

The main purpose of the revision description is to record changes in the text to which a header is prefixed. However, it is recommended TEI practice to include entries also for significant changes in the header itself (other than the revision description itself, of course). At the very least, an entry should be supplied indicating the date of creation of the header.

The log consists of a list of entries, one for each change. This may be encoded using either the regular list element, as described in section 3.7 Lists, or as a series of special purpose change elements, each of which contains a more detailed description of the changes made. The attributes when and who are used to indicate the date of the change and the person responsible for it respectively. The description of the change itself can range from a simple phrase to a series of paragraphs. If a number is to be associated with one or more changes (for example, a revision number), the global n attribute may be used to indicate it.

It is recommended to give changes in reverse chronological order, most recent first.

For example:

<!-- ... --><revisionDesc>
 <change n="RCS:1.39" when="2007-08-08" who="#jwernimo.lrv">Changed <val>drama.verse</val>
  <gi>lg</gi>s to <gi>p</gi>s. <note>we have opened a discussion about the need for a new
     value for <att>type</att> of <gi>lg</gi>, <val>drama.free.verse</val>, in order to address
     the verse of Behn which is not in regular iambic pentameter. For the time being these
     instances are marked with a comment note until we are able to fully consider the best way
     to encode these instances.</note>
 </change>
 <change n="RCS:1.33" when="2007-06-28" who="#pcaton.xzc">Added <att>key</att> and <att>reg</att>
   to <gi>name</gi>s.</change>
 <change n="RCS:1.31" when="2006-12-04" who="#wgui.ner">Completed renovation. Validated.</change>
</revisionDesc>
In the above example, the who attributes point to respStmt elements which have been included earlier in the titleStmt of the same header:
<titleStmt>
 <title>The Amorous Prince, or, the Curious Husband, 1671</title>
 <author>
  <persName ref="#abehn.aeh">Behn, Aphra</persName>
 </author>
 <respStmt xml:id="pcaton.xzc">
  <persName>Caton, Paul</persName>
  <resp>electronic publication editor</resp>
 </respStmt>
 <respStmt xml:id="wgui.ner">
  <persName>Gui, Weihsin</persName>
  <resp>encoder</resp>
 </respStmt>
 <respStmt xml:id="jwernimo.lrv">
  <persName>Wernimont, Jacqueline</persName>
  <resp>encoder</resp>
 </respStmt>
</titleStmt>
There is, however, no requirement that the respStmt element be used for this purpose, or that the elements pointed to be contained within the same document. A project might, for example, maintain a separate document listing all of its personnel, in which they are represented using the person element described in 15.2.2 The Participant Description.
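As a sketch of that alternative, a project might keep its personnel in a separate file and point into it from the change log. The file name personnel.xml and the use of a bare listPerson as the container are assumptions made for illustration; the identifiers are reused from the example above.

```xml
<!-- personnel.xml (hypothetical file name): one person element per project member -->
<listPerson>
 <person xml:id="pcaton.xzc">
  <persName>Caton, Paul</persName>
 </person>
 <person xml:id="wgui.ner">
  <persName>Gui, Weihsin</persName>
 </person>
</listPerson>

<!-- in the header of the encoded text: who now points into the separate file -->
<revisionDesc>
 <change n="RCS:1.33" when="2007-06-28" who="personnel.xml#pcaton.xzc">Added <att>key</att>
   and <att>reg</att> to <gi>name</gi>s.</change>
 <change n="RCS:1.31" when="2006-12-04" who="personnel.xml#wgui.ner">Completed renovation.
   Validated.</change>
</revisionDesc>
```

Because the pointer carries the file name as well as the identifier, the same personnel list can be shared by every header in the project.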

2.6 Minimal and Recommended Headers

The TEI header allows for the provision of a very large amount of information concerning the text itself, its source, its encodings, and revisions of it, as well as a wealth of descriptive information such as the languages it uses and the situation(s) in which it was produced, together with the setting and identity of participants within it. This diversity and richness reflects the diversity of uses to which it is envisaged that electronic texts conforming to these Guidelines will be put. It is emphatically not intended that all of the elements described above should be present in every TEI Header.

The amount of encoding in a header will depend both on the nature and the intended use of the text. At one extreme, an encoder may expect that the header will be needed only to provide a bibliographic identification of the text adequate to local needs. At the other, wishing to ensure that their texts can be used for the widest range of applications, encoders will want to document as explicitly as possible both bibliographic and descriptive information, in such a way that no prior or ancillary knowledge about the text is needed in order to process it. The header in such a case will be very full, approximating to the kind of documentation often supplied in the form of a manual. Most texts will lie somewhere between these extremes; textual corpora in particular will tend more to the latter extreme. In the remainder of this section we demonstrate first the minimal, and next a commonly recommended, level of encoding for the bibliographic information held by the TEI header.

Supplying only the minimal level of encoding required, the TEI header of a single text might look like the following example:
<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>Thomas Paine: Common sense, a
       machine-readable transcript</title>
   <respStmt>
    <resp>compiled by</resp>
    <name>Jon K Adams</name>
   </respStmt>
  </titleStmt>
  <publicationStmt>
   <distributor>Oxford Text Archive</distributor>
  </publicationStmt>
  <sourceDesc>
   <bibl>The complete writings of Thomas Paine, collected and edited
       by Philip S. Foner (New York, Citadel Press, 1945)</bibl>
  </sourceDesc>
 </fileDesc>
</teiHeader>

The only mandatory component of the TEI Header is the fileDesc element. Within this, titleStmt, publicationStmt, and sourceDesc are all required constituents. Within the title statement, a title is required, and an author should be specified, even if it is unknown, as should some additional statement of responsibility, here given by the respStmt element. Within the publicationStmt, a publisher, distributor, or other agency responsible for the file must be specified. Finally, the source description should contain at the least a loosely structured bibliographic citation identifying the source of the electronic text if (as is usually the case) there is one.
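Where a text has no source, the source description can say so in prose. The following sketch assumes a hypothetical born-digital document; its title, distributor, and wording are invented for illustration and are not taken from any real header.

```xml
<teiHeader>
 <fileDesc>
  <titleStmt>
   <!-- title is the only required child of titleStmt -->
   <title>Project working notes: a born-digital document</title>
   <!-- an author is given even though it is not known -->
   <author>unknown</author>
  </titleStmt>
  <publicationStmt>
   <!-- hypothetical agency responsible for the file -->
   <distributor>Example Text Project</distributor>
  </publicationStmt>
  <sourceDesc>
   <!-- no source exists, so a prose statement replaces the citation -->
   <p>Born digital: no previous source exists.</p>
  </sourceDesc>
 </fileDesc>
</teiHeader>
```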

We now present the same example header, expanded to include additionally recommended information, adequate to most bibliographic purposes, in particular to allow for the creation of an AACR2-conformant bibliographic record. We have also added information about the encoding principles used in this (imaginary) encoding, about the text itself (in the form of Library of Congress subject headings), and about the revision of the file.
<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>Common sense, a machine-readable transcript</title>
   <author>Paine, Thomas (1737-1809)</author>
   <respStmt>
    <resp>compiled by</resp>
    <name>Jon K Adams</name>
   </respStmt>
  </titleStmt>
  <editionStmt>
   <edition>
    <date>1986</date>
   </edition>
  </editionStmt>
  <publicationStmt>
   <distributor>Oxford Text Archive.</distributor>
   <address>
    <addrLine>Oxford University Computing Services,</addrLine>
    <addrLine>13 Banbury Road,</addrLine>
    <addrLine>Oxford OX2 6RB,</addrLine>
    <addrLine>UK</addrLine>
   </address>
  </publicationStmt>
  <notesStmt>
   <note>Brief notes on the text are in a
       supplementary file.</note>
  </notesStmt>
  <sourceDesc>
   <biblStruct>
    <monogr>
     <editor>Foner, Philip S.</editor>
     <title>The complete writings of Thomas Paine</title>
     <imprint>
      <pubPlace>New York</pubPlace>
      <publisher>Citadel Press</publisher>
      <date>1945</date>
     </imprint>
    </monogr>
   </biblStruct>
  </sourceDesc>
 </fileDesc>
 <encodingDesc>
  <samplingDecl>
   <p>Editorial notes in the Foner edition have not
       been reproduced. </p>
   <p>Blank lines and multiple blank spaces, including paragraph
       indents, have not been preserved. </p>
  </samplingDecl>
  <editorialDecl>
   <correction status="high" method="silent">
    <p>The following errors
         in the Foner edition have been corrected:
    <list>
      <item>p. 13 l. 7 cotemporaries contemporaries </item>
      <item>p. 28 l. 26 [comma] [period] </item>
      <item>p. 84 l. 4 kin kind </item>
      <item>p. 95 l. 1 stuggle struggle </item>
      <item>p. 101 l. 4 certainy certainty </item>
      <item>p. 167 l. 6 than that </item>
      <item>p. 209 l. 24 publshed published </item>
     </list>
    </p>
   </correction>
   <normalization>
    <p>No normalization beyond that performed
         by Foner, if any. </p>
   </normalization>
   <quotation marks="all" form="std">
    <p>All double quotation marks
         rendered with ", all single quotation marks with
         apostrophe. </p>
   </quotation>
   <hyphenation eol="none">
    <p>Hyphenated words that appear at the
         end of the line in the Foner edition have been reformed.</p>
   </hyphenation>
   <stdVals>
    <p>The values of <att>when-iso</att> on the <gi>time</gi>
         element always end in the format <val>HH:MM</val> or
    <val>HH</val>; i.e., seconds, fractions thereof, and time
         zone designators are not present.</p>
   </stdVals>
   <interpretation>
    <p>Compound proper names are marked. </p>
    <p>Dates are marked. </p>
    <p>Italics are recorded without interpretation. </p>
   </interpretation>
  </editorialDecl>
  <classDecl>
   <taxonomy xml:id="lcsh">
    <bibl>Library of Congress Subject Headings</bibl>
   </taxonomy>
   <taxonomy xml:id="lc">
    <bibl>Library of Congress Classification</bibl>
   </taxonomy>
  </classDecl>
 </encodingDesc>
 <profileDesc>
  <creation>
   <date>1774</date>
  </creation>
  <langUsage>
   <language ident="en" usage="100">English.</language>
  </langUsage>
  <textClass>
   <keywords scheme="#lcsh">
    <term>Political science</term>
    <term>United States -- Politics and government --
         Revolution, 1775-1783</term>
   </keywords>
   <classCode scheme="#lc">JC 177</classCode>
  </textClass>
 </profileDesc>
 <revisionDesc>
  <change when="1996-01-22" who="#MSM"> finished proofreading </change>
  <change when="1995-10-30" who="#LB"> finished proofreading </change>
  <change notBefore="1995-07-04" who="#RG"> finished data entry at end of term </change>
  <change notAfter="1995-01-01" who="#RG"> began data entry before New Year 1995 </change>
 </revisionDesc>
</teiHeader>

Many other examples of recommended usage for the elements discussed in this chapter are provided in the reference index and in the associated tutorials.

2.7 Note for Library Cataloguers

A strong motivation in preparing the material in this chapter was to provide in the TEI file header a viable chief source of information for cataloguing computer files. The file header is not a library catalogue record, and so will not make all of the distinctions essential in standard library work. It also includes much information generally excluded from standard bibliographic descriptions. It is the intention of the developers, however, to ensure that the information required for a catalogue record be retrievable from the TEI file header, and moreover that the mapping from the one to the other be as simple and straightforward as possible. Where the correspondence is not obvious, it may prove useful to consult one of the works which were influential in developing the content of the TEI file header. These include:
ISBD(G)
The International Standard Bibliographic Description (General) is an international standard setting out what information should be recorded in a description of a bibliographical item. There are also separate ISBDs covering different types of material, e.g. ISBD(M) for monographs, ISBD(ER) for electronic resources. These separate ISBDs follow the same general scheme as the main ISBD(G), but provide appropriate interpretations for the specific materials under consideration.
AACR2
The Anglo-American Cataloguing Rules, Second Edition, 2002 Revision: 2005 Update are the official guidelines for the construction of catalogues in general libraries in the English-speaking world. AACR2 is explicitly based on the general framework of the ISBD(G) and the subsidiary ISBDs: it explains how to describe bibliographic items and how to create access points such as subject or name headings and uniform titles. Other national cataloguing codes include NF Z 44, Regeln für die alphabetische Katalogisierung (RAK), Regole italiane di catalogazione per autori (RICA), and ГОСТ 7.1.
ANSI/NISO Z.39.29
ANSI/NISO Z.39.29 is an American national standard governing bibliographic references for use in bibliographies, end-of-work lists, references in abstracting and indexing publications, and outputs from computerized bibliographic databases. This standard is currently (2010) under periodic review. The related ISO standard is ISO 690. Other relevant national standards include BS 1629:1989, BS 5605:1978, BS 6371:1983, DIN 1505-2, and ГОСТ 7.0.5.
Since the TEI file description elements are based on the ISBD areas, it should be possible to use the content of the file description as the basis for a catalogue record for a TEI document. However, cataloguers should be aware that the permissive nature of the TEI Guidelines may lead to divergences between practice in using the TEI file description and the comparatively strict recommendations of AACR2. Such divergences as the following may preclude automatic generation of catalogue records from TEI headers:
  • The TEI title statement may not categorise constituent titles in the same way as recommended by AACR2.
  • The TEI title statement contains authors, editors, and other responsible parties in separate elements, with names which may not have been normalized; it does not necessarily contain a single statement of responsibility from the chief source of information.
  • The TEI header does not require use of a particular vocabulary for subject headings or mandate the use of subject headings.
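
An encoder who anticipates cataloguing use can narrow some of these gaps. The sketch below is one possible approach rather than a TEI requirement: it separates the main title from the subtitle with the type attribute and gives the author's name in inverted form. The values main and sub are assumed here to follow local project conventions, and the content is adapted from the Paine example in section 2.6.

```xml
<titleStmt>
 <!-- constituent titles are categorised explicitly -->
 <title type="main">Common sense</title>
 <title type="sub">a machine-readable transcript</title>
 <!-- name given in a normalized, inverted form -->
 <author>Paine, Thomas (1737-1809)</author>
 <respStmt>
  <resp>compiled by</resp>
  <name>Jon K Adams</name>
 </respStmt>
</titleStmt>
```

Even with such conventions in place, a cataloguer would still need to check the result against the chief source of information.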

2.8 The TEI Header Module



Copyright TEI Consortium 2010 Licensed under the GPL. Copying and redistribution is permitted and encouraged.
Version 1.9.0. Last updated on February 25th 2011. This page generated on 2011-02-25T10:57:08Z