3 Elements Available in All TEI Documents


This chapter describes elements which may appear in any kind of text and the tags used to mark them in all TEI documents. Most of these elements are freely floating phrases, which can appear at any point within the textual structure, although they must generally be contained by a higher-level element of some kind (such as a paragraph). A few of the elements described in this chapter (for example, bibliographic citations and lists) have a comparatively well-defined internal structure, but most of them have no consistent inner structure of their own. In the general case, they contain only a few words, and are often identifiable in a conventionally printed text by the use of typographic conventions such as shifts of font, use of quotation or other punctuation marks, or other changes in layout.

This chapter begins by describing the <p> tag used to mark paragraphs, the prototypical formal unit for running text in many TEI modules. This is followed, in section 3.2 Treatment of Punctuation, by a discussion of some specific problems associated with the interpretation of conventional punctuation, and the methods proposed by the Guidelines for resolving ambiguities therein.

The next section (section 3.3 Highlighting and Quotation) describes a number of phrase-level elements commonly marked by typographic features (and thus well-represented in conventional markup languages). These include features commonly marked by font shifts (section 3.3.2 Emphasis, Foreign Words, and Unusual Language) and features commonly marked by quotation marks (section 3.3.3 Quotation) as well as such features as terms, cited words, and glosses (section 3.3.4 Terms, Glosses, Equivalents, and Descriptions).

Section 3.4 Simple Editorial Changes introduces some phrase-level elements which may be used to record simple editorial interventions, such as emendation or correction of the encoded text. The elements described here constitute a simple subset of the full mechanisms for encoding such information (described in full in chapter 11 Representation of Primary Sources), which should be adequate to most commonly encountered situations.

The next section (section 3.5 Names, Numbers, Dates, Abbreviations, and Addresses) describes several phrase-level and inter-level elements which, although often of interest for analysis or processing, are rarely explicitly identified in conventional printing. These include names (section 3.5.1 Referring Strings), numbers and measures (section 3.5.3 Numbers and Measures), dates and times (section 3.5.4 Dates and Times), abbreviations (section 3.5.5 Abbreviations and Their Expansions), and addresses (section 3.5.2 Addresses).

In the same way, the following section (section 3.6 Simple Links and Cross-References) presents only a subset of the facilities available for the encoding of cross-references or text-linkage. The full story may be found in chapter 16 Linking, Segmentation, and Alignment; the tags presented here are intended to be usable for a wide variety of simple applications.

Sections 3.7 Lists and 3.8 Notes, Annotation, and Indexing describe two kinds of quasi-structural elements: lists and notes. These may appear either within chunk-level elements such as paragraphs, or between them. Several kinds of lists are catered for, of arbitrary complexity. The section on notes discusses both notes found in the source and simple mechanisms for adding annotations of an interpretive nature during the encoding; again, only a subset of the facilities described in full elsewhere (specifically, in chapter 17 Simple Analytic Mechanisms) is discussed.

Section 3.9 Graphics and other non-textual components introduces some simple ways of representing graphic or other non-textual content found in a text. A fuller discussion of the multimedia facilities supported by these Guidelines may be found in chapters 14 Tables, Formulæ, and Graphics and 16 Linking, Segmentation, and Alignment.

Next, section 3.10 Reference Systems describes methods of encoding within a text the conventional system or systems used when making references to the text. Some reference systems have attained canonical authority and must be recorded to make the text usable in normal work; in other cases, a convenient reference system must be created by the creator or analyst of an electronic text.

Like lists and notes, the bibliographic citations discussed in section 3.11 Bibliographic Citations and References may be regarded as structural elements in their own right. A range of possibilities is presented for the encoding of bibliographic citations or references, which may be treated as simple phrases within a running text, or as highly structured components suitable for inclusion in a bibliographic database.

Additional elements for the encoding of passages of verse, or of drama (whether in prose or verse), are discussed in section 3.12 Passages of Verse or Drama.

The chapter concludes with a technical overview of the structure and organization of the module described here. This should be read in conjunction with chapter 1 The TEI Infrastructure, describing the structure of the TEI document type definition.

3.1 Paragraphs

The paragraph is the fundamental organizational unit for all prose texts, being the smallest regular unit into which prose can be divided. Prose can appear in all TEI texts, even those that are primarily of another genre (e.g., verse); thus the paragraph is described here, as an element which can appear in any kind of text.

Paragraphs can contain any of the other elements described within this chapter, as well as some other elements which are specific to individual text types. We distinguish phrase-level elements, which must be entirely contained within a paragraph and cannot appear except within one, from chunks, which can appear between, but not within, paragraphs, and from inter-level elements, which can appear either within a single paragraph or between paragraphs. The class of phrases includes emphasized or quoted phrases, names, dates, etc. The class of inter-level elements includes bibliographic citations, notes, lists, etc. The class of chunks includes the paragraph itself and other elements with similar structural properties, notably the <ab> (anonymous block) element described in section 16.3 Blocks, Segments, and Anchors, which may be used as an alternative to the paragraph in some kinds of texts.

Because paragraphs may appear in different base or additional tag sets, their possible contents may differ in different kinds of documents. In particular, additional elements not listed in this chapter may appear in paragraphs in certain kinds of text. However, the elements described in this chapter are always available by default in all kinds of text.

The paragraph is marked using the <p> element:
  • p (paragraph) marks paragraphs in prose.

If a consistent internal subdivision of paragraphs is desired, the <s> or <seg> (‘segment’) elements may be used, as discussed in chapters 17 Simple Analytic Mechanisms and 16 Linking, Segmentation, and Alignment respectively. More usually, however, paragraphs have no firm internal structure, but contain prose encoded as a mix of characters, entity references, phrases marked as described in the rest of this chapter, and embedded elements like lists, figures, or tables.

Since paragraphs are usually marked explicitly in Western texts, typically by indentation, applying the <p> tag generally presents few problems.

In some cases, the body of a text may comprise but a single paragraph:
<body>
 <p>I fully appreciate Gen. Pope's splendid achievements with their
   invaluable results; but you must know that Major Generalships in the
   Regular Army, are not as plenty as blackberries.</p>
</body>
This news story shows typically short journalistic paragraphs:
<head>SARAJEVO, Bosnia and Herzegovina, April 19</head>
<p>Serbs seized more territory in this struggling new country today as
the United States Air Force ended a two-day airlift of humanitarian
aid into the capital, Sarajevo.</p>
<p>International relief workers called on European Community nations
to step up their humanitarian aid to the former Yugoslav republic,
in conjunction with new American aid flights if necessary.</p>
<p>A special envoy from the European Community, Colin Doyle, harshly
condemned the decision by Serbs to shell Sarajevo on Saturday night
during a visit to the Bosnian capital by a senior American official,
Deputy Assistant Secretary of State Ralph R. Johnson.</p>
<p>...</p>
The following extract from a Russian fairy tale demonstrates how other phrase level elements (in this case <q> elements representing direct speech; see section 3.3.3 Quotation) may be nested within, but not across, paragraphs:
<p>A fly built a castle, a tall and mighty castle.
There came to the castle the Crawling Louse. <q>Who,
   who's in the castle? Who, who's in your house?</q>
said the Crawling Louse. <q>I, I, the Languishing Fly.
   And who art thou?</q>
 <q>I'm the Crawling Louse.</q>
</p>
<p>Then came to the castle the Leaping Flea. <q>Who,
   who's in the castle?</q> said the Leaping Flea. <q>I,
   I, the Languishing Fly, and I, the Crawling Louse. And
   who art thou?</q>
 <q>I'm the Leaping Flea.</q>
</p>
<p>Then came to the castle the Mischievous Mosquito.
<q>Who, who's in the castle?</q> said the Mischievous
Mosquito. <q>I, I, the Languishing Fly, and I, the
   Crawling Louse, and I, the Leaping Flea. And who art
   thou?</q>
 <q>I'm the Mischievous Mosquito.</q>
</p>
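Because each <q> lies wholly inside one <p>, software can process speech paragraph by paragraph without worrying about quotations that straddle paragraph boundaries. The following is a minimal sketch using Python's standard xml.etree.ElementTree; it assumes a simplified, un-namespaced fragment (real TEI documents use the TEI namespace), and the function name speech_by_paragraph is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified, un-namespaced fragment modelled on the fairy tale above.
# Real TEI files declare xmlns="http://www.tei-c.org/ns/1.0", which would
# change the tag names ElementTree reports (e.g. "{...}p" instead of "p").
FRAGMENT = """<body>
<p>A fly built a castle. <q>Who, who's in the castle?</q>
said the Crawling Louse. <q>I, I, the Languishing Fly.</q></p>
<p>Then came to the castle the Leaping Flea. <q>I'm the Leaping Flea.</q></p>
</body>"""


def speech_by_paragraph(xml_text):
    """Return one list of <q> texts per <p>, in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for p in root.iter("p"):
        # Well-formed XML forbids overlapping elements, so every <q> found
        # here starts and ends inside this paragraph.
        quotes = [" ".join("".join(q.itertext()).split()) for q in p.iter("q")]
        result.append(quotes)
    return result


if __name__ == "__main__":
    for number, quotes in enumerate(speech_by_paragraph(FRAGMENT), start=1):
        print(number, quotes)
```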

3.2 Treatment of Punctuation

Punctuation marks cause problems for text markup when they are not available in the character set used, or when they are significantly ambiguous. The Unicode character set addresses most such problems, since it provides specific code points for most punctuation marks, and also distinguishes between different functions of visually similar marks (such as the stop, comma, and hyphen). For example, different Unicode code points are available for the hyphen used as a minus sign, as a word-breaking hyphen, as a soft hyphen, or as a ‘non-breaking’ hyphen. The facilities described in chapter 5 Representation of Non-standard Characters and Glyphs may also be used to define markup for non-standard punctuation characters.
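To make the point about distinct code points concrete, the short sketch below uses Python's standard unicodedata module to print the official names of the hyphen-like characters mentioned above. The plain ASCII hyphen-minus (U+002D) is included only for contrast; the helper name describe is illustrative.

```python
import unicodedata

# Characters that may all look like "-" in print but have distinct functions.
HYPHEN_LIKE = ["\u002D", "\u2010", "\u2011", "\u00AD", "\u2212"]


def describe(chars):
    """Map each character to (code point label, official Unicode name)."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


if __name__ == "__main__":
    for code, name in describe(HYPHEN_LIKE):
        print(code, name)
```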

The full stop (period) may mark (orthographic) sentence boundaries, abbreviations, decimal points, or serve as a visual aid in printing numbers. These usages can be distinguished by tagging S-units, abbreviations, and numbers, as described in sections 16.3 Blocks, Segments, and Anchors, 3.5.5 Abbreviations and Their Expansions, and 3.5.3 Numbers and Measures. However, there are independent reasons for tagging these features, whether or not they are marked by full stops, and the polysemy of the full stop itself is perhaps no different from that of any character in the writing system.

The question mark and the exclamation mark typically mark the end of orthographic sentences, but may also be used by the author as a mid-sentence comment (! to express surprise or some other strong feeling, ? to query a word or expression or mark a sentence as dubious in linguistic discussion). These uses may be distinguished by marking S-units, in which case the mid-sentence uses of these punctuation marks may be left unmarked, or tagged using the <c> element discussed in 17.1 Linguistic Segment Categories.

Dashes are used for a variety of purposes: insertion, interruption, new speaker (in dialogue), list item. In the latter two cases it is preferable to mark the underlying feature using the elements <q> or <item>, on which see section 3.3.3 Quotation, and section 3.7 Lists, respectively.

Quotation marks may be removed from text contained by <q> or <quote> elements, especially as quotations are not always marked by quotation marks (notably long quotations) or may be marked in a variety of ways; see the discussion of quotation and related features in section 3.3.3 Quotation.

Apostrophes must be distinguished from single quote marks. As with hyphens, this disambiguation may be performed by selecting an appropriate Unicode character, but it may also be represented by using explicit XML tags for quotations as suggested above. However, apostrophes have a variety of uses. In English they mark contractions, genitive forms, and (occasionally) plural forms. Full disambiguation of these uses belongs to the level of linguistic analysis and interpretation.

Parentheses and other marks of suspension such as dashes or ellipses are often used to signal information about the syntactic structure of a text fragment. Full disambiguation of their uses also belongs to the level of linguistic analysis and interpretation, and is therefore discussed in chapter 17 Simple Analytic Mechanisms.

Where punctuation marks are disambiguated by tagging the underlying feature they signal, it may be debated whether they should be excluded or left as part of the text. In the case of quotation marks, it may be more convenient to distinguish opening from closing marks simply by using the appropriate Unicode character than to use the <q> element, with or without a rend attribute. The solution chosen will vary depending upon the feature and depending upon the purpose of the project.

3.3 Highlighting and Quotation

This section deals with a variety of textual features which have in common that they are frequently realized in conventional printing practice by the use of such features as underlining, italic fonts, or quotation marks, collectively referred to here as highlighting. After an initial discussion of this phenomenon and alternative approaches to encoding it, this section describes ways of encoding the following textual features, all of which are conventionally rendered using some kind of highlighting:
  • emphasis, foreign words and other linguistically distinct uses of highlighting
  • representation of speech and thought, quotation, etc.
  • technical terms, glosses, etc.

3.3.1 What Is Highlighting?

By highlighting we mean the use of any combination of typographic features (font, size, hue, etc.) in a printed or written text in order to distinguish some passage of a text from its surroundings. The purpose of highlighting is generally to draw the reader's attention to some feature or characteristic of the passage highlighted; this section describes the elements recommended by these Guidelines for the encoding of such textual features.

In conventionally printed modern texts, highlighting is often employed to identify words or phrases which are regarded as being one or more of the following:
  • distinct in some way — as foreign, dialectal, archaic, technical, etc.
  • emphatic, and which would for example be stressed when spoken
  • not part of the body of the text, for example cross-references, titles, headings, labels, etc.
  • identified with a distinct narrative stream, for example an internal monologue or commentary.
  • attributed by the narrator to some other agency, either within the text or outside it: for example, direct speech or quotation.
  • set apart from the text in some other way: for example, proverbial phrases, words mentioned but not used, names of persons and places in older texts, editorial corrections or additions, etc.

The textual functions indicated by highlighting may not be rendered consistently in different parts of a text or in different texts. (For example, a foreign word may appear in italics if the surrounding text is in roman, but in roman if the surrounding text is in italics.) For this reason, these Guidelines distinguish between the encoding of rendering itself and the encoding of the underlying feature expressed by it.

Highlighting as such may be encoded by using either of the global attributes rend or rendition (see 1.3.1.1 Global Attributes). This allows the encoder both to specify the function of a highlighted phrase or word, by selecting the appropriate element described here or elsewhere in the Guidelines, and to further describe the way in which it is highlighted, by means of the rend attribute. If the encoder wishes to offer no interpretation of the feature underlying the use of highlighting in the source text, then the <hi> element may be used, which indicates only that the text so tagged was highlighted in some way.
  • hi (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made.
The <hi> element is provided by the model.hiLike class.

The possible values carried by the rend attribute are not formally defined in this version of the Guidelines. Since the rend attribute may be used to document any peculiarity of the way a given segment of text was rendered in the original source text, it may need to express a very large range of typographic features, by no means restricted to typeface, type size, etc.

Where it is both appropriate and feasible, these Guidelines recommend that the textual feature marked by the highlighting should be encoded, rather than just the simple fact of the highlighting. This is for the following reasons:
  • the same kind of highlighting may be used for different purposes in different contexts
  • the same textual function may be highlighted in different ways in different contexts
  • for analytic purposes, it is in general more useful to know the intended function of a highlighted phrase than simply that it is distinct.

In many, if not most, cases the underlying function of a highlighted phrase will be obvious and non-controversial, since the distinctions indicated by a change of highlighting correspond with distinctions discussed elsewhere in these Guidelines. The elements available to record such distinctions are, for the most part, members of the model.emphLike class. This and the model.hiLike class mentioned above constitute the model.highlighted class, which is a phrase-level class. Members of this class may appear anywhere within paragraph-level elements.

The distinction between the two classes is simple, and typified by the two elements <hi> and <emph>: the former marks simply that a passage is typographically distinct in some way, while the latter asserts that a passage is linguistically emphasized for some purpose. These two properties, though often combined, are not identical. It should be recognized, however, that cases do exist in which it is not economically feasible to mark the underlying function (e.g. in the preparation of large text corpora), as well as cases in which it is not intellectually appropriate (as in the transcription of some older materials, or in the preparation of material for the study of typographic practice). In such cases, the <hi> element or some other element from the model.hiLike class should be used.

Elements which are sometimes realized by typographic distinction but which are not discussed in this section include <title> (discussed in section 3.11 Bibliographic Citations and References) and <name> (discussed in section 3.5.1 Referring Strings).

3.3.2 Emphasis, Foreign Words, and Unusual Language

This subsection discusses the following elements:
  • foreign (foreign) identifies a word or phrase as belonging to some language other than that of the surrounding text.
  • emph (emphasized) marks words or phrases which are stressed or emphasized for linguistic or rhetorical effect.
  • distinct identifies any word or phrase which is regarded as linguistically distinct, for example as archaic, technical, dialectal, non-preferred, etc., or as forming part of a sublanguage.
These elements are all members of the model.emphLike class.
3.3.2.1 Foreign Words or Expressions
Words or phrases which are not in the main language of the text should be tagged as such, at least where the fact is indicated in the text. Where the word or phrase concerned is already distinguished from the rest of the text by virtue of its function (for example, because it is a name, a technical term, a quotation, a mentioned word, etc.) then the global xml:lang attribute should be used to specify additionally that its language distinguishes it from the surrounding text. Any element in the TEI scheme may take an xml:lang attribute, which specifies both the writing system and the language used by its content (see section vi.i Language identification for discussion of this attribute). Where there is no other applicable element, the element <foreign> may be used to provide a peg onto which the xml:lang attribute may be attached.
<q>Aren't you confusing <foreign xml:lang="la">post hoc</foreign> with <foreign xml:lang="la">propter hoc</foreign>?</q> said the Bee Master.
<q>Wax-moth only succeed when weak bees let them in.</q>
The <foreign> element should not be used to represent foreign words which are mentioned or glossed within the text: for these use the appropriate element from section 3.3.4 Terms, Glosses, Equivalents, and Descriptions below. Compare the following example sentences:
John eats a <foreign xml:lang="fr">croissant</foreign> every morning.
<mentioned xml:lang="fr">Croissant</mentioned> is difficult to
pronounce with your mouth full.
A <term xml:lang="fr">croissant</term> is a crescent-shaped
piece of light, buttery, pastry that is usually eaten for
breakfast, especially in France.
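Because the language and the function are recorded separately, a processor can retrieve all French material and still tell a foreign word from a mentioned word or a defined term. The sketch below assumes a simplified, un-namespaced fragment built from the three sentences above; note that ElementTree reports xml:lang under the reserved XML namespace URI. The function name language_spans is illustrative only.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<div>
<p>John eats a <foreign xml:lang="fr">croissant</foreign> every morning.</p>
<p><mentioned xml:lang="fr">Croissant</mentioned> is difficult to
pronounce with your mouth full.</p>
<p>A <term xml:lang="fr">croissant</term> is a crescent-shaped pastry.</p>
</div>"""


def language_spans(xml_text):
    """List (element name, xml:lang value, text) for every element setting xml:lang."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, el.get(XML_LANG), "".join(el.itertext()))
        for el in root.iter()
        if el.get(XML_LANG) is not None
    ]


if __name__ == "__main__":
    for tag, lang, text in language_spans(SAMPLE):
        print(tag, lang, text)
```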
3.3.2.2 Emphatic Words and Phrases
The <emph> element is provided to mark words or phrases which are linguistically emphatic or stressed. Text which is only typographically ‘emphasized’ falls into the class of highlighted text, and may be tagged with the <hi> element. In printed works, emphasis is generally indicated by devices such as the use of an italic font, a large typeface, or extra wide letter spacing; in manuscripts and typescripts, it is usually indicated by the use of underlining. As the following examples demonstrate, an encoder may choose whether or not to make explicit the particular type of rendition associated with the emphasis by use of the rend attribute. If a source text consistently renders a particular feature (e.g. emphasis or words in foreign languages) in a particular way, the rendering associated with that feature may be described in the TEI header using the <rendition> element. The rend attribute may then be used to describe examples which deviate from the norm. For example, assuming that the TEI Header has defined a default rendering for the <emph> element, the following encoding would use it:
<q>Sex, sir, is <emph>purely</emph> a
question of appetite!</q> Tarr exclaimed.
If on the other hand no such default has been defined for the element, the encoder may specify it informally using the rend attribute:
<q>What it all comes to is this,</q> he said.

<q>
 <emph rend="italic">What does Christopher
   Robin do in the morning nowadays?</emph>
</q>
or, if a <rendition> element has been provided in the header (but not necessarily associated with any other element), the rendition attribute may be used to point to it:
<l>Here Thou, great <name rend="italics">Anna</name>!
whom three Realms obey,</l>
<l>Doth sometimes Counsel take —
and sometimes <emph rendition="#italic">Tea</emph>.</l>
<!-- in the header ... -->
<rendition xml:id="italic" scheme="css">font-style:italic</rendition>
Further information on the use of the <rendition> element is provided at 2.3.4 The Tagging Declaration.
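A rendition attribute is a pointer, so its meaning is obtained by looking up the <rendition> element it identifies. The sketch below resolves same-document pointers of the form "#id" to their CSS text. It assumes a simplified, un-namespaced document with the <rendition> placed in the tagging declaration; the function name resolve_renditions is hypothetical, and pointers to other documents are simply ignored.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified, un-namespaced document: one <rendition> in the tagging
# declaration and one element in the text that points to it.
DOC = """<TEI>
<teiHeader><encodingDesc><tagsDecl>
<rendition xml:id="italic" scheme="css">font-style: italic</rendition>
</tagsDecl></encodingDesc></teiHeader>
<text><body><l>and sometimes <emph rendition="#italic">Tea</emph>.</l></body></text>
</TEI>"""


def resolve_renditions(xml_text):
    """Pair the text of each element carrying @rendition with the CSS it points to."""
    root = ET.fromstring(xml_text)
    styles = {r.get(XML_ID): (r.text or "").strip() for r in root.iter("rendition")}
    resolved = []
    for el in root.iter():
        pointers = (el.get("rendition") or "").split()
        # Only same-document pointers ("#id") are resolved in this sketch.
        ids = [p[1:] for p in pointers if p.startswith("#")]
        if ids:
            css = "; ".join(styles[i] for i in ids if i in styles)
            resolved.append(("".join(el.itertext()), css))
    return resolved


if __name__ == "__main__":
    print(resolve_renditions(DOC))
```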

The <hi> element is used to mark words or phrases which are highlighted in some way, but for which identification of the intended distinction is difficult, controversial, or impossible. It enables an encoder simply to record the fact of highlighting, possibly describing it by the use of a rend or rendition attribute, as discussed above, without however taking a position as to the function of the highlighting. This may also be useful if the text is to be processed in two stages: representing simply typographic distinctions during a first pass, and then replacing the <hi> elements with more specific elements in a second pass.

Some simple examples:
<hi rend="gothic">And this Indenture further witnesseth</hi>
that the said <hi rend="italic">Walter Shandy</hi>, merchant,
in consideration of the said intended marriage ...
In this example, the first highlighted phrase uses black letter or gothic print to mimic the appearance of a legal document, and italic to mark Walter Shandy as a name. In a second pass, the elements <head> or <label> might be appropriate for the first use, and the element <name> for the second.
The heaviest rain, and snow, and hail, and sleet, could
boast of the advantage over him in only one respect. They
often <hi rend="quoted">came down</hi> handsomely, and
Scrooge never did.
In this example, the phrase came down uses inverted commas to indicate a play on words. In a second pass, the element <soCalled> might be preferred.
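The two-pass workflow can be partly automated once the encoder has decided what each kind of highlighting means in a particular source. The sketch below renames <hi> elements according to an encoder-supplied table keyed on the rend value; the mapping shown (based on the two examples above) is hypothetical, not prescribed by these Guidelines, and any unmapped <hi> is left untouched for manual review.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping chosen by the encoder after inspecting the first pass.
SECOND_PASS = {"quoted": "soCalled", "italic": "name"}


def refine_hi(xml_text, mapping=SECOND_PASS):
    """Rename <hi> elements whose @rend appears in mapping; keep the rest as <hi>."""
    root = ET.fromstring(xml_text)
    # Collect first so renaming cannot interfere with the traversal.
    for el in list(root.iter("hi")):
        target = mapping.get(el.get("rend"))
        if target:
            el.tag = target
            # The function is now explicit, so the sketch drops @rend;
            # an encoder may prefer to keep it as a record of the rendering.
            del el.attrib["rend"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(refine_hi('<p>They often <hi rend="quoted">came down</hi> handsomely.</p>'))
```

Keeping the mapping as data rather than code lets the same routine be reused for different sources whose typographic conventions differ.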
3.3.2.3 Other Linguistically Distinct Material

For some kinds of analysis, it may be desirable to encode the linguistic distinctiveness of words and phrases with more delicacy than is allowed by the <foreign> element. The <distinct> element is provided for this purpose. Its attributes allow for additional information characterizing the nature of the linguistic distinction to be made in two distinct ways: the type attribute simply assigns a user-defined code of some kind to the word or phrase which assigns it to some register, sub-language, etc. No recommendations as to the set of values for this attribute are provided at this time, as little consensus exists in the field.

Alternatively, the remaining three attributes may be used in combination to place a word or phrase on a three-dimensional scale sometimes used in descriptive linguistics, as for example in Mattheier et al., 1988. The time attribute places a word diachronically, for example as archaic, old-fashioned, contemporary, futuristic, etc.; the space attribute places a word diatopically, that is, with respect to a geographical classification, for example as national, regional, international, etc.; the social attribute places a word diastratically, that is, with respect to a social classification, for example as technical, polite, impolite, restricted, etc. Again, no recommendations are made for the values of these attributes at this time; the encoder should provide a description of the scheme used in the appropriate section of the header (see section 2.3 The Encoding Description).

Examples:
Next morning a boy in that dormitory confided to his
bosom friend, a <distinct type="psSlang">fag</distinct> of
Macrea's, that there was trouble in their midst which
King <distinct type="archaic">would fain</distinct> keep
secret.
Next morning a boy in that dormitory confided to his
bosom friend, a
<distinct time="1900" space="GB" social="publicschool">fag</distinct>
of Macrea's, that there was trouble in their midst which
King <distinct time="archaic">would fain</distinct> keep
secret.
Where more complex (or more rigorous) interpretive analyses of the associations of a word are required, the more detailed and general mechanisms described in chapter 18 Feature Structures should be preferred to these simple characterizations. It may also be preferable to record the kinds of analysis suggested here by means of the simple annotation element <note> described in section 3.8 Notes, Annotation, and Indexing, or the <span> element described in section 17.3 Spans and Interpretations.

3.3.3 Quotation

One form of presentational variation found particularly frequently in written and printed texts is the use of quotation marks. As with the typographic variations discussed in the preceding section, it is generally helpful to separate the encoding of the underlying textual feature (for example, a quotation or a piece of direct speech) from the encoding of its rendering (for example, the use of a particular style of quotation marks).

This section discusses the following elements, all of which are often rendered by the use of quotation marks:
  • q (separated from the surrounding text with quotation marks) contains material which is marked as (ostensibly) being somehow different than the surrounding text, for any one of a variety of reasons including, but not limited to: direct speech or thought, technical terms or jargon, authorial distance, quotations from elsewhere, and passages that are mentioned but not used.
  • said (speech or thought) indicates passages thought or spoken aloud, whether explicitly indicated in the source or not, whether directly or indirectly reported, whether by real people or fictional characters.
    direct may be used to indicate whether the quoted matter is regarded as direct or indirect speech.
    aloud may be used to indicate whether the quoted matter is regarded as having been vocalized or signed.
  • quote (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.
  • cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.
  • mentioned marks words or phrases mentioned, not used.
  • soCalled contains a word or phrase for which the author or narrator indicates a disclaiming of responsibility, for example by the use of scare quotes or italics.
The elements <mentioned> and <soCalled> are members of the class model.emphLike; <q> and <said> are members of the class model.qLike in their own right, while <cit> and <quote> are members of model.quoteLike, a subclass of model.qLike. This class is a subclass of model.inter; hence all of these elements are permitted both within and between paragraph-level elements.

The most common and important use of quotation marks is, of course, to mark quotation, by which we mean simply any part of the text attributed by the author or narrator to some agency other than the narrative voice. The <q> element may be used if no further distinction beyond this is judged necessary. If however it is felt necessary to distinguish passages which are in some sense external to the work from passages of direct speech or thought, a more precise element may be chosen from the list above. Typical examples include passages cited from other works, for which the element <quote> may be used, and words or phrases spoken or thought by people or characters within the current work, for which the element <said> may be used. The <soCalled> element is used for cases where the author or narrator distances himself or herself from the words in question without however attributing them to any other voice in particular. The <mentioned> element is appropriate for a case where a word or phrase is being discussed in the body of a text rather than forming part of the text directly.

As noted above, if the distinction among these various reasons why a passage is offset from surrounding text cannot be made reliably, or is not of interest, then all quoted matter may simply be marked using the <q> element.

Quotation may be indicated in a printed source by changes in typeface, by special punctuation marks (single or double or angled quotes, dashes, etc.) and by layout (indented paragraphs, etc.). If these characteristics are of interest, either of the global attributes rend or rendition discussed in section 1.3.1.1 Global Attributes may be used to record them.

Quotation marks themselves may, like other punctuation marks, be felt for some purposes to be worth retaining within a text, quite independently of their description by the rend attribute. This should generally be done using the appropriate Unicode character, or, if this is not possible, a numeric character reference (see Character References).

Alternatively, the encoder may suppress all quotation marks, possibly recording their form using some appropriate set of conventions in the rend attribute. Some examples are shown below:
<said rend="PRE lsquo POST rsquo">Who-e debel
you?</said> — he at last said —
<said rend="PRE lsquo POST rsquo">you no speak-e,
damme, I kill-e.</said> And so saying,
the lighted tomahawk began flourishing
about me in the dark.
Adolphe se tourna vers lui :
<said>— Alors, Albert, quoi de neuf?</said>
<said>— Pas grand-chose.</said>
<said>— Il fait beau,</said> dit Robert.
Adolphe se tourna vers lui :
<said rend="PRE mdash">Alors,
Albert, quoi de neuf ?</said>
<said rend="PRE mdash">Pas grand-chose.</said>
<said rend="PRE mdash">Il fait beau,</said>
dit Robert.
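The PRE/POST values above are only one possible convention, so any processor must know which convention was used. The following minimal Python sketch uses the standard library's xml.etree.ElementTree. It assumes exactly that convention and shows how the suppressed marks might be restored for display. The MARKS table and the function names are hypothetical and are not part of the Guidelines.

```python
import xml.etree.ElementTree as ET

# Hypothetical table for the entity-style names used in the rend values above.
MARKS = {"lsquo": "\u2018", "rsquo": "\u2019", "mdash": "\u2014"}


def parse_rend(rend):
    """Split a value such as 'PRE lsquo POST rsquo' into {'PRE': ..., 'POST': ...}."""
    tokens = rend.split()
    # Assumed convention: keyword/value pairs, e.g. PRE x POST y.
    pairs = dict(zip(tokens[0::2], tokens[1::2]))
    return {key: MARKS.get(name, "") for key, name in pairs.items()}


def render_said(elem):
    """Return the text of a <said> element with its recorded marks restored."""
    marks = parse_rend(elem.get("rend", ""))
    content = "".join(elem.itertext())
    return marks.get("PRE", "") + content + marks.get("POST", "")


said = ET.fromstring('<said rend="PRE lsquo POST rsquo">Who-e debel you?</said>')
print(render_said(said))  # ‘Who-e debel you?’
```

An element without a rend attribute is returned unchanged, so the sketch degrades gracefully when no convention has been applied.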
As members of the att.ascribed class, elements <said> and <q> share the following attribute:
  • att.ascribed provides attributes for elements representing speech or action that can be ascribed to a specific individual.
    who indicates the person, or group of people, to whom the element content is ascribed.
This may be used to make explicit who is speaking:
Adolphe se tourna vers lui :
<said who="#Adolphe">— Alors, Albert,
quoi de neuf?</said>
<said who="#Albert">— Pas grand-chose.</said>
<said who="#Robert">— Il fait beau,</said>
dit Robert.

<!-- .... -->
<list type="speakers">
 <item xml:id="Adolphe"/>
 <item xml:id="Albert"/>
 <item xml:id="Robert"/>
</list>
The who attribute may be supplied whether or not the speaker is identified explicitly in the text. Its value may (as above) take the form of a normalized version of the speaker's name, but its role is to act as a pointer to a location elsewhere in the text where information about each speaker may be supplied. The most appropriate location for such information is the participant description component of the TEI Header, as further discussed in 15.2.2 The Participant Description, but for simple cases like the one above, a simple list of speakers located in the front or back matter of the text may suffice.
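To illustrate the pointer role of who, the following Python sketch (standard-library xml.etree.ElementTree) works on an abridged version of the example above. It pairs each <said> with the identifier its pointer resolves to, and reports None for a pointer that has no matching xml:id. The wrapping <text> root and the function name are assumptions for illustration, and only same-document pointers of the form #id are handled.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under its expanded (Clark-notation) name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged from the example above; the <text> wrapper is added so it parses.
SAMPLE = """<text>
<said who="#Adolphe">Alors, Albert, quoi de neuf?</said>
<said who="#Albert">Pas grand-chose.</said>
<said who="#Robert">Il fait beau,</said>
<list type="speakers">
 <item xml:id="Adolphe"/>
 <item xml:id="Albert"/>
 <item xml:id="Robert"/>
</list>
</text>"""


def resolve_speakers(root):
    """Return (speaker id or None, utterance) for every <said> element."""
    # Index every element that carries an xml:id.
    targets = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    result = []
    for said in root.iter("said"):
        ref = said.get("who", "")
        # Only local pointers of the form "#id" are resolved in this sketch.
        target = targets.get(ref[1:]) if ref.startswith("#") else None
        # Compare with None: an element without children can test as false.
        speaker = target.get(XML_ID) if target is not None else None
        result.append((speaker, "".join(said.itertext())))
    return result


for speaker, words in resolve_speakers(ET.fromstring(SAMPLE)):
    print(speaker, "->", words)
```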
It may also be useful to distinguish representations of speech from representations of thought, which in modern printed texts are often indicated by a change of typeface. The aloud attribute is provided for this purpose, as in this example:
<said aloud="true">Oh yes,</said> said Henry,
<said aloud="false">I mean
Gordon Macrae, for example…</said>
<said aloud="false">Jungian
Analyst with Winebox! That's what you called him, you callous bastard,
didn't you? Eh? Eh?</said>
Quoted matter may be embedded within quoted matter, as when one speaker reports the speech of another:
<said who="#Wilson">Spaulding, he came down into the office just this day
eight weeks with this very paper in his hand, and he says:—
<said who="#WilsonSpaulding">I wish to the Lord, Mr. Wilson, that I was a
   red-headed man.</said>
</said>
<!-- ... -->
<list type="speakers">
 <item xml:id="Wilson">Wilson</item>
 <item xml:id="WilsonSpaulding">Spaulding reported by Wilson</item>
<!-- ...-->
</list>
Direct speech nested in this way is treated in the same way as elsewhere: a change of rendition may occur, but the same element should be used. An encoder may, however, choose to distinguish quotations from extra-textual matter embedded within direct speech from the direct speech itself, as in the following example:
<p>
 <said>The Lord! The Lord! It is Sakya Muni himself,</said> the lama half
sobbed; and under his breath began the wonderful Buddhist
invocation:-<said>
  <quote>
   <l>To Him the Way — the Law — Apart —</l>
   <l>Whom Maya held beneath her heart</l>
   <l>Ananda's Lord — the Bodhisat</l>
  </quote>
   And He is here! The Most Excellent Law is here also. My
   pilgrimage is well begun. And what work! What work!</said>
</p>
Quotations from other works are often accompanied by a reference to their source. The <cit> element may be used to group together the quotation and its associated bibliographic reference, which should be encoded using the elements for bibliographic references discussed in section 3.11 Bibliographic Citations and References, as in the following example.
<div xml:id="mm01" type="chapter">
 <head>Chapter 1</head>
 <epigraph>
  <cit>
   <quote>
    <l>Since I can do no good because a woman</l>
    <l>Reach constantly at something that is near it.</l>
   </quote>
   <bibl>
    <title>The Maid's Tragedy</title>
    <author>Beaumont and Fletcher</author>
   </bibl>
  </cit>
 </epigraph>
 <p>Miss Brooke had that kind of beauty which seems to be thrown into
   relief by poor dress...</p>
</div>
Like other bibliographic references, the citation attached to a quotation may be represented simply by a pointer, as in this example:
Lexicography has shown little sign of being affected by the
work of followers of J.R. Firth, probably best summarized
in his slogan, <cit>
 <quote>You shall know a word by the company it keeps.</quote>
 <ref>(Firth, 1957)</ref>
</cit>
Unlike most of the other elements discussed in this chapter, direct speech and quotations may frequently contain other high-level elements such as paragraphs or verse lines, as well as being themselves contained by such elements. Three possible solutions exist for this well-known structural problem:
  • the quotation is broken into segments, each of which is entirely contained within a paragraph
  • the quotation is marked up using stand-off markup
  • the quotation boundaries are represented by empty segment boundary delimiter elements
For further discussion and several examples, see chapter 20 Non-hierarchical Structures.
Finally, in this section, the element <soCalled> is provided for all cases in which quotation marks are used to distance the quoted text from the narrator or speaker. Common examples include the ‘scare’ quotes often found in newspaper headlines and advertising copy, where the effect is to cast doubts on the veracity of an assertion:
<head>PM dodges <soCalled>election threat</soCalled> in interview</head>
The same element should be used to mark a variety of special ironic usages. Some further examples follow:
He hated <soCalled>good</soCalled> books.
<soCalled>Croissants</soCalled> indeed! toast not good enough for you?
Although Chomsky's decision that all NL
sentences are finite objects was never justified by arguments from
the attested properties of NLs, it did have a certain
<soCalled>social</soCalled> justification. It was commonly assumed in
works on logic until fairly recently that the notion
<mentioned>language</mentioned> is necessarily restricted to finite
strings.

3.3.4 Terms, Glosses, Equivalents, and Descriptions

This section describes a set of textual elements which are used to provide a gloss, alternate identification, or description of something.

Technical terms are often italicized or emboldened upon first mention in printed texts; an explanation or gloss is sometimes given in quotation marks. Linguistic analyses conventionally cite words in languages under discussion in italics, providing a gloss immediately following marked with single quotation marks. Other texts in which individual words or phrases are mentioned (for example, as examples) rather than used may mark them either with italics or with quotation marks, and will gloss them less regularly.
  • term contains a single-word, multi-word, or symbolic designation which is regarded as a technical term.
  • gloss identifies a phrase or word used to provide a gloss or definition for some other word or phrase.
These elements are also members of the class model.emph.

A <term> may appear with or without a gloss, as may a <mentioned> element. Where the <gloss> is present, it may be linked to the term it is glossing by means of its target attribute. To establish such a link, the encoder should give an xml:id value to the <term> or <mentioned> element and provide that id as the value of the target attribute on the <gloss> element. The following examples demonstrate this facility: for more discussion of this and other kinds of linkage within TEI documents, see chapter 16 Linking, Segmentation, and Alignment.

Examples:
We may define <term xml:id="TDPv" rend="sc">discoursal point of view</term>
as
<gloss target="#TDPv">the relationship, expressed through discourse
structure, between the implied author or some other addresser,
and the fiction.</gloss>
<gloss rend="unmarked" target="#PRSR">A computational device that infers
structure from grammatical strings of words</gloss> is known as a
<term xml:id="PRSR">parser</term>, and much of the history of NLP over the
last 20 years has been occupied with the design of parsers.
There is thus a striking accentual difference between a verbal
form like <mentioned xml:id="cw234" xml:lang="grc">eluthemen</mentioned>
<gloss target="#cw234">we were released,</gloss> accented on the
second syllable of the word, and its participial derivative
<mentioned xml:id="cw235" xml:lang="grc">lutheis</mentioned>
<gloss target="#cw235">released,</gloss> accented on the last.
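This linkage can be exploited mechanically. As an illustration only, the Python sketch below builds a small glossary by following each <gloss> target back to the <term> or <mentioned> element it points to. Because the lookup is by identifier, it works whether the gloss precedes or follows the term. The function names are hypothetical, and the sample is the first example above wrapped in a <p>.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<p>We may define <term xml:id="TDPv" rend="sc">discoursal point of
view</term> as <gloss target="#TDPv">the relationship, expressed through
discourse structure, between the implied author or some other addresser,
and the fiction.</gloss></p>"""


def squash(text):
    """Collapse runs of whitespace, including line breaks, to single spaces."""
    return " ".join(text.split())


def glossary(root):
    """Map the text of each glossed <term> or <mentioned> to its gloss."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    entries = {}
    for gloss in root.iter("gloss"):
        # target holds a local pointer such as "#TDPv".
        target = by_id.get(gloss.get("target", "").lstrip("#"))
        if target is not None and target.tag in ("term", "mentioned"):
            entries[squash("".join(target.itertext()))] = squash("".join(gloss.itertext()))
    return entries


print(glossary(ET.fromstring(SAMPLE)))
```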
Another group of elements is used to supply different kinds of names for objects described by the TEI, for example in the documentation of elements, attributes, and classes (and also attribute values where appropriate), and in the description of glyphs.
  • altIdent (alternate identifier) supplies the recommended XML name for an element, class, attribute, etc. in some language.
  • desc (description) contains a brief description of the purpose and application for an element, attribute, or attribute value.
  • equiv (equivalent) specifies a component which is considered equivalent to the parent element, either by co-reference, or by external link.
    uri (uniform resource identifier) references the underlying concept of which the parent is a representation by means of some external identifier.
    filter references an external script which contains a method to transform instances of this element to canonical TEI.
    name names the underlying concept of which the parent is a representation.
Along with the <gloss> element mentioned above, these elements constitute the model.glossLike class.
The <gloss> element may be used to provide a brief explanation for the name of the object if this is not self-explanatory. For example, the specification for the element <ab> used to mark arbitrary blocks of text begins as follows:
<elementSpec module="linking" ident="ab">
 <gloss>anonymous block</gloss>
<!--... -->
</elementSpec>
A <gloss> may also be supplied for an attribute name or an attribute value in similar circumstances:
<valList type="open">
 <valItem ident="susp">
  <gloss>suspension</gloss>
  <desc>the abbreviation provides the first letter(s)
     of the word or phrase, omitting the remainder.</desc>
 </valItem>
 <valItem ident="contr">
  <gloss>contraction</gloss>
  <desc>the abbreviation omits some letter(s) in the middle.</desc>
 </valItem>
<!--...-->
</valList>
Note that this is quite distinct from the use of the <desc> element, which contains a full description of the intended semantics for the object.
The <equiv> element is used to document equivalencies between the concept represented by this object and the same concept as described in other schemes or ontologies. The uri attribute is used to supply a pointer to some location where such external concepts are defined. For example, to indicate that the TEI <death> element corresponds to the concept defined by the CIDOC CRM category E69, the declaration for the former might begin as follows:
<elementSpec module="namesdates" ident="death">
 <equiv name="E69" uri="http://cidoc.ics.forth.gr/"/>
<!--... -->
</elementSpec>
The <equiv> element may also be used to map newly-defined elements onto existing constructs in the TEI, using the filter and name attributes to point to an implementation of the mapping. This is useful when a TEI customization (see 23.2 Personalization and Customization) defines ‘shortcuts’ for convenience of data entry or markup readability. For example, suppose that in some TEI customization an element <bo> has been defined which is conceptually equivalent to the standard markup construct hi rend='bold'. The following declarations would additionally indicate that instances of the <bo> element can be converted to canonical TEI by obtaining a filter from the URI specified, and running the procedure with the name bold. The mimeType attribute specifies the language (in this case XSL) in which the filter is written:
<elementSpec ident="bo" ns="http://www.example.org/ns/notTEI">
 <equiv
   filter="http://www.example.com/equiv-filter.xsl"
   mimeType="text/xsl"
   name="bold"/>

 <gloss>bold</gloss>
 <desc>contains a sequence of characters rendered in a bold face.</desc>
<!-- ... -->
</elementSpec>
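In the example, the filter is an XSLT stylesheet at a placeholder URI. To show what such a filter would do, here is an equivalent transformation sketched in Python rather than XSL. It is a stand-in for illustration, not the referenced filter. It assumes <bo> is in the example's non-TEI namespace and, for brevity, leaves the output <hi> outside the TEI namespace.

```python
import xml.etree.ElementTree as ET

NOT_TEI = "http://www.example.org/ns/notTEI"  # namespace declared for <bo> above


def expand_bold(root):
    """Rewrite every <bo> shortcut in place as canonical <hi rend="bold">."""
    # Materialize the matches first so renaming cannot disturb the traversal.
    for el in list(root.iter("{%s}bo" % NOT_TEI)):
        el.tag = "hi"  # TEI namespace omitted in this sketch
        el.set("rend", "bold")
    return root


doc = ET.fromstring(
    '<p xmlns:x="http://www.example.org/ns/notTEI">'
    'A <x:bo>very</x:bo> short example.</p>'
)
print(ET.tostring(expand_bold(doc), encoding="unicode"))
# <p>A <hi rend="bold">very</hi> short example.</p>
```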
The <altIdent> element is used to provide an alternative name for an object, for example using a different natural language. Thus, the following might be used to indicate that the <abbr> element should be identified using the German word Abkürzung:
<elementSpec ident="abbr" mode="change">
 <altIdent xml:lang="de">Abkürzung</altIdent>
<!--...-->
</elementSpec>
In the same way, the following specification for the <graphic> element indicates that the attribute url may also be referred to using the alternate identifier href:
<elementSpec ident="graphic" mode="change">
 <attList>
  <attDef mode="change" ident="url">
   <altIdent>href</altIdent>
  </attDef>
<!-- .... -->
 </attList>
</elementSpec>

By default, the <altIdent> of a component is identical to the value of its ident attribute.
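A processor that presents names to users might apply that default as in the following Python sketch. It returns the <altIdent> for a requested language and otherwise falls back to ident. The function name is hypothetical. Treating an <altIdent> without xml:lang as applying to any language is an assumption of this sketch, not a rule stated here.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def display_name(spec, lang=None):
    """Return the <altIdent> matching lang, else fall back to the ident value."""
    for alt in spec.findall("altIdent"):
        # Assumption: an altIdent with no xml:lang matches any requested language.
        if alt.get(XML_LANG) in (lang, None):
            return alt.text
    # Default: the alternate identifier is identical to ident.
    return spec.get("ident")


abbr = ET.fromstring(
    '<elementSpec ident="abbr" mode="change">'
    '<altIdent xml:lang="de">Abkürzung</altIdent></elementSpec>'
)
print(display_name(abbr, "de"))  # Abkürzung
print(display_name(abbr, "fr"))  # abbr
```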

The contents of the <desc> element provide a brief characterization of the intended function of the object being documented in a form that permits its quotation out of context, as in the following example:
<elementSpec module="core" ident="foreign">
<!--... -->
 <desc>identifies a word or phrase as belonging to some language other
   than that of the surrounding text. </desc>
<!--... -->
</elementSpec>
By convention, a <desc> element begins with a verb such as contains, indicates, specifies, etc. and contains a single clause.

3.3.5 Some Further Examples

As a simple example of the elements discussed here, consider the following sentence:

On the one hand the Nibelungenlied is associated with the new rise of romance of twelfth-century France, the romans d'antiquité, the romances of Chrétien de Troyes, and the German adaptations of these works by Heinrich van Veldeke, Hartmann von Aue, and Wolfram von Eschenbach.

A first approximation to the encoding of this sentence might be simply to record the fact that the phrases printed above in italics are highlighted, as follows:
On the one hand the <hi rend="italic">Nibelungenlied</hi> is
associated with the new rise of romance of twelfth-century France,
the <hi xml:lang="fr" rend="italic">romans d'antiquité</hi>,
the romances of Chrétien de Troyes, ...
This encoding would, however, lose the important distinction between an italicized title and an italicized foreign phrase. Many other phrases might also be italicized in the text, and a retrieval program seeking to identify foreign terms (for example) would not be able to produce reliable results by simply looking for italicized words. Where economic and intellectual constraints permit, therefore, it would be preferable to encode both the function of the highlighted phrases and their appearance, as follows:
On the one hand the <title rend="italic">Nibelungenlied</title>
is associated with the new rise of romance of twelfth-century France,
the <foreign rend="italic">romans d'antiquité</foreign>, the
romances of Chrétien de Troyes, ...
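The difference matters for retrieval. In this small Python sketch over the encoding above, a search keyed on appearance (rend="italic") returns the title along with the foreign phrase. A search keyed on function (<foreign>) returns only the foreign phrase. The queries are illustrative and use only the standard library.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p>On the one hand the <title rend="italic">Nibelungenlied</title>
is associated with the new rise of romance of twelfth-century France,
the <foreign rend="italic">romans d'antiquité</foreign>, the
romances of Chrétien de Troyes, ...</p>"""

root = ET.fromstring(SAMPLE)

# Keyed on appearance: every italicized phrase, whatever its function.
by_appearance = [el.text for el in root.iter() if el.get("rend") == "italic"]
# Keyed on function: only the phrases marked as foreign.
by_function = [el.text for el in root.iter("foreign")]

print(by_appearance)  # ['Nibelungenlied', "romans d'antiquité"]
print(by_function)    # ["romans d'antiquité"]
```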
In this example, the decision as to which textual features are distinguished by the highlighting is relatively uncontroversial. As a less straightforward example, consider the use of italic font in the following passage:

A pretty common case, I believe; in all vehement debatings. She says I am too witty; Anglicé, too pert; I, that she is too wise; that is to say, being likewise put into English, not so young as she has been: in short, she is grown so much into a mother, that she had forgotten she ever was a daughter. ...

Clearly, the word vehement is not italicized for the same reason as the phrase not so young as she has been; the former is emphasized, while the latter is proverbial. It also provides an ironic gloss for the words too wise, in the same way as too pert glosses too witty. The glossed phrases are not, however, technical terms or cited words, but quoted phrases, as if the writer were putting words into her own and her mother's mouths. Finally, the words mother and daughter are apparently italicized simply to oppose them in the sentence; certainly they do not fit into any of the categories so far proposed as reasons for italicizing. Note also that the word Anglicé is not italicized although it is not generally considered an English word.

The following sample encoding for the above passage attempts to take into account all the above points:
A pretty common case, I believe; in all <emph>vehement</emph>
debatings. She says I am <q rend="italic">too witty</q>;
<foreign xml:lang="la" rend="roman">Anglicé</foreign>,
<gloss rend="italic">too pert</gloss>; I, that she is
<q rend="italic">too wise</q>; that is to say, being likewise
put into English, <gloss rend="italic">not so young as she has
been</gloss>: in short, she is grown so much into a
<hi rend="italic">mother</hi>, that she had forgotten she ever
was a <hi rend="italic">daughter</hi>.

3.4 Simple Editorial Changes

As in editing a printed text, so in encoding a text in electronic form, it may be necessary to accommodate editorial comment on the text and to render account of any changes made to the text in preparing it. The tags described in this section may be used to record such editorial interventions, whether made by the encoder, by the editor of a printed edition used as a copy text, by earlier editors, or by the copyists of manuscripts.

The tags described here handle most common types of editorial intervention and stereotyped comment; where less structured commentary of other types is to be included, it should be marked using the <note> element described in section 3.8 Notes, Annotation, and Indexing. Systematic interpretive annotation is also possible using the various methods described in chapter 16 Linking, Segmentation, and Alignment. The examples given here illustrate only simple cases of editorial intervention; in particular, they permit economical encoding of a simple set of alternative readings of a short span of text. To encode multiple views of large or heterogeneous spans of text, the mechanisms described in chapter 16 Linking, Segmentation, and Alignment should be used. To encode multiple witnesses of a particular text, a similar mechanism designed specifically for critical editions is described in chapter 12 Critical Apparatus.

For most of the elements discussed here, some encoders may wish to indicate both a responsibility, that is, a code indicating the person or agency responsible for making the editorial intervention in question, and also an indication of the degree of certainty which the encoder wishes to associate with the intervention. Because these requirements are common to many of the elements discussed in this section, they are provided by an attribute class, called att.editLike. All members of this class carry the following optional attributes:
  • att.editLike provides attributes describing the nature of an encoded scholarly intervention or interpretation of any kind.
    cert (certainty) signifies the degree of certainty associated with the intervention or interpretation.
    resp (responsible party) indicates the agency responsible for the intervention or interpretation, for example an editor or transcriber.
    evidence indicates the nature of the evidence supporting the reliability or accuracy of the intervention or interpretation.
Many of the elements discussed here can be used in two ways. Their primary purpose is to indicate that the text encoded as the element's content represents an editorial intervention (or non-intervention) of a specific kind, indicated by the element itself. However, pairs or other meaningful groupings of such elements can also be supplied, wrapped within a special purpose <choice> element:
  • choice groups a number of alternative encodings for the same point in a text.
This element enables the encoder to represent for example a text in its ‘original’ uncorrected and unaltered form, alongside the same text in one or more ‘edited’ forms. This usage permits software to switch automatically between one ‘view’ of a text and another, so that (for example) a stylesheet may be set to display either the text in its original form or after the application of editorial interventions of particular kinds.

Elements which can be combined in this way constitute the model.choicePart class. The default members of this class are <sic>, <corr>, <reg>, <orig>, <unclear>, <add>, and <del>; their functions and usage are described further below.

Three categories of editorial intervention are discussed in this section:
  • indication or correction of apparent errors
  • indication or regularization of variant, irregular, non-standard, or eccentric forms
  • editorial additions, suppressions, and omissions

A more extended treatment of the use of these tags in transcriptional and editorial work is given in chapter 11 Representation of Primary Sources.

3.4.1 Apparent Errors

When the copy text is manifestly faulty, an encoder or transcriber may elect simply to correct it without comment, although for scholarly purposes it will often be more generally useful to record both the correction and the original state of the text. The elements described here make it possible to record the correction alone, the original alone, or both together, and allow the last to be done in such a way as to make it easy for software to present either the original or the correction.
  • sic (Latin for thus or so) contains text reproduced although apparently incorrect or inaccurate.
  • corr (correction) contains the correct form of a passage apparently erroneous in the copy text.
The following examples show alternative treatment of the same material. The copy text reads:

Another property of computer-assisted historical research is that data modelling must permit any one textual feature or part of a textual feature to be a part of more than one information model and to allow the researcher to draw on several such models simultaneously, for example, to select from a machine-readable text those marginal comments which indicate that the date's mentioned in the main body of the text are incorrect.

An encoder may choose to correct the typographic error, either silently or with an indication that a correction has been made, as follows:
… marginal comments which indicate that the <corr>dates</corr>
mentioned in the main body of the text are incorrect.
Alternatively, the encoder may simply record the typographic error without correcting it, either without comment or with a <sic> element to indicate the error is not a transcription error in the encoding:
… marginal comments which indicate that the <sic>date's</sic>
mentioned in the main body of the text are incorrect.
If the encoder elects both to record the original source text and to provide a correction for the sake of word-search and other programs, both <sic> and <corr> are used, wrapped in a <choice>:
… marginal comments which indicate that the
<choice>
 <corr>dates</corr>
 <sic>date's</sic>
</choice> mentioned in the main body of the text are
incorrect.
The <sic> and <corr> elements can appear in either order.
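The following Python sketch illustrates the kind of view-switching this makes possible. Given the example above, it produces plain text that keeps either the <sic> or the <corr> alternative of each <choice>. It selects the alternative by element name rather than by position, so it is unaffected by their order. The function names are hypothetical, and a real stylesheet would also need to handle the other members of model.choicePart.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p>marginal comments which indicate that the
<choice>
 <corr>dates</corr>
 <sic>date's</sic>
</choice> mentioned in the main body of the text are incorrect.</p>"""


def view(elem, prefer):
    """Flatten elem to text, keeping only the `prefer` child of each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            # Select by element name, so the order of sic and corr is irrelevant.
            chosen = child.find(prefer)
            if chosen is not None:
                parts.append(view(chosen, prefer))
        else:
            parts.append(view(child, prefer))
        # Text after a child (its tail) belongs to elem and is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


def show(elem, prefer):
    """Normalize whitespace in a view for display."""
    return " ".join(view(elem, prefer).split())


root = ET.fromstring(SAMPLE)
print(show(root, "sic"))
print(show(root, "corr"))
```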
If it is desired to indicate the person or edition responsible for the emendation, this might be done as follows:
… marginal comments which indicate that the
<choice>
 <corr resp="#msm">dates</corr>
 <sic>date's</sic>
</choice> mentioned in the main body of the text are
incorrect.

<!-- within the header for this document ... -->
<respStmt>
 <resp>editor</resp>
 <name xml:id="msm">C.M. Sperberg McQueen</name>
</respStmt>
Here the resp attribute has been used to indicate responsibility for the correction. Its value (#msm) is an example of the pointer values discussed in section 3.6 Simple Links and Cross-References; in this case, it points to a <name> element within the TEI Header, but any element might be indicated in this way, including for example a <person> element (if the module described in 13 Names, Dates, People, and Places has been included), or one of the bibliographic elements described in 3.11 Bibliographic Citations and References, if the correction has been taken from some other source. The resp attribute is available for all elements which are part of the att.editLike class. The same class makes available a cert attribute, which may be used to indicate the degree of editorial confidence in a particular correction, as in the following example:
An <choice>
 <corr cert="high">Autumn</corr>
 <sic>Antony</sic>
</choice> it was,
That grew the more by reaping
See further the discussion in section 11.3.3 Correction and Conjecture.
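resp holds a pointer and cert a simple value, so both are easy to use in processing. This Python sketch lists each correction together with the name its resp attribute points to and its cert value. The <TEI> wrapper that combines the header and text fragments above is a simplification for illustration; a real TEI document places them in <teiHeader> and <text>. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified wrapper holding the header and text fragments shown above.
SAMPLE = """<TEI>
<respStmt><resp>editor</resp><name xml:id="msm">C.M. Sperberg McQueen</name></respStmt>
<p>that the <choice><corr resp="#msm">dates</corr><sic>date's</sic></choice>
mentioned</p>
<l>An <choice><corr cert="high">Autumn</corr><sic>Antony</sic></choice> it was,</l>
</TEI>"""


def corrections(root):
    """Return (corrected text, responsible name or None, cert or None) per <corr>."""
    names = {el.get(XML_ID): el.text for el in root.iter() if el.get(XML_ID)}
    rows = []
    for corr in root.iter("corr"):
        resp = corr.get("resp")
        # Strip the leading "#" of a local pointer before looking it up.
        who = names.get(resp.lstrip("#")) if resp else None
        rows.append((corr.text, who, corr.get("cert")))
    return rows


for row in corrections(ET.fromstring(SAMPLE)):
    print(row)
```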

Where, as here, the correction takes the form of adding text not otherwise present in the text being encoded, the encoder should use the <corr> element. Where the correction is present in the text being encoded, and consists of some combination of visible additions and deletions, the elements <add> or <del> should be used: see further section 3.4.3 Additions, Deletions, and Omissions below. Where the correction takes the form of addition of material not present in the original because of physical damage or illegibility, the <supplied> element may be used. Where the ‘correction’ is simply a matter of expanding an abbreviation the <ex> element may be used. These and other elements to support the detailed encoding of authorial or scribal interventions of this kind are all provided by the module described in chapter 11 Representation of Primary Sources.

3.4.2 Regularization and Normalization

When the source text makes extensive use of variant forms or non-standard spellings, it may be desirable for a number of reasons to regularize it: that is, to provide ‘standard’ or ‘regularized’ forms equivalent to the non-standard forms.11

As with other such changes to the copy text, the changes may be made silently (in which case the TEI header should specify the types of silent changes made) or may be explicitly marked using the following elements:
  • reg (regularization) contains a reading which has been regularized or normalized in some sense.
  • orig (original form) contains a reading which is marked as following the original, rather than being normalized or corrected.
  • choice groups a number of alternative encodings for the same point in a text.

Typical applications for these elements include the production of editions intended for student or lay readers, linguistic research in which spelling or usage variation is not the main question at issue, production of spelling dictionaries, etc.

Consider this 16th-century text:

how godly a dede it is to overthrowe so wicked a race the world may judge: for my part I thinke there canot be a greater sacryfice to God.

An encoder may choose to preserve the original spelling of this text, but simply flag it as nonstandard by using the <orig> element with no attributes specified, as follows:
<p>...how godly a <orig>dede</orig> it is to
<orig>overthrowe</orig> so wicked a race the
world may judge: for my part I <orig>thinke</orig>
there <orig>canot</orig> be a greater
<orig>sacryfice</orig> to God.</p>
Alternatively, the encoder may simply indicate that certain words have been modernized by using the <reg> element with no attributes specified, as follows:
<p>...how godly a
<reg>deed</reg> it is to <reg>overthrow</reg> so wicked a race the
world may judge: for my part I <reg>think</reg>
there <reg>cannot</reg> be a greater
<reg>sacrifice</reg> to God.</p>
Alternatively, the encoder may elect to record both old and new spellings, so that (for example) the same electronic text may serve as the basis of an old- or new-spelling edition:
<p>...how godly a <choice>
  <orig>dede</orig>
  <reg>deed</reg>
 </choice> it is to
<choice>
  <orig>overthrowe</orig>
  <reg>overthrow</reg>
 </choice> so wicked a race the
world may judge: for my part I <choice>
  <orig>thinke</orig>
  <reg>think</reg>
 </choice>
there <choice>
  <orig>canot</orig>
  <reg>cannot</reg>
 </choice> be a greater
<choice>
  <orig>sacryfice</orig>
  <reg>sacrifice</reg>
 </choice> to God.</p>
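Recording both forms also makes it straightforward to derive material such as the spelling dictionaries mentioned above. As an illustration, this Python sketch collects each original spelling with its regularized counterpart from the encoding above. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p>...how godly a <choice><orig>dede</orig><reg>deed</reg></choice> it is to
<choice><orig>overthrowe</orig><reg>overthrow</reg></choice> so wicked a race the
world may judge: for my part I <choice><orig>thinke</orig><reg>think</reg></choice>
there <choice><orig>canot</orig><reg>cannot</reg></choice> be a greater
<choice><orig>sacryfice</orig><reg>sacrifice</reg></choice> to God.</p>"""


def spelling_pairs(root):
    """Collect (original, regularized) pairs from each <choice> holding both."""
    pairs = []
    for choice in root.iter("choice"):
        orig, reg = choice.find("orig"), choice.find("reg")
        if orig is not None and reg is not None:
            pairs.append((orig.text, reg.text))
    return pairs


for old, new in spelling_pairs(ET.fromstring(SAMPLE)):
    print(f"{old} -> {new}")
```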
As elsewhere, the resp attribute may be used to specify the agency responsible for the regularization. For example, in the first stanza of the Old Norse poem Grógaldr, the manuscript form dura is usually regularized in modern editions to dyra ‘doors’. The manuscript's ‘vek ek þik dauðra dura’ might thus be recorded together with its regularization as follows:
vek ek þik dauðra
<choice>
 <orig>dura</orig>
 <reg resp="#MSM">dyra</reg>
</choice>

3.4.3 Additions, Deletions, and Omissions

The following elements are used to indicate when words or phrases have been omitted from, added to, or marked for deletion from, a text. Like the other editorial elements, they allow for a wide range of editorial practices:
  • gap indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible.
    reason gives the reason for omission. Sample values include sampling, inaudible, irrelevant, cancelled, illegible.
  • unclear contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source.
    reason indicates why the material is hard to transcribe.
  • add (addition) contains letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector.
  • del (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, annotator, or corrector.
Encoders may choose to omit parts of the copy text for reasons ranging from illegibility of the source or impossibility of transcribing it, to editorial policy, e.g. a systematic exclusion of poetry or prose from an encoding. The full details of the policy decisions concerned should be documented in the TEI Header (see section 2.3 The Encoding Description). Each place in the text at which omission has taken place should be marked with a <gap> element, optionally with further information about the reason for the omission, its extent, and the person or agency responsible for it, as in the following examples:
<gap reason="illegible" unit="words" extent="2"/>
<gap reason="overwriting illegible" extent="8" unit="chars"/>
The <desc> element may be used to supply a description of the material omitted, where that is considered useful:
<gap reason="sampling" extent="120" unit="lines">
 <desc>irrelevant commentary</desc>
</gap>
… Their arrangement with respect to Jupiter and to each other was as follows:
<gap reason="sampling" extent="2" unit="cm">
 <desc>astrological figure</desc>
</gap>
That is, there were two stars on the easterly side and one to the west; …
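The attributes of <gap> are structured, so omissions can be reported or audited automatically. The following Python sketch lists each <gap> from the examples above with its reason, extent, unit, and any <desc>. The <div> wrapper and the function name are illustrative. The reason value is split on whitespace because, as the second example shows, it may hold more than one token.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<div>
<gap reason="illegible" unit="words" extent="2"/>
<gap reason="overwriting illegible" extent="8" unit="chars"/>
<gap reason="sampling" extent="120" unit="lines">
 <desc>irrelevant commentary</desc>
</gap>
</div>"""


def summarize_gaps(root):
    """Report each omission as (reason tokens, extent, unit, description)."""
    report = []
    for gap in root.iter("gap"):
        desc = gap.find("desc")
        report.append((
            gap.get("reason", "").split(),  # may contain several tokens
            gap.get("extent"),
            gap.get("unit"),
            desc.text if desc is not None else None,
        ))
    return report


for entry in summarize_gaps(ET.fromstring(SAMPLE)):
    print(entry)
```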

The <add> and <del> elements may be used to record where words or phrases have been added or deleted in the copy text. They are not appropriate where longer passages have been added or deleted, which span several elements; for these, the elements <addSpan> and <delSpan> described in section 11.3.4 Additions and Deletions must be used.

Additions to a text may be recorded for a number of reasons. Sometimes they are marked in a distinctive way in the source text, for example by brackets or insertion above the line (supralinear insertion), as in the following example, taken from a 19th-century manuscript:
The story I am going to relate is true as to its main facts,
and as to the consequences <add place="supralinear">of
these facts</add> from which this tale takes its title.

The <add> element should not be used to mark editorial changes, such as supplying a word omitted by mistake from the source text or a passage present in another version. In these cases, either the <corr> or <supplied> tags should be used, as discussed above in section 3.4.1 Apparent Errors, and in section 11.3.3 Correction and Conjecture, respectively.

The <unclear> element is used to mark passages in the original which cannot be read with confidence, or about which the transcriber is uncertain for other reasons, as for example when transcribing a partially inaudible or illegible source. Its reason and resp attributes are used, as with the <gap> element, to indicate the cause of uncertainty and the person responsible for the conjectured reading.

For example:
<l>And where the sandy mountain Fenwick scald</l>
<l>
 <unclear reason="ink blot">The</unclear> sea between
yet hence his pray'r prevail'd
</l>
or from a spoken text:
<p>... and then <unclear reason="passing truck">marbled queen</unclear>...</p>

Where the material affected is entirely illegible or inaudible, the <gap> element discussed above should be used in preference.

The <del> element is used to mark material which is deleted in the source but which can still be read with some degree of confidence, as opposed to material which has been omitted by the encoder or transcriber either because it is entirely illegible or for some other reason. This is of particular importance in transcribing manuscript material, though deletion is also found in printed texts, sometimes for humorous purposes:
<l>One day I will sojourn to your shores</l>
<l>I live in the middle of England</l>
<l>But!</l>
<l>Norway! My soul resides in your watery
<del type="overstrike">fiords fyords fiiords</del>
</l>
<l>Inlets.</l>
The type attribute may be used to distinguish different methods of deletion in manuscript or typescript material, as in this line from the typescript of Eliot's Waste Land:
<l>
 <del type="overtyped">Mein</del> Frisch
<del type="overstrike">schwebt</del> weht der Wind
</l>
Deletion in manuscript or typescript is often associated with addition:
<l>
 <del type="overstrike">Inviolable</del>
 <add place="infralinear">Inexplicable</add>
splendour of Corinthian white and gold
</l>
The <subst> element discussed in 11.3.5 Substitutions provides a way of grouping additions and deletions of this kind.

The <del> element should not be used where the deletion is such that material cannot be read with confidence, or read at all, or where the material has been omitted by the transcriber or editor for some other reason. Where the material deleted cannot be read with confidence, the <unclear> tag should be used with the reason attribute indicating that the difficulty of transcription is due to deletion. Where material has been omitted by the transcriber or editor, this may be indicated by use of the <gap> element.
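Because <add> and <del> mark revisions rather than omissions, processing software can derive either the text as revised or the text as it stood before revision. The Python sketch below illustrates this with the Inviolable/Inexplicable line quoted above; reading_text and its keep predicate are hypothetical names, the elements are assumed to be un-namespaced as in these examples, and <unclear>, <gap>, and <subst> receive no special treatment.

```python
import xml.etree.ElementTree as ET

# The Inviolable/Inexplicable line quoted above.
LINE = """<l>
 <del type="overstrike">Inviolable</del>
 <add place="infralinear">Inexplicable</add>
splendour of Corinthian white and gold
</l>"""


def _flatten(elem, keep):
    # Text of elem, descending only into children whose tag passes keep();
    # a child's tail is always kept, since it belongs to the parent.
    parts = [elem.text or ""]
    for child in elem:
        if keep(child.tag):
            parts.append(_flatten(child, keep))
        parts.append(child.tail or "")
    return "".join(parts)


def reading_text(elem, keep):
    """Flattened text of elem with runs of whitespace collapsed."""
    return " ".join(_flatten(elem, keep).split())


line = ET.fromstring(LINE)
# As revised: deletions dropped, additions kept.
print(reading_text(line, lambda tag: tag != "del"))
# Before revision: additions dropped, deletions kept.
print(reading_text(line, lambda tag: tag != "add"))
```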

3.5 Names, Numbers, Dates, Abbreviations, and Addresses

This section describes a number of textual features which it is often convenient to distinguish from their surrounding text. Names, dates, and numbers are likely to be of particular importance to the scholar treating a text as source for a database; distinguishing such items from the surrounding text is however equally important to the scholar primarily interested in lexis.

The treatment of these textual features proposed here is not intended to be exhaustive: fuller treatments for names, numbers, measures, and dates are provided in the names and dates module (see chapter 13 Names, Dates, People, and Places).

3.5.1 Referring Strings

A referring string is a phrase which refers to some person, place, object, etc. Two elements are provided to mark such strings:
  • rs (referencing string) contains a general purpose name or referring string.
  • name (name, proper noun) contains a proper noun or noun phrase.
Where it is thought useful to do so, the kind of object referred to may be specified using the type attribute.
Examples include:
<p>
 <q>My dear
 <rs type="person">Mr. Bennet</rs>
 </q>, said his lady to
him one day, <q>have you heard that <rs type="place">Netherfield Park</rs> is let at last?</q>
</p>
<p>Collectors of water-rents were appointed by the
<rs type="organization">Watering Committee</rs>.
They were paid a commission not exceeding four per
cent, and gave bond.</p>
<p>It being one of the principles of the
<rs type="org">Circumlocution Office</rs> never, on any
account whatsoever, to give a straightforward answer,
<rs type="person">Mr Barnacle</rs> said, <q>Possibly.</q>
</p>
As the following example shows, the <rs> element may be used for any reference to a person, place, etc., not only to references in the form of a proper noun or noun phrase.
<p>
 <q>My dear <rs type="person">Mr. Bennet</rs>
 </q>, said
<rs type="person">his lady</rs> to him one day ...
</p>
The <name> element by contrast is provided for the special case of referencing strings which consist only of proper nouns; it may be used synonymously with the <rs> element, or nested within it if a referring string contains a mixture of common and proper nouns. The following example shows an alternative way of encoding the short sentence from Pride and Prejudice quoted above:
<p>
 <q>My dear <name type="person">Mr. Bennet</name>,</q> said <rs type="person">his lady</rs> to him one day,
<q>have you heard that <name type="place">Netherfield Park</name> is let at last?</q>
</p>
As the following example shows, a proper name may be nested within a referring string:
<rs>His Excellency the Life President, <name>Ngwazi Dr H. Kamuzu Banda</name>
</rs>

Simply tagging something as a name is generally not enough to enable automatic processing of personal names into the canonical forms usually required for reference purposes. The name as it appears in the text may be inconsistently spelled, partial, or vague. Moreover, name prefixes such as van or de la may or may not be included as part of the reference form of a name, depending on the language and country of origin of the bearer.

Two issues arise in this context: firstly, there may be a need to encode a regularised form of a name, distinct from the actual form in the source to hand; secondly, there may be a need to identify the particular person, place, etc. referred to by the name, irrespective of whether the name itself is normalized or not. The element <reg>, introduced in 3.4.2 Regularization and Normalization, is provided for the former purpose; the attribute key for the latter.

The key attribute is common to all members of the att.naming class and is defined as follows:
  • att.naming provides attributes common to elements which refer to named persons, places, organizations etc.
    key provides an external means of locating a full definition for the entity (or entities) being named, such as a database record key or other token.
Its main use is as a means of gathering together all references to the same individual or location scattered throughout a document:
<p>
 <q>My dear <rs key="BENM1" type="person"> Mr. Bennet</rs>,</q>
said <rs key="BENM2" type="person">his lady</rs> to him one day,
<q>have you heard that <rs key="NETP1" type="place">Netherfield
     Park</rs> is let at last?</q>
</p>
<p>
 <name key="VOM1" type="person">Mme. de Volanges</name> marie <rs key="VOM2">sa fille</rs>: c'est encore un secret;
mais elle m'en a fait part hier.
</p>
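A minimal Python sketch of the gathering function just described, assuming un-namespaced markup as in the examples here: it indexes every <rs> or <name> carrying a key attribute, so that all strings referring to the same entity can be retrieved together. The helper references_by_key is hypothetical, and no attempt is made to resolve a key against an external resource.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The Pride and Prejudice example above, with its key values.
PARA = """<p>
 <q>My dear <rs key="BENM1" type="person"> Mr. Bennet</rs>,</q>
said <rs key="BENM2" type="person">his lady</rs> to him one day,
<q>have you heard that <rs key="NETP1" type="place">Netherfield
     Park</rs> is let at last?</q>
</p>"""


def references_by_key(root):
    """Map each key value to the whitespace-normalized text of every
    <rs> or <name> element carrying it, in document order."""
    index = defaultdict(list)
    for elem in root.iter():
        key = elem.get("key")
        if key is not None and elem.tag in ("rs", "name"):
            index[key].append(" ".join("".join(elem.itertext()).split()))
    return dict(index)


print(references_by_key(ET.fromstring(PARA)))
```

In a complete document each list would accumulate every occurrence sharing the same key, which is the point of assigning one.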

The value of the key attribute may be an unexpanded code, as in the examples above, with no significance. More usually however, it will reference some other resource providing more information about the entity named by the element, which might itself include a number of alternative forms of name for the entity in question.

This use should be distinguished from the use of a nested <reg> (regularization) element to provide the standard form of a referring string, as in this example:
<p>My personal life during
the administration of <rs key="POJA1" type="person">Col. Polk
   (<reg>Polk, James K.</reg>)</rs> has but poorly compensated me for the
suspended enjoyments and pursuits of private and professional
spheres</p>
The <choice> element discussed in 3.4 Simple Editorial Changes may be used if it is desired to record both a normalized form of a name and the name used in the source being encoded:
<p>
 <name key="WADLM1" type="person">
  <choice>
   <orig>Walter de la Mare</orig>
   <reg>de la Mare, Walter</reg>
  </choice>
 </name>
was born at <name key="Ch1" type="place">Charlton</name>, in
<name key="KT1" type="county">Kent</name>, in 1873.
</p>
The <index> element discussed in 3.8.2 Index Entries may be more appropriate if the function of the regularization is to provide a consistent index:
<p>
 <name type="place">Montaillou</name> is not a large parish.
At the time of the events which led to
<name type="person">Fournier<index>
   <term>Benedict XII, Pope of Avignon (Jacques Fournier)</term>
  </index>
 </name>'s
investigations, the local population consisted of between 200 and 250 inhabitants.
</p>
Although adequate for many simple applications, these methods have two inconveniences: if the name occurs many times, then its regularised form must be repeated many times; and the burden of additional XML markup in the body of the text may be inconvenient to maintain and complex to process. For applications such as onomastics, relating to persons or places named rather than the name itself, or wherever a detailed analysis of the component parts of a name is needed, the specialized elements described in chapter 13 Names, Dates, People, and Places or the analytical tools described in chapter 18 Feature Structures should be used.

3.5.2 Addresses

These Guidelines propose the following elements to distinguish postal and electronic addresses:
  • address contains a postal address, for example of a publisher, an organization, or an individual.
  • email (electronic mail address) contains an e-mail address identifying a location to which e-mail messages can be delivered.
These two elements constitute the class of model.addressLike elements; for other kinds of address this class may be extended by adding new elements if necessary.
These Guidelines provide no particular means for encoding the substructure of an email address (for example, distinguishing the local part from the domain part), nor of distinguishing personal email addresses from generic or fictitious ones.
<email>editors@tei-c.org</email>
The simplest way of encoding a postal address is to regard it as a series of distinct lines, just as they might be written on an envelope. The following element supports this view:
  • addrLine (address line) contains one line of a postal address.
Here is an example of a postal address encoded using this approach:
<address>
 <addrLine>110 Southmoor Road,</addrLine>
 <addrLine>Oxford OX2 6RB,</addrLine>
 <addrLine>UK</addrLine>
</address>
Alternatively, an address may be encoded as a structure of more semantically rich elements. The model.addrPart class identifies a number of such possible components:
  • street a full street address including any name or number identifying a building as well as the name of the street or route on which it is located.
  • name (name, proper noun) contains a proper noun or noun phrase.
    type indicates the type of the object which is being named by the phrase.
  • postCode (postal code) contains a numerical or alphanumeric code used as part of a postal address to simplify sorting or delivery of mail.
  • postBox (postal box or post office box) contains a number or other identifier for some postal delivery point other than a street address.
  • model.nameLike groups elements which name or refer to a person, place, or organization.
  • model.persNamePart groups elements which form part of a personal name.
  • model.placeNamePart groups elements which form part of a place name.
Any number of elements from the model.addrPart class may appear within an address and in any order. None of them is required.

Where code letters are commonly used in addresses (for example, to identify regions or countries) a useful practice is to supply the full name of the region or country as the content of the element, but to supply the abbreviatory code as the value of the global n attribute, so that (for example) an application preparing formatted labels can readily find the required information. Other components of addresses may be represented using the general-purpose <name> element or (when the additional module for names and dates is included) the more specialized elements provided for that purpose.

Using just the elements defined by the core module, the above address could thus be represented as follows:
<address>
 <street>110 Southmoor Road</street>
 <name type="city">Oxford</name>
 <postCode>OX2 6RB</postCode>
 <name type="country">United Kingdom</name>
</address>
The order of elements within an address is highly culture-specific, and is therefore unconstrained:
<address>
 <name type="org">Università di Bologna</name>
 <name type="country">Italy</name>
 <postCode>40126</postCode>
 <name type="city">Bologna</name>
 <street>via Marsala 24</street>
</address>

For further discussion of ways of regularizing the names of places, see section 3.5 Names, Numbers, Dates, Abbreviations, and Addresses. A full postal address may also include the name of the addressee, tagged as above using the general purpose <name> element.

When a schema includes the names and dates module discussed in chapter 13 Names, Dates, People, and Places, a large number of more specific elements such as <country> or <settlement> will be available from the class model.addrPart. The above example might then be encoded as follows:
<address>
 <street>110 Southmoor Road</street>
 <settlement>Oxford</settlement>
 <postCode>OX2 6RB</postCode>
 <country>United Kingdom</country>
</address>
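Because the order of address components is unconstrained, software preparing labels has to select components by element name rather than by position. The Python sketch below does this for the core-module encoding of the Bologna address; address_parts and label_lines are hypothetical helpers, the envelope order chosen is an assumption of the sketch rather than anything the markup states, and un-namespaced elements are assumed.

```python
import xml.etree.ElementTree as ET

# The core-module encoding of the Bologna address shown above.
ADDRESS = """<address>
 <name type="org">Università di Bologna</name>
 <name type="country">Italy</name>
 <postCode>40126</postCode>
 <name type="city">Bologna</name>
 <street>via Marsala 24</street>
</address>"""


def address_parts(address):
    """Map each component to its text, labelled by element name, or by
    the type attribute for the generic <name> element."""
    parts = {}
    for child in address:
        label = child.get("type") if child.tag == "name" else child.tag
        parts[label] = " ".join("".join(child.itertext()).split())
    return parts


def label_lines(parts):
    """One possible Italian envelope layout (an assumption of this
    sketch; the markup itself imposes no order)."""
    return [
        parts.get("org", ""),
        parts.get("street", ""),
        f"{parts.get('postCode', '')} {parts.get('city', '')}".strip(),
        parts.get("country", ""),
    ]


parts = address_parts(ET.fromstring(ADDRESS))
print("\n".join(label_lines(parts)))
```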

3.5.3 Numbers and Measures

This section describes elements provided for the simple encoding of numbers and measurements and gives some indication of circumstances in which this may usefully be done. The following phrase level elements are provided for this purpose:
  • num (number) contains a number, written in any form.
    type indicates the type of numeric value.
    value supplies the value of the number in standard form.
  • measure contains a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name.
    type specifies the type of measurement in any convenient typology.
  • measureGrp (measure group) contains a group of dimensional specifications which relate to the same object, for example the height and width of a manuscript page.

Like names or abbreviations, numbers can occur virtually anywhere in a text. Numbers are special in that they can be written with either letters or digits (twenty-one, xxi, and 21) and their presentation is language-dependent (e.g. English 5th becomes Greek 5.; English 123,456.78 equals French 123.456,78).

For many kinds of application, e.g. natural-language processing or machine translation, numbers are not regarded as ‘lexical’ in the same way as other parts of a text. For these and other applications, the <num> element provides a convenient method of distinguishing numbers from the surrounding text. For other kinds of application, numbers are only useful if normalized: here the <num> element is useful precisely because it provides a standardized way of representing a numerical value.

For example:
<num value="33">xxxiii</num>
<num type="cardinal" value="21">twenty-one</num>
<num type="percentage" value="10">ten percent</num>
<num type="percentage" value="10">10%</num>
<num type="ordinal" value="5">5th</num>
<num type="fraction" value="0.5">one half</num>
<num type="fraction" value="0.5">1/2</num>
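The value attribute lets software treat differently written numbers as equal. As a sketch assuming un-namespaced markup, the following Python reads the examples above and parses each value as an exact fraction; num_values is a hypothetical helper, and the sketch does not interpret the type attribute, so a percentage whose value is 10 is simply the number 10.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

# The <num> examples above, wrapped in a <p> for well-formedness.
SAMPLE = """<p>
<num value="33">xxxiii</num>
<num type="cardinal" value="21">twenty-one</num>
<num type="percentage" value="10">ten percent</num>
<num type="percentage" value="10">10%</num>
<num type="ordinal" value="5">5th</num>
<num type="fraction" value="0.5">one half</num>
<num type="fraction" value="0.5">1/2</num>
</p>"""


def num_values(root):
    """Return (text, type, value) for each <num>, parsing the normalized
    value attribute as an exact Fraction."""
    return [(num.text, num.get("type"), Fraction(num.get("value")))
            for num in root.iter("num")]


rows = num_values(ET.fromstring(SAMPLE))
# Differently written forms compare equal once normalized.
halves = [text for text, _, value in rows if value == Fraction(1, 2)]
print(halves)  # ['one half', '1/2']
```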

In its fullest form, a measure consists of a number, a phrase expressing units of measure and a phrase expressing the commodity being measured, though not all of these components need be present in every case. It may be helpful to distinguish measures from surrounding text for two reasons. Firstly, a measure may be expressed using a particular notation or system of abbreviations which the encoder does not wish to regard as lexical. Secondly, a quantitative application may wish to distinguish and normalize the internal components of a measure, in order to perform calculations on them.

Consider, as an example of the first case, the following list of Celia's charms, in which the encoder has chosen to make explicit the measurements:
<div n="2">
 <list type="gloss">
  <label>Age</label>
  <item>Unimportant</item>
  <label>Head</label>
  <item>Small and round</item>
  <label>Eyes</label>
  <item>Green</item>
  <label>Complexion</label>
  <item>White</item>
  <label>Hair</label>
  <item>yellow</item>
  <label>Features</label>
  <item>Mobile</item>
  <label>Neck</label>
  <item>
   <measure>13¾"</measure>
  </item>
  <label>Upper arm</label>
  <item>
   <measure>11"</measure>
  </item>
<!--...-->
 </list>
<!-- ... -->
</div>
In the same way, it may be convenient to mark representations of currency which might otherwise be misinterpreted as lexical:
<p>...the sum of
<measure type="currency">12s 6d</measure>...</p>
In general, normalization of a measure will require specification of one or more of its three parts: the quantity, the units, and possibly also the commodity being measured. This is accomplished by supplying values for the three attributes quantity, unit, and commodity, which are supplied by the att.measurement class:
  • att.measurement provides attributes to represent a regularized or normalized measurement.
    quantity specifies the number of the specified units that comprise the measurement
    unit indicates the units used for the measurement, usually using the standard symbol for the desired units.
    commodity indicates the substance that is being measured
With these attributes, the measurement of Celia's neck may be specified in a normalized form:
<measure quantity="13.75" unit="in">13¾"</measure>
Such techniques are particularly useful when representing historical data such as inventories:
<list>
 <item>
  <measure
    type="volume"
    quantity="2"
    unit="bag"
    commodity="hops">
ii bags hops </measure>
 </item>
 <item>
  <measure
    type="volume"
    quantity="6"
    unit="truss"
    commodity="cloth">
six trusses Woolen and linen goods </measure>
 </item>
 <item>
  <measure
    type="weight"
    quantity="5"
    unit="ton"
    commodity="coal">
5 tonnes coale
  </measure>
 </item>
</list>
The <measureGrp> element is provided as a means of grouping several related measurements together, either because the measurement involves several dimensions (for example height and width) or to avoid the need to repeat all the normalizing attributes:
<measureGrp type="volume" unit="in">
 <measure type="height" quantity="14">xiv</measure>
 <measure type="width" quantity="5">v</measure>
 <measure type="depth" quantity="10">x</measure>
</measureGrp>
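One way a processor might apply the normalizing attributes of a <measureGrp> to its members is sketched below in Python. The inheritance rule used here (a <measure> takes any of quantity, unit, or commodity it lacks from its enclosing group) is an assumption of the sketch, as is the un-namespaced markup; normalized_measures is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The measureGrp example above; unit is given once on the group.
GROUP = """<measureGrp type="volume" unit="in">
 <measure type="height" quantity="14">xiv</measure>
 <measure type="width" quantity="5">v</measure>
 <measure type="depth" quantity="10">x</measure>
</measureGrp>"""

NORMALIZING = ("quantity", "unit", "commodity")


def normalized_measures(group):
    """Yield a dict per <measure>, filling any normalizing attribute the
    measure omits from the enclosing <measureGrp> (assumed inheritance)."""
    for measure in group.iter("measure"):
        values = {}
        for name in NORMALIZING:
            value = measure.get(name, group.get(name))
            if value is not None:
                values[name] = value
        values["type"] = measure.get("type")
        yield values


grp = ET.fromstring(GROUP)
dims = {m["type"]: float(m["quantity"]) for m in normalized_measures(grp)}
print(dims, grp.get("unit"))
print(dims["height"] * dims["width"] * dims["depth"])  # 700.0
```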

3.5.4 Dates and Times

Dates and times, like numbers, can appear in widely varying culture- and language-dependent forms, and can pose similar problems in automatic language processing. Such elements constitute the model.dateLike class, of which the default members are:
  • date contains a date in any format.
    calendar indicates the system or calendar to which the date belongs.
  • time contains a phrase defining a time of day in any format.
These elements have some additional attributes by virtue of being members of the att.datable and att.duration classes which, in turn, are members of the att.datable.w3c and att.duration.w3c classes. In particular, the when attribute will be discussed here:
  • att.datable.w3c provides attributes for normalization of elements that contain datable events using the W3C datatypes.
    when supplies the value of a date or time in a standard form.

Dates can occur virtually anywhere in a text, but in some contexts (e.g. bibliographic citations) their encoding is recommended or required rather than optional. Times can also appear anywhere but are generally optional.

Partial dates or times (e.g. 1990, September 1990, twelvish) can be expressed in the when attribute by simply omitting a part of the value supplied. Imprecise dates or times (for example early August, some time after ten and before twelve) may be expressed as date or time ranges.

Where the certainty (i.e. reliability) of the date or time itself is in question, rather than its precision, the encoder should record this fact using the mechanisms discussed in chapter 21 Certainty and Responsibility.

These mechanisms are useful primarily for fully specified dates or times known with certainty. If component parts of dates or times are to be marked up, or if a more complex analysis of the meaning of a temporal expression is required, the techniques described in chapter 13 Names, Dates, People, and Places should be used in preference to the simple method outlined here.

The when attribute is a useful way of normalizing or disambiguating dates and times which can appear in many formats, as the following examples show:
<date when="1980-02-12">12/2/1980</date>
Given on the <date when="1977-06-12">Twelfth Day of June
in the Year of Our Lord One Thousand Nine Hundred and
Seventy-seven of the Republic the Two Hundredth and first
and of the University the Eighty-Sixth.</date>
The when attribute always supplies a normalized representation of the date given as content of the <date> element. The format used should be a valid W3C schema datatype.12 Some typical examples follow:
<date when="2001">The
year 2001</date>
<date when="2001-09">September 2001</date>
<date when="2001-09-11">11 Sept 01</date>
<date when="--09-11">9/11</date>
<date when="--09">September</date>
<date when="---11">Eleventh of the month</date>
<time when="08:48:00">8:48</time>
<date when="2001-09-11T08:48:00">Sept 11th, 12 minutes before 9 am</date>
Note in the last example the use of a normalized representation for the date string which includes a time: this example could thus equally well be tagged using the <time> element.
The following examples demonstrate the use of the <date> element to mark a period of time:
<p>Those five years —
<date from="1918" to="1923">1918 to 1923</date>
— had been, he suspected,
somehow very important.</p>
<p>The Eddic poems are preserved in a unique
manuscript (Codex Regius 2365) from <date notBefore="1250" notAfter="1300">the second half of the thirteenth
   century</date>, and <title>Hervarar
   saga</title> dates from <date when="1300">around 1300</date>.</p>

The calendar attribute may be used to specify a date in any calendar system; if the when attribute is also supplied, it should specify the equivalent date in the Gregorian calendar.

3.5.5 Abbreviations and Their Expansions

It is sometimes desirable to mark abbreviations in the copy text, whether to trigger special processing for them, to provide the full form of the word or phrase abbreviated, or to allow for different possible expansions of the abbreviation. Abbreviations may be transcribed as they stand, or expanded; they may be left unmarked, or marked using these tags:
  • abbr (abbreviation) contains an abbreviation of any sort.
  • expan (expansion) contains the expansion of an abbreviation.
The <abbr> element is useful as a means of distinguishing semi-lexical items such as acronyms or jargon:
We can sum up the above discussion as follows: the identity of a
<abbr>CC</abbr> is defined by that calibration of values which
motivates the elements of its <abbr>GSP</abbr>; ...
Every manufacturer of <abbr>3GL</abbr> or <abbr>4GL</abbr>
languages is currently nailing on <abbr>OOP</abbr> extensions.
The type attribute may be used to distinguish types of abbreviation by their function:
<abbr type="title">Dr.</abbr>
<abbr type="initial">M.</abbr> Deegan is
the Director of the <abbr type="acronym">CTI</abbr> Centre for Textual Studies.
Abbreviations such as Dr. M. above may be treated as two abbreviations, as above, or as one:
<abbr>Dr. M.</abbr> Deegan is
the Director of the <abbr>CTI</abbr> Centre for Textual Studies.
The <expan> element may be used simply to record that an abbreviation has been silently expanded by the encoder, perhaps for reasons of house style or editorial policy. It should always include the whole of an abbreviated phrase or word. More usually however this will be combined with the <abbr> element inside a <choice> element to record both the abbreviation and its expansion:
the
<choice>
 <expan>World Wide Web Consortium</expan>
 <abbr>W3C</abbr>
</choice>
Nested abbreviations may also be handled in this way:
<choice>
 <abbr>RELAXNG</abbr>
 <expan>regular
   language for <choice>
   <abbr>XML</abbr>
   <expan>extensible markup
       language</expan>
  </choice>, next
   generation</expan>
</choice>
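A processor can use <choice> to present either form of an abbreviation on demand. This Python sketch, assuming un-namespaced markup, renders the nested RELAXNG example with either every <abbr> or every <expan> selected; render is a hypothetical helper, and where the preferred child is missing it falls back to the first child of the <choice>.

```python
import xml.etree.ElementTree as ET

# The nested RELAXNG example above.
NESTED = """<choice>
 <abbr>RELAXNG</abbr>
 <expan>regular
   language for <choice>
   <abbr>XML</abbr>
   <expan>extensible markup
       language</expan>
  </choice>, next
   generation</expan>
</choice>"""


def render(elem, prefer):
    """Flatten elem to text; each <choice> contributes only its child
    tagged prefer ('abbr' or 'expan'), else its first child."""
    if elem.tag == "choice":
        picked = next((c for c in elem if c.tag == prefer), elem[0])
        return render(picked, prefer)
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, prefer))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(NESTED)
for form in ("abbr", "expan"):
    print(" ".join(render(root, form).split()))
```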

Abbreviation is a particularly important feature of manuscript and other source materials, the transcription of which needs more detailed treatment than is possible using these simple elements. A more detailed set of recommendations is discussed in 11.3 Altered, Corrected, and Erroneous Texts, which includes additional elements made available for the purpose by the transcr module.

3.6 Simple Links and Cross-References

Cross-references or links between one location in a document and one or more other locations, either in the same or different XML documents, may be encoded using the elements <ptr> and <ref>, as discussed in this section. These elements both ‘point’ from one location in a document, the place that the element itself appears, to another (or to several), specified by the target attribute. Linkages of several other kinds are also provided for in these guidelines; see further chapter 16 Linking, Segmentation, and Alignment.

The value of the target attribute, wherever it appears, provides a way of pointing to some other element using a method standardized by the W3C, and known as the XPointer mechanism. This permits a range of complexity, from the very simple (a reference to the value of the target element's xml:id attribute) to the more complex usage of a full URI with embedded XPointers. For example, the source of the following paragraph looks something like this:
<p>For an introduction
to the use of links in general, see <ptr target="#SA"/>; for the
complete XPointer specification, see
<ptr target="http://www.w3.org/TR/xptr-framework/"/>,
<ptr target="http://www.w3.org/TR/xptr-element/"/>,
<ptr target="http://www.w3.org/TR/xptr-xmlns/"/>, and
<ptr target="http://www.w3.org/TR/xptr-xpointer/#xpointer(id('chum')/quote)"/>;
for a discussion of TEI schemes for XPointer, see
<ptr target="#SATS"/>.</p>
For an introduction to the use of links in general, see 16 Linking, Segmentation, and Alignment; for the complete XPointer specification, see http://www.w3.org/TR/xptr-framework/, http://www.w3.org/TR/xptr-element/, http://www.w3.org/TR/xptr-xmlns/, and http://www.w3.org/TR/xptr-xpointer/#xpointer(id('chum')/quote); for a discussion of TEI schemes for XPointer, see 16.2.4 TEI XPointer Schemes.

Alternatively, if no explicit link is to be encoded, but it is simply required to mark the phrase as a cross-reference, the <ref> element may be used without a target attribute.

  • ptr (pointer) defines a pointer to another location.
    target specifies the destination of the pointer by supplying one or more URI References
    cRef (canonical reference) specifies the destination of the pointer by supplying a canonical reference from a scheme defined in a <refsDecl> element in the TEI header
  • ref (reference) defines a reference to another location, possibly modified by additional text or comment.
    target specifies the destination of the reference by supplying one or more URI References
    cRef (canonical reference) specifies the destination of the reference by supplying a canonical reference from a scheme defined in a <refsDecl> element in the TEI header
The elements <ptr> and <ref> are the default members of the phrase-level model class model.ptrLike. As members of the class att.pointing, they also carry the following attributes:
  • att.pointing defines a set of attributes used by all elements which point to other elements by means of one or more URI references.
    type categorizes the pointer in some respect, using any convenient set of categories.
    evaluate specifies the intended meaning when the target of a pointer is itself a pointer.
The two elements may be used in the same way; the difference between them is simply that while the <ptr> element is empty, the <ref> element may contain phrases specifying, or describing more exactly, the target of a cross-reference, which form the content of the element. Since its content thus serves as a human-readable pointer, in the simplest case a <ref> element need not identify its target in any other way. For example:
See <ref>section 12 on page 34</ref>.
More usually, it will be desirable to identify the target of the cross-reference using the target attribute, so that processing software can access it directly, for example to implement a linkage, to generate an appropriate reference, or to give an error message if it cannot be found. Assuming that section 12 in the previous example has been tagged
<div1 xml:id="SEC12">
<!-- ... -->
</div1>
then the same cross-reference might more exactly be encoded as:
See especially <ref target="#SEC12">section 12 on page 34</ref>.
If the text for the cross-reference is to be generated according to a fixed pattern, or if no text is to appear in the body of the cross-reference, the <ptr> element would be used as follows:
See in particular <ptr target="#SEC12"/>.
A cross-reference may point to any number of locations simultaneously, simply by giving more than one identifier as the value of its target attribute. This may be particularly useful where an analytic index is to be encoded, as in the following example:
<list>
 <item>Saints aid rejected in mel. <ptr target="#p299"/>
 </item>
 <item>Sallets censured <ptr target="#p143 #p144"/>
 </item>
 <item>Sanguine mel. signs <ptr target="#p263"/>
 </item>
 <item>Scilla or sea onyon, a purger of mel. <ptr target="#p442"/>
 </item>
</list>
Here the targets of the cross-references are simply page numbers; it is assumed that corresponding elements with identifiers p299, p143, etc. have been provided in the body of the text, for example as page breaks:
<pb xml:id="p143"/>
...
<pb xml:id="p144"/>
...
<pb xml:id="p263"/>
...
<pb xml:id="p299"/>
...
<pb xml:id="p442"/>
...
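Since target values are references to xml:id values, software can check that every local target actually exists, one of the uses mentioned above. The following Python sketch reports local pointers with no matching identifier; dangling_targets is a hypothetical helper, external URIs are not checked, un-namespaced TEI elements are assumed, and the page break for p442 has been deliberately left out of the sample so that there is something to report.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the index above plus the page breaks it cites;
# the wrapping <text> element is added only for well-formedness.
DOC = """<text>
 <list>
  <item>Sallets censured <ptr target="#p143 #p144"/></item>
  <item>Sanguine mel. signs <ptr target="#p263"/></item>
  <item>Scilla or sea onyon, a purger of mel. <ptr target="#p442"/></item>
 </list>
 <pb xml:id="p143"/> <pb xml:id="p144"/> <pb xml:id="p263"/>
</text>"""

# ElementTree reports xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def dangling_targets(root):
    """Return local (#id) targets of <ptr>/<ref> with no matching xml:id;
    external URIs are ignored."""
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for el in root.iter():
        if el.tag not in ("ptr", "ref"):
            continue
        # target may hold several whitespace-separated URI references.
        for uri in el.get("target", "").split():
            if uri.startswith("#") and uri[1:] not in ids:
                missing.append(uri)
    return missing


print(dangling_targets(ET.fromstring(DOC)))  # ['#p442']
```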
The type attribute may be used, as elsewhere, to categorize the cross-reference according to any system of importance to the encoder. If bibliographic references require special processing (e.g. in order to provide a consistent short-form reference), they might be tagged thus:
Similar forms, often called
<term rend="ldquo rdquo">rewriting systems</term>, have a long history
among mathematicians, but the specific form of <ptr target="#fig22"/>
was first studied extensively by Chomsky <ptr type="bibliog" target="#chom59"/>.

<!-- ... -->
<figure xml:id="fig22">
<!-- ... -->
</figure>
<!-- elsewhere, in the bibliography -->
<bibl xml:id="chom59">
<!-- citation for the book referenced above -->
</bibl>
The value bibliog for the type attribute on the second <ptr> element here might be used to indicate that the object being referenced here is a bibliographic entry rather than a simple cross-reference to an illustration, as is the first <ptr>. In either case, the value of the target attribute is a pointer to some other element.

The <ptr> and <ref> elements have many applications in addition to the simple cross-referencing facilities illustrated in this section. In conjunction with the analytic tools discussed in chapters 16 Linking, Segmentation, and Alignment, 17 Simple Analytic Mechanisms, and 18 Feature Structures, they may be used to link analyses of a text to their object, to combine corresponding segments of a text, or to align segments of a text with a temporal or other axis or with each other.

3.7 Lists

The following elements are provided for the encoding of lists, their constituent items, and the labels or headings associated with them:
  • list contains any sequence of items organized as a list.
  • item contains one component of a list.
  • label contains the label associated with an item in a list; in glossaries, marks the term being defined.
  • head (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc.
  • headLabel (heading for list labels) contains the heading for the label or term column in a glossary list or similar structured list.
  • headItem (heading for list items) contains the heading for the item or gloss column in a glossary list or similar structured list.

The <list> element should be used to mark any kind of list: numbered, lettered, bulleted, or unmarked. Lists formatted as such in the copy text should in general be encoded using this element, with an appropriate value for the type attribute. Lists given as run-on text may also be encoded using this element, where this is felt to be appropriate.

Each distinct item in the list should be encoded as a distinct <item> element. If the numbering or other identification for the items in a list is unremarkable and may be reconstructed by any processing program, no enumerator need be specified. If however an enumerator is retained in the encoded text, it may be supplied either by using the n attribute on the <item> element, or by using a <label> element. The following examples are thus equivalent:
I will add two facts, which have seldom occurred in
the composition of six, or at least of five quartos.
<list rend="runon" type="ordered">
 <label>(1)</label>
 <item>My first rough manuscript, without any
   intermediate copy, has been sent to the press.</item>
 <label>(2)</label>
 <item>Not a sheet has been seen by any human
   eyes, excepting those of the author and the printer:
   the faults and the merits are exclusively my own.</item>
</list>
I will add two facts, which have seldom occurred in
the composition of six, or at least of five quartos.
<list rend="runon" type="ordered">
 <item n="1">My first rough manuscript, without any
   intermediate copy, has been sent to the press.</item>
 <item n="2">Not a sheet has been seen by any human
   eyes, excepting those of the author and the printer:
   the faults and the merits are exclusively my own.</item>
</list>
The two styles may not be mixed in the same list: if one item is preceded by a label, all must be.
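Because the two encodings carry the same information, a processor can read either of them into the same (enumerator, item) pairs. The sketch below does this and rejects lists that mix the two styles. It is a hypothetical helper, not part of any TEI toolkit, and assumes un-namespaced elements as in the examples. Enumerators are kept exactly as written, so `(1)` from a label and `1` from n are not normalized to one form.

```python
import xml.etree.ElementTree as ET


def text_of(el):
    """Whitespace-normalized text content of an element."""
    return " ".join("".join(el.itertext()).split())


def list_entries(lst):
    """Return (enumerator, item text) pairs for a <list>.

    The enumerator comes from a <label> immediately preceding the <item>,
    or else from the item's n attribute (None if neither is present).
    Raises ValueError if labelled and unlabelled items are mixed.
    """
    entries, pending_label, labelled = [], None, 0
    for child in lst:
        if child.tag == "label":
            pending_label = text_of(child)
        elif child.tag == "item":
            if pending_label is not None:
                entries.append((pending_label, text_of(child)))
                labelled += 1
                pending_label = None
            else:
                entries.append((child.get("n"), text_of(child)))
    if 0 < labelled < len(entries):
        raise ValueError("labelled and unlabelled items mixed in one list")
    return entries


with_labels = ET.fromstring(
    '<list rend="runon" type="ordered">'
    "<label>(1)</label><item>My first rough manuscript ...</item>"
    "<label>(2)</label><item>Not a sheet has been seen ...</item>"
    "</list>"
)
with_n = ET.fromstring(
    '<list rend="runon" type="ordered">'
    '<item n="1">My first rough manuscript ...</item>'
    '<item n="2">Not a sheet has been seen ...</item>'
    "</list>"
)
print(list_entries(with_labels))  # [('(1)', 'My first rough manuscript ...'), ('(2)', 'Not a sheet has been seen ...')]
print(list_entries(with_n))       # [('1', 'My first rough manuscript ...'), ('2', 'Not a sheet has been seen ...')]
```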
A list need not necessarily be displayed in list format. For example, the following is a reasonable encoding of a list which (in the original) is simply printed as a single paragraph:
On those remote pages it is written that animals are
divided into <list>
 <item n="a">those that belong to the Emperor, </item>
 <item n="b">embalmed ones, </item>
 <item n="c">those that are trained, </item>
 <item n="d">suckling pigs, </item>
 <item n="e">mermaids, </item>
 <item n="f">fabulous ones, </item>
 <item n="g">stray dogs, </item>
 <item n="h">those that are included in this classification, </item>
 <item n="i">those that tremble as if they were mad, </item>
 <item n="j">innumerable ones, </item>
 <item n="k">those drawn with a very fine camel's-hair brush, </item>
 <item n="l">others, </item>
 <item n="m">those that have just broken a flower vase, </item>
 <item n="n">those that resemble flies from a distance. </item>
</list>
A list may be given a heading or title, for which the <head> element should be used, as in the next example, which also demonstrates simple use of the <label> element to mark a tabular or glossary list in which each item is associated with a word or phrase rather than a numeric or alphabetic enumerator:
<list type="gloss">
 <head>Report of the conduct and progress of Ernest Pontifex.
   Upper Vth form — half term ending Midsummer 1851</head>
 <label>Classics</label>
 <item>Idle listless and unimproving</item>
 <label>Mathematics</label>
 <item>ditto</item>
 <label>Divinity</label>
 <item>ditto</item>
 <label>Conduct in house</label>
 <item>Orderly</item>
 <label>General conduct</label>
 <item>Not satisfactory, on account of his great
   unpunctuality and inattention to duties</item>
</list>
In such a list, the individual items have internal structure. In complex cases, where list items contain many components, the list is better treated as a table, on which see chapter 14 Tables, Formulæ, and Graphics. A particularly important instance of the simple two-column table is the ‘glossary list’, which should be marked by the tag <list type="gloss">. In such lists, each <label> element contains a term and each <item> its gloss; it is a semantic error for a list tagged with type="gloss" not to have labels. For example:
<list type="gloss">
 <head>Unit Three — Vocabulary</head>
 <label xml:lang="la">acerbus, -a, -um </label>
 <item>bitter, harsh</item>
 <label xml:lang="la">ager, agrī, M. </label>
 <item>field</item>
 <label xml:lang="la">audiō, -īre, -īvī, -ītus </label>
 <item>hear, listen (to)</item>
 <label xml:lang="la">bellum, -ī, N. </label>
 <item>war</item>
 <label xml:lang="la">bonus, -a, -um </label>
 <item>good</item>
</list>
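Schema validation does not catch the semantic rule that a glossary list must have labels, so a processor may want to check it directly. The sketch below assumes un-namespaced elements and uses a hypothetical helper name. It pairs each <label> with the <item> that follows, records the label's xml:lang, and raises an error for an unlabelled item. Other children such as <head>, <headLabel>, and <headItem> are skipped. Labels that wrap a <term> element are read the same way, because only their text content is used.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_of(el):
    """Whitespace-normalized text content, including any nested <term>."""
    return " ".join("".join(el.itertext()).split())


def gloss_pairs(lst):
    """Pair each <label> (the term) with the <item> (its gloss) that follows.

    Raises ValueError if the list is not type="gloss" or if an item has no
    preceding label.
    """
    if lst.get("type") != "gloss":
        raise ValueError('expected <list type="gloss">')
    pairs, term, lang = [], None, None
    for child in lst:
        if child.tag == "label":
            term, lang = text_of(child), child.get(XML_LANG)
        elif child.tag == "item":
            if term is None:
                raise ValueError("glossary item without a label")
            pairs.append((term, lang, text_of(child)))
            term, lang = None, None
    return pairs


vocab = ET.fromstring(
    '<list type="gloss"><head>Unit Three: Vocabulary</head>'
    '<label xml:lang="la">ager, agrī, M. </label><item>field</item>'
    '<label xml:lang="la">bellum, -ī, N. </label><item>war</item>'
    "</list>"
)
for term, lang, gloss in gloss_pairs(vocab):
    print(f"{term} [{lang}]: {gloss}")
```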
Additionally, the <term> and <gloss> elements discussed in section 3.3.4 Terms, Glosses, Equivalents, and Descriptions might be used to make explicit the role that each column in the glossary list has, as follows:
<list type="gloss">
 <head>Unit Three — Vocabulary</head>
 <label>
  <term xml:lang="la">acerbus, -a, -um</term>
 </label>
 <item>
  <gloss>bitter, harsh</gloss>
 </item>
 <label>
  <term xml:lang="la">ager, agrī, M. </term>
 </label>
 <item>
  <gloss>field</gloss>
 </item>
 <label>
  <term xml:lang="la">audiō, -īre, -īvī, -ītus</term>
 </label>
 <item>
  <gloss>hear, listen (to)</gloss>
 </item>
 <label>
  <term xml:lang="la">bellum, -ī, N. </term>
 </label>
 <item>
  <gloss>war</gloss>
 </item>
 <label>
  <term xml:lang="la">bonus, -a, -um</term>
 </label>
 <item>
  <gloss>good</gloss>
 </item>
</list>
Note in the above examples the use of the global xml:lang attribute on the <label> (or <term>) element to specify the language of the term. For further discussion of the xml:lang attribute see section 1.3.1.1 Global Attributes, and section vi.1 Language identification. A more elaborate markup for this glossary would distinguish the headword forms from the grammatical information (principal parts and gender), perhaps using elements taken from 9 Dictionaries.
In addition to the <head> element used to supply a title or heading for the whole list, headings for the two columns of a glossary-style list may be specified using the two special elements <headLabel> and <headItem>:
The simple, straightforward statement of an idea is
preferable to the use of a worn-out expression.
<list type="gloss">
 <headLabel>TRITE</headLabel>
 <headItem>SIMPLE, STRAIGHTFORWARD</headItem>
 <label>bury the hatchet </label>
 <item>stop fighting, make peace</item>
 <label>at loose ends </label>
 <item>disorganized</item>
 <label>on speaking terms </label>
 <item>friendly</item>
 <label>fair and square </label>
 <item>completely honest</item>
 <label>at death's door </label>
 <item>near death</item>
</list>
The elements <label>, <head>, <headLabel>, and <headItem> may contain only phrase-level elements. The <item> element however may contain paragraphs or other ‘chunks’, including other lists. In this example, a glossary list contains two items, each of which is itself a simple list:
<list type="gloss">
 <label>EVIL</label>
 <item>
  <list type="simple">
   <item>I am cast upon a horrible desolate island, void
       of all hope of recovery.</item>
   <item>I am singled out and separated as it were from
       all the world to be miserable.</item>
   <item>I am divided from mankind — a solitaire; one
       banished from human society.</item>
  </list>
 </item>
 <label>GOOD</label>
 <item>
  <list type="simple">
   <item>But I am alive; and not drowned, as all my
       ship's company were.</item>
   <item>But I am singled out, too, from all the ship's
       crew, to be spared from death...</item>
   <item>But I am not starved, and perishing on a barren place,
       affording no sustenances....</item>
  </list>
 </item>
</list>

Lists of different types may be nested to arbitrary depths in this way.

3.8 Notes, Annotation, and Indexing

3.8.1 Notes and Simple Annotation

The following elements are provided for the encoding of discursive notes, either already present in the copy text or supplied by the encoder:
  • note contains a note or annotation.

A note is any additional comment found in a text, marked in some way as being out of the main textual stream. All notes should be marked using the same tag, <note>, whether they appear as block notes in the main text area, at the foot of the page, at the end of the chapter or volume, in the margin, or in some other place.

Notes may be in a different hand or typeface, may be authorial or editorial, and may have been added later. Attributes may be used to specify these and other characteristics of notes, as detailed below.

Where possible, the body of a note should be inserted in the text at the point at which its identifier or mark first appears. This may not be possible for example with marginal notes, which may not be anchored to an exact location. For simplicity, it may be adequate to position marginal notes before the relevant paragraph or other element. In some cases, however, it may be desirable to transcribe notes not at their point of attachment to the text but at their point of appearance (at the end of the volume, or the end of the chapter — not, in general, when the notes appear at the foot of the page); in this case the target attribute should be used to specify the point of attachment. In some cases, the note is explicitly attached not to a point but to a span of text; in which case both the target and targetEnd attributes should be used to specify the span of attachment. For a full discussion of pointing to points and spans in the text, see section 3.6 Simple Links and Cross-References.

Examples:
<l>The self-same moment I could pray</l>
<l>And from my neck so free</l>
<l>The albatross fell off, and sank</l>
<l>Like lead into the sea.
<note type="auth" place="margin">The spell begins to break</note>
</l>
Collections are ensembles of distinct entities or objects
of any sort.<note n="1" place="foot">We explain below why we use
the uncommon term <mentioned>collection</mentioned>
instead of the expected <mentioned>set</mentioned>.
Our usage corresponds to the <mentioned>aggregate</mentioned> of many
mathematical writings and to the sense of <mentioned>class</mentioned>
found in older logical writings.</note> The elements ...

In addition to transcribing notes from the copy text, researchers may wish to annotate the electronic text itself, by attaching analytic notes in some structured vocabulary to particular passages of text, e.g. to specify the topics or themes of a text. The <span> and <interp> elements may be used for such applications; these elements are available when the module for simple analysis is selected (see section 17.3 Spans and Interpretations).

3.8.2 Index Entries

The indexing of scholarly texts is a skilled activity, involving substantial amounts of human judgment and analysis. It should not therefore be assumed that simple searching and information retrieval software will be able to meet all the needs addressed by a well-crafted manual index, although such software may complement an index, for example by providing free-text search. The role of an index is to provide access via keywords and phrases which are not necessarily present in the text itself, but must be added by the skill of the indexer.

3.8.2.1 Pre-existing indexes
When encoding a pre-existing text, therefore, if such an index is present it may be advisable to retain it along with the text, rather than attempt to regenerate it automatically. Elements discussed elsewhere in these Guidelines may be used for this purpose. For example, the <div1> element or <div> element may be used to mark the section of the text containing the index and the <list> element might be used to mark the index itself, each entry being represented by an <item> element, possibly containing within it a series of <ptr> or <ref> elements, as follows:
<div type="index">
<!--...-->
 <list type="index">
  <item>Women, how cause of mel. <ref>193</ref>; their vanity in
     apparell taxed, <ref>527</ref>; their counterfeit tears
  <ref>547</ref>; their vices <ref>601</ref>, commended,
  <ref>624</ref>.</item>
  <item>Wormwood, good against mel. <ref>443</ref>
  </item>
  <item>World taxed, <ref>181</ref>
  </item>
  <item>Writers of the cure of mel. <ref>295</ref></item>
<!--...-->
 </list>
</div>
Note that this simple representation does not capture the nested structure of the first of these index entries. A more accurate representation might entail the use of nested lists like the following:
<item>Women,
<list>
  <item>how cause of mel. <ref>193</ref>;</item>
  <item>their vanity in apparell taxed, <ref>527</ref>;</item>
  <item>their counterfeit tears <ref>547</ref>;</item>
  <item>their vices
  <list>
    <item>
     <ref>601</ref>,</item>
    <item> commended, <ref>624</ref>.</item>
   </list>
  </item>
 </list>
</item>
The page references, encoded simply as <ref> elements above, might also include direct links to the appropriate location in the encoded text, using (for example) a target attribute to supply the identifier of an associated page break element:

<!-- in the text --><pb xml:id="P624"/>
<!-- start of page 624 -->
<!-- in the index -->
<ref target="#P624">624</ref>
For further discussion of this and alternative ways of encoding such links, see chapter 16 Linking, Segmentation, and Alignment. Note that similar methods may also be used to encode a table of contents, as further exemplified in section 4.5 Front Matter.
3.8.2.2 Auto-generated indexes

It can also be useful, however, to generate a new index from a machine-readable text, whether the text is being written for the first time with the tags defined here or is an addition to a text transcribed from some other source. Depending on the complexity of the text and its subject matter, such an automatically generated index may not in itself satisfy all the needs of scholarly users. However, it can assist a professional indexer to construct a fully adequate index, which might then be post-edited into the digital text and marked up along the lines already suggested for preserving pre-existing index material.

Indexes generally contain both references to specific pages or sections and references to page ranges or sequences. The same element is used in either case:
  • index (index entry) marks a location to be indexed for whatever purpose.
Like the <interp> element described in section 17.3 Spans and Interpretations, this element may be used simply to provide a descriptive or interpretive label of some kind for any location within a text, to be processed in any way by analytic software, but its main purpose is to facilitate the generation of an index for a printed version of the text. An <index> element may be placed anywhere within a text, between or within other elements. The headwords to be used when making up this index are given by the <term> elements within the <index> element. The location of the generated index might be specified by means of a processing instruction within the text, such as the following (the exact form of the PI is of course dependent on the application software in use):
<?tei indexplacement ?>
Alternatively, the special purpose <divGen> element might be used.
In the simplest case, a single headword is supplied by a single <term> element contained by an <index> element:
<p>The students understand procedures for Arabic lemmatisation
<index>
  <term>Lemmatization, Arabic</term>
 </index>and are beginning to build parsers.</p>

The effect of this will be to generate an index entry for the term ‘Lemmatization, Arabic’, referencing the location of the original <index> element.

If the subject of Arabic lemmatization is treated at length in a text, then the index entry generated may need to reference a sequence of locations (e.g. page numbers). In such a case it will be necessary to identify the end of the relevant span of text as well as its starting point. This is most conveniently done by supplying an empty <anchor> element (as discussed in chapter 16 Linking, Segmentation, and Alignment) at the appropriate point and pointing to it from the <index> element by means of its spanTo attribute, as in this example:
<p>We now turn to the
topic of Arabic lemmatisation
<index spanTo="#ALAMEND">
  <term>Lemmatization, Arabic</term>
 </index> concerning which it is important to note .....

<!-- much learned material omitted here -->
and now we can build our parser.<anchor xml:id="ALAMEND"/>
</p>

This would generate the same index entries as the previous example, but the reference would be to the whole span of text between the location of the <index> element and the location of the element identified by the code ALAMEND, rather than a single point, and thus might (for example) include a sequence of page numbers.
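One way a processor might turn such markup into page references is sketched below. The sketch assumes the following:

- Page breaks are marked with <pb n="..."/>.
- ElementTree's document-order iteration matches reading order.
- Elements are un-namespaced.

The function name and the page numbers in the sample are hypothetical. The pass records the current page at every <index> and at every element carrying an xml:id. Each spanTo pointer is then resolved to the page on which its <anchor> falls, so a forward reference works.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def index_locations(root):
    """Return (headword, first page, last page) for every <index> element.

    Without spanTo, first and last page coincide; with spanTo="#id", the
    last page is the page current at the element carrying that xml:id.
    """
    page, page_of_id, found = None, {}, []
    for el in root.iter():
        if el.tag == "pb":
            page = el.get("n")
        if el.get(XML_ID) is not None:
            page_of_id[el.get(XML_ID)] = page
        if el.tag == "index":
            term = " ".join("".join(el.find("term").itertext()).split())
            found.append((term, page, el.get("spanTo")))
    results = []
    for term, start, span_to in found:
        end = page_of_id.get(span_to.lstrip("#"), start) if span_to else start
        results.append((term, start, end))
    return results


doc = ET.fromstring("""<body>
<pb n="12"/>
<p>We now turn to the topic of Arabic lemmatisation
<index spanTo="#ALAMEND"><term>Lemmatization, Arabic</term></index>
concerning which it is important to note ...
<pb n="13"/> ... <pb n="14"/>
and now we can build our parser.<anchor xml:id="ALAMEND"/></p>
</body>""")
print(index_locations(doc))  # [('Lemmatization, Arabic', '12', '14')]
```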

Although the position of the <index> element in the text provides the target location that will be specified in the generated index entry, no part of the text itself is used to construct that entry. Index terms appearing in the entry come solely from the content of <term> elements, which consequently may have to repeat words or phrases from the text proper. This need not be done verbatim, thus giving scope for normalization of spelling (as in the example above) or other modifications which may assist generation of an index in a desired form or sequence.

Sometimes, for example when index terms are taken from a different language or consist of mathematical formulae or other expressions, even a normalized form of an index term may be insufficient for an application to order it exactly as desired. The sortKey attribute may be used to address this problem, as in the following example:
<p>The @ operator
<index>
  <term sortKey="0000">@</term>
 </index> precedes an
attribute name</p>
Here, an entry for the symbol @ will appear in the index, but will be sorted alphabetically as if it were the string 0000. This technique is also useful when an index entry is to contain some non-Unicode character or glyph represented by the <g> element discussed in chapter 5 Representation of Non-standard Characters and Glyphs. In the following example, we assume that somewhere a definition for this glyph has been provided using the elements described in chapter 5 Representation of Non-standard Characters and Glyphs, and given the code PrinceGlyph:
<char xml:id="PrinceGlyph">
<!-- definition of the glyph here -->
</char>
<p>The Artist formerly known as Prince <index>
  <term sortKey="Prince">
   <g ref="#PrinceGlyph"/>
  </term>
 </index>...</p>
Note that if no value is supplied for the sortKey attribute, a sorting application should always use the content of the <term> element as a sort key.
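The fallback rule just stated is easy to express as a sort-key function. The minimal sketch below assumes un-namespaced elements and uses a hypothetical function name. It takes sortKey when present and otherwise the whitespace-normalized text of the <term>. Plain Python string comparison orders by code point; a real index generator would probably want locale-aware collation instead.

```python
import xml.etree.ElementTree as ET


def sort_key(term):
    """Sort key for an index <term>: its sortKey attribute if present,
    otherwise its whitespace-normalized text content."""
    key = term.get("sortKey")
    if key is None:
        key = " ".join("".join(term.itertext()).split())
    return key


terms = [ET.fromstring(s) for s in (
    "<term>Lemmatization, Arabic</term>",
    '<term sortKey="0000">@</term>',
    '<term sortKey="Prince"><g ref="#PrinceGlyph"/></term>',
)]
for t in sorted(terms, key=sort_key):
    print(sort_key(t), "->", ET.tostring(t, encoding="unicode"))
```

The <term> containing only a <g> element has no text at all, so without its sortKey it would sort as an empty string; this is exactly the case the attribute is meant for.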
It is common practice to compile more than one index for a given text. A biography of a poet, for example, may offer an index of references to poems by the subject of the study, another index of works by other writers, and further indexes of places, historical personages, etc. The indexName attribute is used to assign index terms and locations to one or more specific indexes:
<p>Sir John Ashford
<index indexName="INDEX-PERSONS">
  <term>Ashford, John</term>
 </index> was,
coincidentally, born in
<index indexName="INDEX-PLACES">
  <term>Ashford
     (Kent)</term>
 </index>Ashford...</p>
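A generator producing several indexes might first sort entries into buckets by indexName. In the sketch below, the function name and the `DEFAULT` bucket for <index> elements without indexName are assumptions; the text above does not prescribe a default. Elements are assumed to be un-namespaced. Nested <index> elements are skipped, so that only top-level headwords are filed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def entries_by_index(root, default="DEFAULT"):
    """Group the headwords of top-level <index> elements by indexName."""
    nested = {child for idx in root.iter("index") for child in idx.findall("index")}
    groups = defaultdict(list)
    for idx in root.iter("index"):
        if idx in nested:
            continue  # subentries belong to their parent's index
        term = " ".join("".join(idx.find("term").itertext()).split())
        groups[idx.get("indexName", default)].append(term)
    return dict(groups)


doc = ET.fromstring("""<p>Sir John Ashford
<index indexName="INDEX-PERSONS"><term>Ashford, John</term></index> was,
coincidentally, born in
<index indexName="INDEX-PLACES"><term>Ashford
   (Kent)</term></index>Ashford...</p>""")
print(entries_by_index(doc))
# {'INDEX-PERSONS': ['Ashford, John'], 'INDEX-PLACES': ['Ashford (Kent)']}
```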
Multi-level indexing is particularly common in scholarly documents. For example, as well as entries such as TEI, or markup, an index may contain structured entries like TEI, markup practices, index terms, where a top level entry TEI is followed by a number of second-level subcategories, any or all of which may have a third-level list attached to them and so on. In order to reflect such a hierarchical index listing, <index> elements may be nested to the required depth. For example, suppose that we wish to make a structured index entry for ‘lemmatisation’ with subentries for ‘Arabic’, ‘Sanskrit’, etc. The example at the start of this section might then be encoded with nested <index> elements:
<p>The students understand procedures for Arabic lemmatisation
<index>
  <term>lemmatization</term>
  <index>
   <term>arabic</term>
  </index>
 </index>
...</p>
The index entry from Burton's Anatomy of Melancholy quoted above might be generated in a similar way. To generate such an entry, the body of the text might include, at page 193, an <index> element such as the following:
<index>
 <term>Women</term>
 <index>
  <term>how cause of mel.</term>
 </index>
</index>
Similarly, page 601 of the body text would include an <index> element like the following:
<index>
 <term>Women</term>
 <index>
  <term>their vices</term>
 </index>
</index>
while the <index> element at page 624 would have a structure like the following:
<index>
 <term>Women</term>
 <index>
  <term>their vices</term>
  <index>
   <term>commended</term>
  </index>
 </index>
</index>

When processing such <index> elements, the duplication required to make the structure explicit will normally be removed, so as to produce entries like those quoted above. However, this is not required by the encoding recommended here.
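The deduplication step might be implemented by expanding each nested <index> into a headword path and merging the paths into a tree. Each headword then appears once, with page references attached only at the innermost level. The sketch below is one possible approach under those assumptions, not a prescribed algorithm. Its function names are hypothetical, and the page numbers are supplied by hand rather than read from <pb> elements. Applied to the three Burton examples above, it reproduces the nested shape of the printed entry.

```python
import xml.etree.ElementTree as ET


def index_paths(idx):
    """Yield the headword path (outermost first) to each innermost <index>."""
    term = " ".join("".join(idx.find("term").itertext()).split())
    children = idx.findall("index")
    if not children:
        yield (term,)
    for child in children:
        for path in index_paths(child):
            yield (term,) + path


def merge(entries):
    """Merge (path, page) pairs so each repeated headword appears only once."""
    tree = {}
    for path, page in entries:
        level = tree
        for word in path[:-1]:
            level = level.setdefault(word, {"pages": [], "sub": {}})["sub"]
        level.setdefault(path[-1], {"pages": [], "sub": {}})["pages"].append(page)
    return tree


def render(tree, depth=0):
    """Render the merged tree as indented lines, pages after each headword."""
    lines = []
    for word, node in tree.items():
        pages = ", ".join(node["pages"])
        lines.append("  " * depth + word + (" " + pages if pages else ""))
        lines.extend(render(node["sub"], depth + 1))
    return lines


marks = [
    ("193", "<index><term>Women</term>"
            "<index><term>how cause of mel.</term></index></index>"),
    ("601", "<index><term>Women</term>"
            "<index><term>their vices</term></index></index>"),
    ("624", "<index><term>Women</term><index><term>their vices</term>"
            "<index><term>commended</term></index></index></index>"),
]
entries = [(path, page) for page, src in marks
           for path in index_paths(ET.fromstring(src))]
print("\n".join(render(merge(entries))))
```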

As noted above, either a processing instruction or a <divGen> element may be used to mark the place at which an index generated from <index> elements should be inserted into the output of a processing program; typically but not necessarily this will be at some point within the back matter of the document. If the <divGen> element is used, then the type attribute should be used to specify which kind of index is to be generated, and its value should correspond with that of the indexName attribute on the relevant <index> elements.
<back>
 <div type="appendix">
  <head>Bibliography</head>
  <listBibl>
   <bibl> ... </bibl>
  </listBibl>
 </div>
 <divGen n="Index Nominum" type="INDEX-NAMES"/>
 <divGen n="Index Loci" type="INDEX-PLACES"/>
</back>
As this example shows, the global n attribute may also be used to specify a name or identifier for the generated index itself in the usual way. Any additional headings etc. required for the generated index must be specified as content of the <divGen> element.
<back>
 <divGen n="A1" type="INDEX-NAMES">
  <head>An Index of Names</head>
 </divGen>
</back>

If a processing instruction is used, then these parameters for the generated index may be supplied in some other way.

One final feature frequently found in manually-created indexes to printed works cannot readily be encoded by the means provided here, namely cross-references internal to the index term listing. For example, if all references to the TEI in a text have been indexed using the index term Text Encoding Initiative, it may also be helpful to include an entry under the term TEI containing some text such as ‘see Text Encoding Initiative’. Such internal cross-references must be added as part of the post-editing phase for an auto-generated index.

3.9 Graphics and other non-textual components

Graphics, such as illustrations or diagrams, appear in many different kinds of text, and often with different purposes. In some cases, the graphic is an integral part of a text (indeed, some texts — comic books for example — may be almost entirely graphic); in others the graphic may be a kind of optional extra. In some cases, the text may be incomprehensible unless the graphic is included; in others, the presence of the graphic adds very little to the sense of the work. It will therefore be a matter of encoding policy as to whether or how a graphic found in a source text is transferred to a digital version of the same. In documents which are ‘born digital’, graphics and other forms of non-textual element may be particularly salient, but their inclusion in an archival form of the document concerned remains an editorial decision.

Considered as structural components, graphics may be anchored to a particular point in the text, or they may float either completely freely, or within some defined scope, such as a chapter or section. Graphics of this kind often contain associated text such as a heading or label, and may also nest hierarchically. These Guidelines recommend the following different elements for these two cases:
  • figure groups elements representing or containing graphic information such as an illustration or figure.
  • graphic indicates the location of an inline graphic, illustration, or figure.
  • binaryObject provides encoded binary data representing an inline graphic or other object.
Graphic components may be encoded in a number of different ways:
  • in some non-XML or binary format such as PNG, JPEG, etc.
  • in an XML format such as SVG
  • in a TEI XML format such as the notation for graphs and trees described in 19 Graphs, Networks, and Trees
In the last two cases, the presence of the graphic will be indicated by an appropriate XML element, drawn from the SVG namespace in the second case, and its content will fully define the graphic to be produced. In the first case, the element <graphic> is used to mark the presence of the graphic only and the visual content is stored outside the XML document, and its location is referenced by means of an url attribute. Alternatively, if the graphical information is embedded directly within the document using some suitable binary format such as Base64, the <binaryObject> element may be used to contain it.
The elements <graphic> and <binaryObject> are made available as members of the class model.graphicLike when this module is included in a schema. These elements are also both members of the class att.internetMedia, from which they inherit the following attribute:
  • att.internetMedia provides attributes for specifying the type of a computer resource using a standard taxonomy.
    mimeType (MIME media type) specifies the applicable multimedia internet mail extension (MIME) media type
For example, the following passage indicates that a copy of the image found in the source text may be recovered from the URL zigzag2.png and that this image is in PNG format:
<p>These were the four lines I moved in
through my first, second, third, and
fourth volumes. -- In the fifth volume
I have been very good, -- the precise
line I have described in it being this :
<graphic url="zigzag2.png" mimeType="image/png"/>
By which it appears, that except at the
curve, marked A. where I took a trip
to Navarre, -- and the indented curve B.
which is the short airing when I was
there with the Lady Baussiere and her
page, -- I have not taken the least frisk
...</p>
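A consumer of such markup might need to establish the media type of each graphic component. The sketch below reports, for each <graphic> and <binaryObject>:

- the mimeType attribute when present;
- for a <graphic> without mimeType, a type guessed from the url's file extension using Python's `mimetypes` module (a heuristic, not something these Guidelines require);
- for a <binaryObject>, the size in bytes of its decoded content, assuming Base64.

The function name, the file `ornament.gif`, and the Base64 payload are hypothetical, and elements are assumed to be un-namespaced.

```python
import base64
import mimetypes
import xml.etree.ElementTree as ET


def graphic_media(root):
    """Return (element name, media type, decoded size or None) per graphic."""
    report = []
    for el in root.iter():
        if el.tag == "graphic":
            # Prefer the declared mimeType; fall back to guessing from the URL.
            mime = el.get("mimeType") or mimetypes.guess_type(el.get("url", ""))[0]
            report.append(("graphic", mime, None))
        elif el.tag == "binaryObject":
            # b64decode ignores line breaks and other non-alphabet characters.
            data = base64.b64decode("".join(el.itertext()))
            report.append(("binaryObject", el.get("mimeType"), len(data)))
    return report


doc = ET.fromstring("""<div>
<p>... the precise line I have described in it being this :
<graphic url="zigzag2.png" mimeType="image/png"/></p>
<p><graphic url="ornament.gif"/></p>
<p><binaryObject mimeType="application/octet-stream">SGVsbG8=</binaryObject></p>
</div>""")
print(graphic_media(doc))
```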
The <graphic> and <binaryObject> elements are phrase-level elements which may be used anywhere that textual content is permitted, within but not between paragraphs or headings. In the following example, the encoder has decided to treat a specific printer's ornament as a heading:
<head>
 <graphic url=" "/>
</head>

The <figure> element discussed in section 14.3 Specific Elements for Graphic Images provides additional capabilities, for example the ability to combine a number of images into a hierarchically organized structure or a block of images. It also provides the ability to associate an image with additional information such as a heading or a description.

3.10 Reference Systems

By reference system we mean the system by which names or references are associated with particular passages of a text (e.g. Ps. 23:3 for the third verse of Psalm 23 or Amores 2.10.7 for Ovid's Amores, book 2, poem 10, line 7). Such names make it possible to mark a place within a text and enable other readers to find it again. A reference system may be based on structural units (chapters, paragraphs, sentences; stanza and verse), typographic units (page and line numbers), or divisions created specifically for reference purposes (chapter and verse in Biblical texts). Where one exists, the traditional reference system for a text should be preserved in an electronic transcript of it, if only to make it easier to compare electronic and non-electronic versions of the text.

Reference systems may be recorded in TEI-encoded texts in any of the following ways:
  • where a reference system exists, and is based on the same logical structure as that of the text's markup, the reference for a passage may be recorded as the value of the global xml:id or n attribute on an appropriate tag, or may be constructed by combining attribute values from several levels of tags, as described below in section 3.10.1 Using the xml:id and n Attributes.
  • where there is no pre-existing reference system, the global xml:id or n attributes may be used to construct one (e.g. collections and corpora created in electronic form), as described below in section 3.10.2 Creating New Reference Systems.
  • where a reference system exists which is not based on the same logical structure as that of the text's markup (for example, one based on the page and line numbers of particular editions of the text rather than on the structural divisions of it), any of a variety of methods for encoding the logical structure representing the reference system may be employed, as described in chapter 20 Non-hierarchical Structures.
  • where a reference system exists which does not correspond to any particular logical structure, or where the logical structure concerned is of no interest to the encoder except as a means of supporting the referencing system, then references may be encoded by means of <milestone> elements, which simply mark points in the text at which values in the reference system change, as described below in section 3.10.3 Milestone Elements.
The specific method used to record traditional or new reference systems for a text should be declared in the TEI header, as further described in section 3.10.4 Declaring Reference Systems and in section 16.2.5 Canonical References.

When a text has no pre-existing associated reference system of any kind, these Guidelines recommend as a minimum that at least the page boundaries of the source text be marked using one of the methods outlined in this section. Retaining page breaks in the markup is also recommended for texts which have a detailed reference system of their own. Line breaks in prose texts may be, but need not be, tagged.

3.10.1 Using the xml:id and n Attributes

When traditional reference schemes represent a hierarchical structuring of the text which mirrors that of the marked-up document, the n attribute defined for all elements may be used to indicate the traditional identifier of the relevant structural units. The n attribute may also be used to record the numbering of sections or list items in the copy text if the copy-text numbering is important for some reason, for example because the numbers are out of sequence.

For example, a traditional reference to Ovid's Amores might be Amores 2.10.7—book 2, poem 10, line 7. Book, poem, and line are structural units of the work and will therefore be tagged in any case. (See chapter 6 Verse for a discussion of structural units in verse collections.) In such cases, it is convenient to record traditional reference numbers of the structural units using the n attribute. The relevant tags for our example would be:
<div1 n="Amores" type="volume">
 <div2 n="1" type="book">
<!-- ... -->
 </div2>
 <div2 n="2" type="book">
  <div3 n="1" type="poem">
<!-- ... -->
  </div3>
  <div3 n="2" type="poem">
<!-- ... -->
  </div3>
<!-- ... -->
  <div3 n="10" type="poem">
   <l n="1"> ... </l>
   <l n="2"> ... </l>
<!-- ... -->
   <l n="7"> ... </l>
  </div3>
<!-- ... -->
 </div2>
<!-- ... -->
</div1>
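Encoded this way, a full reference such as Amores 2.10.7 can be built by combining the n values of an element and its ancestors. The minimal sketch below assumes two things, as in the example: the outermost n names the work, and each lower level contributes one number joined with full stops. The function name is hypothetical and elements are assumed to be un-namespaced.

```python
import xml.etree.ElementTree as ET


def canonical_refs(root):
    """Build 'Work a.b.c' references for every <l> from ancestor n values."""
    refs = []

    def walk(el, path):
        n = el.get("n")
        if n is not None:
            path = path + [n]  # levels without n contribute nothing
        if el.tag == "l":
            refs.append(f"{path[0]} {'.'.join(path[1:])}")
        for child in el:
            walk(child, path)

    walk(root, [])
    return refs


amores = ET.fromstring("""<div1 n="Amores" type="volume">
 <div2 n="2" type="book">
  <div3 n="10" type="poem">
   <l n="1"> ... </l>
   <l n="7"> ... </l>
  </div3>
 </div2>
</div1>""")
print(canonical_refs(amores))  # ['Amores 2.10.1', 'Amores 2.10.7']
```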
One may also place the entire standard reference for each portion of the text into the appropriate value for the n attribute, though for obvious reasons this takes more space in the file:
<div1 n="Amores" type="volume">
 <div2 n="Amores 1" type="book">
<!-- ... -->
 </div2>
 <div2 n="Amores 2" type="book">
  <div3 n="Amores 2.1" type="poem">
<!-- ... -->
  </div3>
<!-- ... -->
  <div3 n="Amores 2.10" type="poem">
<!-- ... -->
   <l n="Amores 2.10.7"> ... </l>
<!-- ... -->
  </div3>
<!-- ... -->
 </div2>
<!-- ... -->
</div1>
If the names used by the traditional reference system can be formulated as identifiers, then the references can be given as values for the xml:id attribute; this requires that the reference be given without internal spaces, begin with a letter or underscore, and contain no characters other than letters, digits, hyphens, underscores, full stops, and the various combining and extender characters, as defined by the XML specification. Unlike values for the n attribute, values for the xml:id attribute must be unique throughout the document. Our example then looks like this:
<div1 n="Amores" type="volume">
 <div2 xml:id="am.1" type="book">
<!-- ... -->
 </div2>
 <div2 xml:id="am.2" type="book">
  <div3 xml:id="am.2.1" type="poem">
<!-- ... -->
  </div3>
<!-- ... -->
  <div3 xml:id="am.2.10" type="poem">
<!-- ... -->
   <l xml:id="am.2.10.7"> ... </l>
<!-- ... -->
  </div3>
<!-- ... -->
 </div2>
<!-- ... -->
</div1>
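Whether the reference strings of a given system can serve as xml:id values can be checked before they are assigned. The Python sketch below is a deliberately simplified check: its pattern accepts only ASCII letters, digits, hyphens, underscores, and full stops, whereas XML also permits many non-ASCII letters and combining and extender characters. It is therefore an approximation of the XML rules, not a substitute for validation; the function check_ids and the candidate values are illustrative.

```python
import re
from collections import Counter

# ASCII-only approximation of the XML name rules for xml:id: a letter or
# underscore first, then letters, digits, full stops, underscores, hyphens.
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def check_ids(values):
    """Return (invalid values, duplicated values) for candidate xml:id values."""
    invalid = [v for v in values if not ID_PATTERN.fullmatch(v)]
    duplicates = [v for v, count in Counter(values).items() if count > 1]
    return invalid, duplicates


candidates = ["am.1", "am.2", "am.2.10", "am.2.10.7", "Amores 2.10.7", "2.10", "am.2"]
print(check_ids(candidates))
# (['Amores 2.10.7', '2.10'], ['am.2'])
```

The space in Amores 2.10.7 and the leading digit in 2.10 make those values unusable as identifiers, which is why the example above abbreviates the references to forms such as am.2.10.7.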

To document the usage and to allow automatic processing of these standard references, it is recommended that the TEI header be used to declare whether standard references are recorded in the n or xml:id attributes and which elements may carry standard references or portions of them. For examples of declarations for the reference systems just shown, see section 3.10.4 Declaring Reference Systems.

Using the n attribute one can specify only a single standard referencing system, a limitation not without problems, since some editions may define structural units differently and thus create alternative reference systems. For example, another edition of the Amores considers poem 10 a continuation of poem 9, and therefore would specify the same line as Amores 2.9.31. In order to record both of these reference systems one could employ any of a variety of methods discussed in chapter 20 Non-hierarchical Structures.

3.10.2 Creating New Reference Systems

If a text has no canonical reference system of its own, a reference system, if needed, may be derived from the structure of the electronic text, specifically from the markup of the text. As with any reference system intended for long-term use, it is important to see the reference as an established, unchanging point in the text. Should the text be revised or rearranged, the reference-system identifiers associated with any bit of text must stay with that bit of text, even if it means the reference numbers fall out of sequence. (A new reference system may always be created beside the old one if out-of-sequence numbers must be avoided.)

The global attributes n and xml:id may be used to assign reference identifiers to segments of the text. Identifiers specified by either attribute apply to the entire element for which they are given. Values of xml:id must be unique within a single document and must begin with a letter or underscore. No such restrictions apply to the values of n attributes.

A convenient method of mechanically generating unique values for xml:id or n attributes based on the structure of the document is to construct, for each element, a domain-style address comprising a series of components separated by full stops, with one component for each level of the document hierarchy. Two methods may be used. In the typed path form of identifier, each component in the identifier takes the form of an element identifier, a hyphen, and a number, for example p-2. The element name specifies what type of element is to be sought, and the number specifies which occurrence of that element type is to be selected. (The hyphen and number may be omitted if there is only one element of the given type.) In the untyped path form of identifier, each component consists of a number, indicating which element in the sequence of nodes at each level is to be selected.

Identifiers generated with these methods should use the <text> element as their starting point, rather than the <TEI> or <body> elements. The <TEI> element may be taken as a starting point only if identifiers need to be generated for the <teiHeader>, which is not usually the case; using the <body> element as a root would prevent assignment of identifiers for the front and back matter. The component corresponding to the root element can be omitted from identifiers, if no confusion will result. In collections and corpora, the component corresponding to the root may be replaced by the unique identifier assigned to the text or sample.

In the following example, each element within the <text> element has been given a typed-path identifier as its xml:id value, and an untyped-path identifier as its n value; the latter are prefixed with the string AB, which may be imagined to be the general identifier for this text.
<text xml:id="Text-1" n="AB">
 <front xml:id="Front" n="AB.1">
  <div xml:id="Front.div-1" n="AB.1.1">
   <p> ... </p>
  </div>
  <titlePage xml:id="Front.titlePage" n="AB.1.2">
   <titlePart> ... </titlePart>
  </titlePage>
  <div xml:id="Front.div-2" n="AB.1.3">
   <p> ... </p>
  </div>
 </front>
 <body xml:id="Body" n="AB.2">
  <p xml:id="Body.p-1" n="AB.2.1"> ... </p>
  <p xml:id="Body.p-2" n="AB.2.2"> ... </p>
  <div xml:id="Body.div-1" n="AB.2.3">
   <head xml:id="Body.div-1.head" n="AB.2.3.1"> ... </head>
   <p xml:id="Body.div-1.p-1" n="AB.2.3.2"> ... </p>
   <p xml:id="Body.div-1.p-2" n="AB.2.3.3"> ... </p>
  </div>
  <div xml:id="Body.div-2" n="AB.2.4">
   <head xml:id="Body.div-2.head" n="AB.2.4.1"> ... </head>
   <p xml:id="Body.div-2.p-1" n="AB.2.4.2"> ... </p>
   <p xml:id="Body.div-2.p-2" n="AB.2.4.3"> ... </p>
  </div>
 </body>
</text>
The typed and untyped path methods are convenient, but are in no way required for anyone creating a reference system.
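Both path methods can be generated mechanically from the document tree. The following Python sketch, using xml.etree.ElementTree, yields a typed path and an untyped path for every element below <text>. The function path_identifiers is hypothetical. Element names are used exactly as they appear, so the sketch does not reproduce the capitalized top-level components (Front, Body) of the example above. As recommended, the <text> component is omitted from typed paths and replaced by a caller-supplied prefix in untyped paths.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    # Drop any "{namespace}" prefix that ElementTree adds to tag names.
    return tag.rsplit("}", 1)[-1]


def path_identifiers(text_elem, prefix):
    """Yield (typed_path, untyped_path) for each element below <text>."""

    def walk(parent, typed, untyped):
        children = list(parent)
        totals = Counter(local(c.tag) for c in children)
        seen = Counter()
        for position, child in enumerate(children, start=1):
            name = local(child.tag)
            seen[name] += 1
            # Typed component: name-number, or just the name when no
            # sibling shares it (the hyphen and number are then omitted).
            if totals[name] == 1:
                component = name
            else:
                component = f"{name}-{seen[name]}"
            typed_path = f"{typed}.{component}" if typed else component
            # Untyped component: position among all sibling elements.
            untyped_path = f"{untyped}.{position}"
            yield typed_path, untyped_path
            yield from walk(child, typed_path, untyped_path)

    yield from walk(text_elem, "", prefix)


sample = """<text>
 <front><div><p/></div><titlePage><titlePart/></titlePage><div><p/></div></front>
 <body><p/><p/><div><head/><p/><p/></div><div><head/><p/><p/></div></body>
</text>"""

for typed, untyped in path_identifiers(ET.fromstring(sample), "AB"):
    print(typed, untyped)
# The first lines printed are "front AB.1" and "front.div-1 AB.1.1".
```

Unlike the example above, the sketch also assigns identifiers to elements nested inside the front-matter divisions; a real application would restrict them to whichever elements the header declares as bearing standard references.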

If the xml:id attribute is used to record the reference identifiers generated, each value should record the entire path. If the n attribute is used, each value may record either the entire path or only the subpath from the parent element. The attribute used, the elements which can bear standard reference identifiers, and the method for constructing standard reference identifiers, should all be declared in the header as described in section 2.3.5 The Reference System Declaration.

3.10.3 Milestone Elements

Where the desired reference system does not correspond to any particular structural hierarchy, or the document combines multiple structural hierarchies (as further discussed in chapter 20 Non-hierarchical Structures), simpler though less expressive methods may be necessary. In such cases the simplest solution may be just to mark up changes in the reference system where they occur, by using one or more of the following milestone elements:
  • milestone marks a boundary point separating any kind of section of a text, as indicated by changes in a standard reference system, where the section is not represented by a structural element.
  • pb (page break) marks the boundary between one page of a text and the next in a standard reference system.
  • lb (line break) marks the start of a new (typographic) line in some edition or version of a text.
  • cb (column break) marks the boundary between one column of a text and the next in a standard reference system.

These elements simply mark the points in a text at which some category in a reference system changes. They have no content but subdivide the text into regions, rather in the same way as milestones mark points along a road, thus implicitly dividing it into segments. The elements <pb>, <cb>, and <lb> are specialised types of milestone, marking page, column, and line boundaries. The global n attribute is used in each case to provide a value for the particular unit associated with this milestone (for example, the page or line number). Because a reference system based on <milestone>s is not expressed structurally, an XML parser cannot check its validity; it is the responsibility of the encoder or the application software to ensure that the milestones are given in the correct order.

Milestones are useful where a text has two competing structures. For example, many English novels were first published as serial works, individual parts of which do not always contain a whole number of chapters. An encoder might decide to represent the chapter-based structure using <div1> elements, with <milestone> elements marking the boundaries between individual parts; or the reverse. Thus, an encoding in which chapters are regarded as more important than parts might encode a work in which chapter three begins in part one and concludes in part two as follows:
<text>
 <body>
  <milestone unit="part" n="1"/>
  <div1 n="1" type="chapter">
   <p>
<!-- ... -->
   </p>
  </div1>
  <div1 n="2" type="chapter">
   <p>
<!-- ... -->
   </p>
  </div1>
  <div1 n="3" type="chapter">
   <p>
<!-- ... -->
   </p>
   <milestone unit="part" n="2"/>
   <p>
<!-- ... -->
   </p>
  </div1>
 </body>
</text>
An encoding of the same work in which parts are regarded as more important than chapters might begin as follows:
<text>
 <body>
  <div1 n="1" type="part">
   <milestone unit="chapter" n="1"/>
   <p>
<!-- ... -->
   </p>
   <milestone unit="chapter" n="2"/>
   <p>
<!-- ... -->
   </p>
   <milestone unit="chapter" n="3"/>
   <p>
<!-- ... -->
   </p>
  </div1>
  <div1 n="2" type="part">
   <p>
<!-- ... -->
   </p>
   <milestone unit="chapter" n="4"/>
   <p>
<!-- ... -->
   </p>
  </div1>
 </body>
</text>

Milestone tags also make it possible to record the reference systems used in a number of different editions of the same work. Given a text in which the reference systems of all these editions are marked, the reference system of any one edition can be recreated simply by ignoring all milestone elements whose ed attribute does not specify that edition.

As a simple example, assuming that edition E1 of some collection of poems regards the first two poems as constituting the first book, while edition E2 regards the first poem as prefatory, a markup scheme like the following might be adopted:
<milestone ed="E1" unit="work"/>
<milestone ed="E2" unit="work"/>
<milestone ed="E1" unit="book"/>
<milestone ed="E1" unit="poem"/>
<milestone ed="E2" unit="poem"/>
<milestone ed="E2" unit="book"/>
<milestone ed="E1" unit="poem"/>
<milestone ed="E2" unit="poem"/>

In this case no n value is specified, since the numbers rise predictably and the application can keep a count from the start of the document, if desired.
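Keeping such a count is straightforward. The Python sketch below recovers the numbering of a single edition from the milestone example above. It ignores milestones belonging to other editions and resets the lower-level counters whenever a higher-level unit begins. The unit hierarchy (work, book, poem) and the function edition_references are assumptions made for this illustration, and for brevity the sample omits the TEI namespace. A count of 0 in the book position marks a poem that precedes any book, such as the prefatory poem of E2.

```python
import xml.etree.ElementTree as ET

# Assumed unit hierarchy for this sketch, outermost first.
UNITS = ["work", "book", "poem"]


def edition_references(root, ed):
    """Return (unit, counters) for each milestone belonging to edition `ed`."""
    counters = [0] * len(UNITS)
    result = []
    for m in root.iter("milestone"):  # document order
        if m.get("ed") != ed:
            continue  # milestones of other editions are simply ignored
        level = UNITS.index(m.get("unit"))
        counters[level] += 1
        for deeper in range(level + 1, len(UNITS)):
            counters[deeper] = 0  # e.g. a new book restarts poem numbering
        result.append((m.get("unit"), tuple(counters)))
    return result


sample = """<body>
<milestone ed="E1" unit="work"/> <milestone ed="E2" unit="work"/>
<milestone ed="E1" unit="book"/> <milestone ed="E1" unit="poem"/>
<milestone ed="E2" unit="poem"/> <milestone ed="E2" unit="book"/>
<milestone ed="E1" unit="poem"/> <milestone ed="E2" unit="poem"/>
</body>"""

root = ET.fromstring(sample)
print(edition_references(root, "E1"))
# [('work', (1, 0, 0)), ('book', (1, 1, 0)), ('poem', (1, 1, 1)), ('poem', (1, 1, 2))]
print(edition_references(root, "E2"))
# [('work', (1, 0, 0)), ('poem', (1, 0, 1)), ('book', (1, 1, 0)), ('poem', (1, 1, 1))]
```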

The value of the n attribute may, but need not, include the identifiers used for any larger sections. That is, either of the following styles is legitimate:
<milestone ed="E1" unit="work" n="Amores"/>
<milestone ed="E1" unit="book" n="1"/>
<milestone ed="E1" unit="poem" n="1"/>
<milestone ed="E1" unit="poem" n="2"/>
<milestone ed="E1" unit="book" n="2"/>
or
<milestone ed="E1" unit="work" n="Amores"/>
<milestone ed="E1" unit="book" n="1"/>
<milestone ed="E1" unit="poem" n="1.1"/>
<milestone ed="E1" unit="poem" n="1.2"/>
<milestone ed="E1" unit="book" n="2"/>

When using <milestone> tags, line numbers may be supplied for every line or only periodically (for example, every fifth or every tenth line). The latter may be simpler; the former is more reliable.

The style of numbering used in the values of n is unrestricted: for the example above, I.i, I.ii, and I.iii could have been used equally well if preferred. The special value unnumbered should be reserved for marking sections of text which fall outside the normal numbering system (e.g. chapter heads, poem numbers, titles, or speaker attributions in a verse drama).

By default, there are no constraints on the values supplied for the ed attribute. If it is felt appropriate to enforce such a restriction, the techniques described in 23.2 Personalization and Customization may be used, for example to specify that the attribute must take one of a predefined set of values.

See below, section 3.10.4 Declaring Reference Systems, for examples of declarations for the reference systems just shown.

3.10.4 Declaring Reference Systems

Whatever kind of reference system is used in an electronic text, it is recommended that the TEI header contain a description of its construction in the <refsDecl> element described in section 2.3.5 The Reference System Declaration. As described there, the declaration may consist either of a formal declaration using the <cRefPattern> element or an informal description in prose. The former is recommended because unlike prose it can be processed by software.

The three examples given in section 3.10.1 Using the xml:id and n Attributes would be declared as follows. The first example encodes the standard references for Ovid's Amores one level at a time, using the n attribute on the <div1>, <div2>, <div3>, and <l> tags. The header for such an encoding should look something like this:
<teiHeader>
 <fileDesc>
<!-- ... -->
 </fileDesc>
 <encodingDesc>
  <refsDecl>
   <cRefPattern
     matchPattern="([^ ]+) ([0-9]+)\.([0-9]+)\.([0-9]+)"
     replacementPattern="#xpath(//div1[@n='$1']/div2[@n='$2']/div3[@n='$3']/l[@n='$4'])">

    <p>A canonical reference is assembled with
    <list>
      <item>the name of the <label>work</label>: the
      <att>n</att> of a <gi>div1</gi>,</item>
      <item>a space,</item>
      <item>the number of the <label>book</label>: the
      <att>n</att> of a child <gi>div2</gi>,</item>
      <item>a full stop,</item>
      <item>the number of the <label>poem</label>: the
      <att>n</att> of a child <gi>div3</gi>,</item>
      <item>a full stop,</item>
      <item>the line number: the <att>n</att> value of a
             child <gi>l</gi>
      </item>
     </list>
    </p>
   </cRefPattern>
   <cRefPattern
     matchPattern="([^ ]+) ([0-9]+)\.([0-9]+)"
     replacementPattern="#xpath(//div1[@n='$1']/div2[@n='$2']/div3[@n='$3'])">

    <p>Same as above, but without the last component (full
         stop followed by the <gi>l</gi>'s <att>n</att>).</p>
   </cRefPattern>
   <cRefPattern
     matchPattern="([^ ]+) ([0-9]+)"
     replacementPattern="#xpath(//div1[@n='$1']/div2[@n='$2'])">

    <p>Same as above, but without the poem component (full
         stop followed by the <gi>div3</gi>'s <att>n</att>).</p>
   </cRefPattern>
  </refsDecl>
 </encodingDesc>
</teiHeader>
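To show how a declaration of this kind can be applied, the following Python sketch resolves a canonical reference against a document. It tries each pattern in turn, taking the first whose matchPattern matches the whole reference (an assumption of this sketch). It then substitutes the captured groups for $1, $2, and so on, and evaluates the resulting path. The replacement patterns are written out as this sketch assumes them, addressing <div1>, <div2>, <div3>, and <l> by their n values. Because xml.etree.ElementTree supports only a limited subset of XPath, the leading // is made relative before searching; general patterns would need a full XPath processor. The function resolve is hypothetical, and the sample omits the TEI namespace.

```python
import re
import xml.etree.ElementTree as ET

# (matchPattern, replacementPattern) pairs as assumed for this sketch.
C_REF_PATTERNS = [
    (r"([^ ]+) ([0-9]+)\.([0-9]+)\.([0-9]+)",
     "#xpath(//div1[@n='$1']/div2[@n='$2']/div3[@n='$3']/l[@n='$4'])"),
    (r"([^ ]+) ([0-9]+)\.([0-9]+)",
     "#xpath(//div1[@n='$1']/div2[@n='$2']/div3[@n='$3'])"),
    (r"([^ ]+) ([0-9]+)",
     "#xpath(//div1[@n='$1']/div2[@n='$2'])"),
]


def resolve(reference, root):
    """Return the elements addressed by a canonical reference, or []."""
    for match_pattern, replacement in C_REF_PATTERNS:
        m = re.fullmatch(match_pattern, reference)
        if m is None:
            continue
        # Replace $1, $2, ... with the corresponding captured groups.
        target = re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), replacement)
        xpath = target[len("#xpath("):-1]  # strip "#xpath(" and ")"
        # ElementTree needs a relative path, so "//..." becomes ".//...".
        return root.findall("." + xpath)
    return []


sample = """<body>
  <div1 n="Amores" type="volume">
    <div2 n="2" type="book">
      <div3 n="10" type="poem">
        <l n="7">...</l>
      </div3>
    </div2>
  </div1>
</body>"""

root = ET.fromstring(sample)
print([e.tag for e in resolve("Amores 2.10.7", root)])     # ['l']
print([e.get("n") for e in resolve("Amores 2.10", root)])  # ['10']
```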
The second example encodes the same reference system, again using the n attribute on the <div1>, <div2>, <div3>, and <l> tags, but giving the reference string in full on each tag. If canonical references are made only to lines, the reference system could be declared as follows:
<refsDecl>
 <cRefPattern
   matchPattern="([^ ]+ [0-9]+\.[0-9]+\.[0-9]+)"
   replacementPattern="#xpath(//l[@n='$1'])"/>

</refsDecl>
Since the entire regular expression is enclosed as a parenthetical subgroup, the entire canonical reference string is sought as the value of the n attribute on an <l> element.
In order to handle references to poems as well as to individual lines, the declaration for the reference system must be more complicated:
<refsDecl>
 <cRefPattern
   matchPattern="([^ ]+ [0-9]+\.[0-9]+\.[0-9]+)"
   replacementPattern="#xpath(//l[@n='$1'])"/>

 <cRefPattern
   matchPattern="([^ ]+ [0-9]+\.[0-9]+)"
   replacementPattern="#xpath(//div3[@n='$1'])"/>

</refsDecl>
This declaration indicates that the entire reference string must be sought as the value of the n attribute: on an <l> element for a reference to a line, or on a <div3> element for a reference to a poem.
The third example encodes the same reference system, this time giving the entire reference string as the value of the xml:id attribute on the relevant tags. The reference system declaration for such an encoding could be:
<refsDecl>
 <cRefPattern matchPattern="(.*)" replacementPattern="#$1"/>
</refsDecl>
although in general there seems to be little advantage in this case, since it is no more difficult to point at the element directly, using a standard relative URI reference (such as #am.2.10.7) as the value of target.
Reference systems recorded by means of milestone tags can also be declared; the following prose description could be used to declare the example given in section 3.10.3 Milestone Elements.
<refsDecl>
 <p>Standard references to work, book, poem, and line may be
   constructed from the milestone tags in the text.</p>
</refsDecl>
Alternatively, a formal declaration may be given for the reference scheme derived from edition E1:
<refsDecl>
 <refState ed="E1" unit="work" delim=" "/>
 <refState ed="E1" unit="book" delim="."/>
 <refState ed="E1" unit="poem" delim=":"/>
 <refState ed="E1" unit="line"/>
</refsDecl>
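Assuming that delim supplies the string which follows each reference component, a reference can be assembled from component values as in the following Python sketch. The helper format_reference and the component values are hypothetical. The sketch handles only references that begin with the work and stop at some level; a delimiter is emitted only when a further component follows.

```python
# The refState sequence declared above for edition E1, as (unit, delim)
# pairs; delim is the string that follows the component it belongs to.
REF_STATES = [("work", " "), ("book", "."), ("poem", ":"), ("line", "")]


def format_reference(values):
    """Build a reference from leading component values, work first."""
    if not 0 < len(values) <= len(REF_STATES):
        raise ValueError("need between 1 and %d values" % len(REF_STATES))
    pieces = []
    for i, value in enumerate(values):
        pieces.append(value)
        if i < len(values) - 1:
            pieces.append(REF_STATES[i][1])  # delimiter after this component
    return "".join(pieces)


print(format_reference(["Amores", "2", "10", "7"]))  # Amores 2.10:7
print(format_reference(["Amores", "2"]))             # Amores 2
```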

3.11 Bibliographic Citations and References

Bibliographic references (that is, full descriptions of bibliographic items such as books, articles, films, broadcasts, songs, etc.) or pointers to them may appear at various places in a TEI text. They are required at several points within the TEI Header's source description, as discussed in section 2.2.7 The Source Description; they may also appear within the body of a text, either singly (for example within a footnote), or collected together in a list as a distinct part of a text; detailed bibliographic descriptions of manuscript or other source materials may also be required. These Guidelines propose a number of specialised elements to encode such descriptions, which together constitute the model.biblLike class. By default, this class has the following members:
  • bibl (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.
  • biblStruct (structured bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order.
  • biblFull (fully-structured bibliographic citation) contains a fully-structured bibliographic citation, in which all components of the TEI file description are present.
Lists of such elements may also be encoded using the following element:
  • listBibl (citation list) contains a list of bibliographic citations of any kind.

In printed texts, the individual constituents of a bibliographic reference are conventionally marked off from each other and from the flow of text by such features as bracketing, italics, special punctuation conventions, underlining, etc. In electronic texts, such distinctions are also important, whether in order to produce acceptably formatted output or to facilitate intelligent retrieval processing,14 quite apart from the need to distinguish the reference itself as a textual object with particular linguistic properties.

It should be emphasized that for references as for other textual features, the primary or sole consideration is not how the text should be formatted when it is printed. The distinctions permitted by the scheme outlined here may not necessarily be all that particular formatters or bibliographic styles require, although they should prove adequate to the needs of many such commonly used software systems.15 The features distinguished and described below (in section 3.11.2 Components of Bibliographic References) constitute a set which has been useful for a wide range of bibliographic purposes and in many applications, and which moreover corresponds to a great extent with existing bibliographic and library cataloguing practice. For a fuller account of that practice as applied to electronic texts see section 2.2.7 The Source Description; for a brief mention of related library standards see section 2.7 Note for Library Cataloguers.

3.11.1 Elements of Bibliographic References

The members of the model.biblLike class all share a number of possible component sub-elements. For the <bibl> and <biblStruct> elements, exactly the same sub-elements are concerned, and they are described together in section 3.11.2 Components of Bibliographic References; for the <biblFull> element, the sub-elements concerned are fully described in section 2.2 The File Description.

Different levels of specific tagging may be appropriate in different situations. In some cases, it may be felt necessary to mark just the extent of the reference itself, with perhaps a few distinctions being made within it (for example, between the part of the reference which identifies a title or author and the rest). Such references, containing a mixture of text with specialized bibliographic elements, are regarded as <bibl> elements, and tagged accordingly. For example:
<p>A book which had a great influence on him
was <bibl>Tufte's <title>Envisioning
     Information</title>
 </bibl>, although he may
never have actually read it.</p>
Indeed, some encoders may find it unnecessary to mark the bibliographic reference at all:
<p>A book which had a great influence on him
was Tufte's <title>Envisioning Information</title>,
although he may never have actually read it.</p>
Some bibliographic references are extremely elliptical, often only a string of the form Baxter, 1983. If no further details of Baxter's book are given in the source text and none are supplied by the encoder, then the reference thus given should be tagged as a <bibl>:
All of this is of course much more fully treated
in <bibl>Baxter, 1983</bibl>.
In general, however, normal modern bibliographic practice, and these Guidelines, distinguish between a bibliographic reference, which is a self-sufficient description of a bibliographic item, and a bibliographic pointer, which is a short-form citation (e.g. Baxter, 1983) which serves usually as a place-holder or pointer to a full long-form reference found elsewhere in the text. The usual encoding of short-form references such as Baxter, 1983 is not as <bibl> elements but as cross-references to such elements; see section 3.11.3 Bibliographic Pointers below.
In cases where the encoder wishes to impose more structure on the bibliographic information, for example to make sure it conforms to a particular stylesheet or retrieval processor, the <biblStruct> element should be used. Note that several of the features in this and later examples are explained later in the current section.
<biblStruct>
 <monogr>
  <author>Edward R. Tufte</author>
  <title>Envisioning Information</title>
  <imprint>
   <pubPlace>Cheshire, Conn.</pubPlace>
   <publisher>Graphics Press</publisher>
   <date>1990</date>
  </imprint>
 </monogr>
</biblStruct>
A more complex and detailed bibliographic structure is provided by the <biblFull> element defined in the TEI header module. This element is provided as a means of embedding the file description of one existing digital text within that of another (see further section 2.2 The File Description); however, its use is not confined to digital texts, and it may be used in the same way as any other bibliographic element, as in this example:
<biblFull>
 <titleStmt>
  <title>Envisioning Information</title>
  <author>Tufte, Edward R[olf]</author>
 </titleStmt>
 <extent>126 pp.</extent>
 <publicationStmt>
  <publisher>Graphics Press</publisher>
  <pubPlace>Cheshire, Conn. USA</pubPlace>
  <date>1990</date>
 </publicationStmt>
</biblFull>
A list of bibliographic items, of whatever kind, may be treated in the same way as any other list (see section 3.7 Lists). Alternatively, the specialized <listBibl> element may be used. The difference between the two is that a <list> contains <item> elements, within which bibliographic elements (<bibl>, <biblStruct>, or <biblFull>) may appear, as well as other phrase- and paragraph-level elements, whereas the <listBibl> may contain only bibliographic elements, optionally preceded by a heading and a series of introductory paragraphs. The former would be appropriate for a list of bibliographic elements in which descriptive prose predominated, and the latter for a more formal bibliography. The following are thus both legal encodings of a list of bibliographic entries: a <listBibl>:
<listBibl>
 <head>Bibliography</head>
 <biblStruct xml:id="NELSON80">
  <analytic>
   <author>Nelson, T. H.</author>
   <title>Replacing the printed word:
       a complete literary system.</title>
  </analytic>
  <monogr>
   <title>Information Processing '80: Proceedings of the IFIPS
       Congress, October 1980</title>
   <editor>Simon H. Lavington</editor>
   <imprint>
    <publisher>North-Holland</publisher>
    <pubPlace>Amsterdam</pubPlace>
    <date>1980</date>
   </imprint>
   <biblScope>pp 1013–23 </biblScope>
  </monogr>
  <note>Apparently a draft of section 4 of
  <title>Literary Machines</title>.</note>
 </biblStruct>
 <bibl xml:id="NELSON88">Ted Nelson: <title>Literary Machines</title>
   (privately published, 1987)</bibl>
 <bibl xml:id="BAXTER88">
  <author>Baxter, Glen</author>
  <title>Glen Baxter His Life: the years of struggle</title>
   London: Thames and Hudson, 1988.
 </bibl>
</listBibl>
or a simple <list>:
<list>
 <head>Bibliography</head>
 <item>
  <bibl xml:id="NEL80">
   <author>Nelson, T. H.</author>
   <title level="a">Replacing the printed word:
       a complete literary system.</title>
   <title level="m">Information Processing '80:
       Proceedings of the IFIPS Congress, October 1980</title>
   <editor>Simon H. Lavington</editor>
   <publisher>North-Holland</publisher>
   <pubPlace>Amsterdam</pubPlace>
   <date>1980</date>
   <biblScope>pp 1013–23
   </biblScope>
   <note>Apparently a draft of section 4 of
   <title>Literary Machines</title>.</note>
  </bibl>
 </item>
 <item>
  <bibl xml:id="NEL88">Ted Nelson: <title>Literary Machines</title>
     (privately published, 1987)</bibl>
 </item>
 <item>
  <bibl xml:id="BAX88">
   <author>Baxter, Glen</author>
   <title>Glen Baxter His Life: the years of struggle</title>
     London: Thames and Hudson, 1988.
  </bibl>
 </item>
</list>

3.11.2 Components of Bibliographic References

This section discusses a number of very commonly occurring component elements of bibliographic references. They fall into four groups:
  • elements for grouping components of the analytic, monographic, and series levels in a structured bibliographic reference
  • titles of various kinds, and statements of intellectual responsibility (authorship, etc.)
  • information relating to the publication, pagination, etc. of an item (most of these constitute the default members of the model.biblPart class)
  • annotation, commentary, and further detail
The following sections describe the elements which may be used to represent such information within a <bibl> or <biblStruct> element. Within the former, elements from the model.biblPart class, other phrase-level elements, and plain text may be combined without other constraint; within the latter, such of these elements as exist for a given reference must be distinguished, and must also be presented in a specific order, discussed further below (section 3.11.2.7 Order of Components within References).
3.11.2.1 Analytic, Monographic, and Series Levels
In common library practice a clear distinction is made between an individual item within a larger collection and a free-standing book, journal, or collection. Similarly a book in a series is distinguished sharply from the series within which it appears. An article forming part of a collection which itself appears in a series thus has a bibliographic description with three quite distinct levels of information:
  1. the analytic level, giving the title, author, etc., of the article;
  2. the monographic level, giving the title, editor, etc., of the collection;
  3. the series level, giving the title of the series, possibly the names of its editors, etc., and the number of the volume within that series.
In the same way, an article in a journal requires at least two levels of information: the analytic level describing the article itself, and the monographic level describing the journal.
These three levels may be distinguished within a <bibl> element, and must be distinguished within a <biblStruct> element if present, by means of the following elements:
  • analytic (analytic level) contains bibliographic elements describing an item (e.g. an article or poem) published within a monograph or journal and not as an independent publication.
  • monogr (monographic level) contains bibliographic elements describing an item (e.g. a book or journal) published as an independent item (i.e. as a separate physical object).
  • series (series information) contains information about the series in which a book or other bibliographic item has appeared.

For purposes of TEI encoding, journals and anthologies are both treated as monographs; a journal title will thus be tagged as a <title level="j"> element, or simply as a <title> within a <monogr> element. Individual articles in the journal or collected texts should be treated at the ‘analytic’ level. When an article has been printed in more than one journal or collection, the bibliographic reference may have more than one <monogr> element, each possibly followed by one or more <series> elements. A <series> element always relates to the most recently preceding <monogr> element. (Whether reprints of an article are treated in the same bibliographic reference or a separate one varies among different styles. Library lists typically use a different entry for each publication, while academic footnoting practice typically treats all publications of the same article in a single entry.)

For example, the article cited in this example has been published twice, once in a journal and once in a collection which appeared in a German language series:
<biblStruct>
 <analytic>
  <author>Thaller, Manfred</author>
  <title level="a">A Draft Proposal for a Standard for the
     Coding of Machine Readable Sources</title>
 </analytic>
 <monogr>
  <title level="j">Historical Social Research</title>
  <imprint>
   <biblScope type="vol">40</biblScope>
   <date>October 1986</date>
   <biblScope type="pages">3-46</biblScope>
  </imprint>
 </monogr>
 <monogr>
  <title level="m">Modelling Historical Data:
     Towards a Standard for Encoding and
     Exchanging Machine-Readable Texts</title>
  <editor>Daniel I. Greenstein</editor>
  <imprint>
   <pubPlace>St. Katharinen</pubPlace>
   <publisher>Max-Planck-Institut für Geschichte
       In Kommission bei
       Scripta Mercaturae Verlag</publisher>
   <date>1991</date>
  </imprint>
 </monogr>
 <series xml:lang="de">
  <title level="s">Halbgraue Reihe
     zur Historischen Fachinformatik</title>
  <respStmt>
   <resp>Herausgegeben von</resp>
   <name type="person">Manfred Thaller</name>
   <name type="org">Max-Planck-Institut für Geschichte</name>
  </respStmt>
  <title level="s">Serie A: Historische Quellenkunden</title>
  <biblScope>Band 11</biblScope>
 </series>
</biblStruct>
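The rule that a <series> element relates to the most recently preceding <monogr> element can be applied mechanically, as the following Python sketch shows. The function monographs_with_series is hypothetical, and the sample is a cut-down version of the Thaller entry above, without the TEI namespace.

```python
import xml.etree.ElementTree as ET


def monographs_with_series(bibl_struct):
    """Pair each <monogr> with the <series> elements that follow it."""
    groups = []
    for child in bibl_struct:
        if child.tag == "monogr":
            groups.append((child, []))
        elif child.tag == "series":
            if not groups:
                raise ValueError("<series> with no preceding <monogr>")
            groups[-1][1].append(child)  # attach to the most recent monogr
    return groups


sample = """<biblStruct>
  <analytic><title level="a">A Draft Proposal for a Standard for the
    Coding of Machine Readable Sources</title></analytic>
  <monogr><title level="j">Historical Social Research</title></monogr>
  <monogr><title level="m">Modelling Historical Data</title></monogr>
  <series><title level="s">Halbgraue Reihe zur Historischen Fachinformatik</title></series>
</biblStruct>"""

for monogr, series in monographs_with_series(ET.fromstring(sample)):
    print(monogr.findtext("title"), "->", [s.findtext("title") for s in series])
# Historical Social Research -> []
# Modelling Historical Data -> ['Halbgraue Reihe zur Historischen Fachinformatik']
```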

The practice of analytic vs. monographic citation, as described here, should be distinguished from the practice of including within one citation a reference to another work which the encoder considers to be related to it in some way: see further 3.11.2.5 Related items below.

Punctuation should not appear between the elements within a structured bibliographic entry, unless it is contained within the elements it delimits. As the example shows, it is possible to encode the entry without any inter-element punctuation: this facilitates use of the <biblStruct> element in systems which can render bibliographic references in any of several styles.
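The following Python sketch illustrates the point. Because the Tufte entry shown earlier carries no inter-element punctuation, the same extracted fields can be rendered in two different styles. Both styles, and the helpers fields, style_a, and style_b, are invented for illustration and do not correspond to any published style guide; the sample omits the TEI namespace.

```python
import xml.etree.ElementTree as ET


def fields(bibl_struct):
    """Extract the monographic fields used by the two styles below."""
    monogr = bibl_struct.find("monogr")
    return {
        "author": monogr.findtext("author"),
        "title": monogr.findtext("title"),
        "place": monogr.findtext("imprint/pubPlace"),
        "publisher": monogr.findtext("imprint/publisher"),
        "date": monogr.findtext("imprint/date"),
    }


def style_a(f):
    # Hypothetical style: Author. Title. Place: Publisher, Date.
    return f"{f['author']}. {f['title']}. {f['place']}: {f['publisher']}, {f['date']}."


def style_b(f):
    # Hypothetical style: Author (Date), Title, Publisher.
    return f"{f['author']} ({f['date']}), {f['title']}, {f['publisher']}."


sample = """<biblStruct>
 <monogr>
  <author>Edward R. Tufte</author>
  <title>Envisioning Information</title>
  <imprint>
   <pubPlace>Cheshire, Conn.</pubPlace>
   <publisher>Graphics Press</publisher>
   <date>1990</date>
  </imprint>
 </monogr>
</biblStruct>"""

f = fields(ET.fromstring(sample))
print(style_a(f))  # Edward R. Tufte. Envisioning Information. Cheshire, Conn.: Graphics Press, 1990.
print(style_b(f))  # Edward R. Tufte (1990), Envisioning Information, Graphics Press.
```

Had the punctuation of one style been encoded between the elements, it would have to be stripped before the other style could be applied.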

3.11.2.2 Authors, Titles, and Editors
Bibliographic references typically begin with a statement of the title being cited followed by the names of those intellectually responsible for it. For articles in journals or collections, such statements should appear both for the analytic and for the monographic level. The following elements are provided for tagging such statements:
  • title contains the full title of a work of any kind.
  • author in a bibliographic reference, contains the name of the author(s), personal or corporate, of a work; the primary statement of responsibility for any bibliographic item.
  • editor contains a secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization (or of several such) acting as editor, compiler, translator, etc.
  • respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
  • resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility.
  • name (name, proper noun) contains a proper noun or noun phrase.
  • meeting contains the formalized descriptive title for a meeting or conference, for use in a bibliographic description for an item derived from such a meeting, or as a heading or preamble to publications emanating from it.
The elements <author>, <editor>, and <respStmt> are the default members of the model.respLike class, a subclass of the model.biblPart class to which the constituents of the <bibl> element belong.
In bibliographic references, all titles should be tagged as such, whether analytic, monographic, or series titles. The single element <title> is used for all these cases. When it appears directly within an <analytic>, <monogr>, or <series> element, <title> is interpreted as belonging to the appropriate level. When it appears elsewhere, its level attribute should be used to signal its bibliographic level. It is a semantic error to give a value for the level attribute which is inconsistent with the context; such values may be ignored. The level value a implies the analytic level; the values m, j, and u imply the monographic level; the value s implies the series level. Note, however, that such an inconsistency is an error only if the title in question is directly enclosed by the <analytic>, <monogr>, or <series> element; a title enclosed only indirectly (i.e., nested more deeply, for example within another title) may carry a level value that differs from its context. For example, the analytic title may contain a monographic title:
<biblStruct>
 <analytic>
  <author>Lucy Allen Paton</author>
  <title>Notes on Manuscripts of the
  <title level="m" xml:lang="fr">Prophécies de Merlin</title>
  </title>
 </analytic>
 <monogr>
  <title level="j">PMLA</title>
  <imprint>
   <biblScope type="vol">8</biblScope>
   <date>1913</date>
   <biblScope type="pages">122</biblScope>
  </imprint>
 </monogr>
</biblStruct>
In this case, the analytic title ‘Notes on Manuscripts of the Prophécies de Merlin’ needs no level attribute because it is directly contained by the <analytic> level; the monographic title contained within it, ‘Prophécies de Merlin,’ does not create a semantic error because it is not directly contained by the <analytic> element.

In some bibliographic applications, it may prove useful to distinguish main titles from subordinate titles, parallel titles, etc. The type attribute is provided to allow this distinction to be recorded.

The following reference, from a national standard for bibliographic references, illustrates this type of analysis with its distinction between main and subordinate titles. Note that this uses the more flexible <bibl>, rather than the structured <biblStruct> element: consequently, there is no requirement to tag all the components of the reference (notably the authors).
<bibl>Saarikoski, Pirkko-Liisa, and Paavo Suomalainen,
 <title level="a" type="main">Studies on the physiology of
   the hibernating hedgehog, 15</title>
 <title level="a" type="subordinate">Effects of seasonal
   and temperature changes on the in vitro glycerol release from
   brown adipose tissue</title>
 <title level="j">Ann. Acad. Sci. Fenn., Ser. A4</title>
 <date>1972</date>
 <biblScope type="vol">187</biblScope>
 <biblScope type="pp">1-4</biblScope>
</bibl>
Slightly more complex is the distinction made below among main, subordinate, and parallel titles, in an example from the same source (p. 63). The punctuation and the bibliographic analysis are those given in ANSI Z39.29-1977; the punctuation is in the style prescribed by the International Standard Bibliographic Description (ISBD).16 Again, it is only because this example uses <bibl> rather than <biblStruct> that specific punctuation may be included between the component elements of the reference.
<bibl>Tchaikovsky, Peter Ilich.
<title level="m" type="main">The swan lake ballet</title>
= <title level="m" type="parallel" xml:lang="fr">Le lac des cygnes</title>
: <title level="m" type="subordinate" xml:lang="fr">grand ballet en 4 actes</title>
: <title level="m" type="subordinate">op. 20</title>
[Score].
New York: Broude Brothers; [1951] (B.B. 59). vi, 685 p.</bibl>

The elements <author> and <editor> have, for printed books and articles, a fairly obvious significance; for other kinds of bibliographic items their proper usage may be less obvious. The <author> element should be used for the person or agency with primary responsibility for a work's intellectual content, and the <editor> element for an editor of the work. Thus a radio or television station, for example, is usually regarded as the ‘author’ of a broadcast, while the author of a Government report will usually be the agency which produced it.

For anyone else with responsibility for the work, the <respStmt> element should be used. The nature of the responsibility is indicated by means of a <resp> element, and the person, organization, etc. responsible by a <name>, <persName>, or <orgName> element. Strings such as ‘unknown’ may be encoded using the <rs> element. At least one of the four naming elements (<name>, <persName>, <orgName>, or <rs>) and one <resp> element should be given within the <respStmt> element, followed optionally by any number of any of them.

Examples of secondary responsibility of this kind include the roles of illustrator, translator, encoder, and annotator. The <respStmt> element may also be used for editors, if it is desired to record the specific terms in which their role is described.
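As a sketch of the content model just described, each of the following statements of responsibility pairs a <resp> with a naming element. The first reuses the translator from the Lominadze example below; the second, recording an illustrator whose name is not known, is hypothetical and uses <rs> for the string ‘unknown’, as the preceding paragraphs permit:
<respStmt>
 <resp>translated by</resp>
 <name>A. N. Dellis</name>
</respStmt>
<!-- hypothetical: an illustrator whose name is not recorded -->
<respStmt>
 <resp>illustrated by</resp>
 <rs>unknown</rs>
</respStmt>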

Examples of <author> and <editor> may be found in sections 3.11.1 Elements of Bibliographic References, and 3.11.2.1 Analytic, Monographic, and Series Levels; wherever <author> and <editor> may occur, the <respStmt> element may also occur. When one of these elements precedes or immediately follows a title, it applies to that title; when it follows an <edition> element or occurs within an edition statement, it applies to the edition in question.

In this example, the <respStmt> elements apply to the work as a whole, not merely to the first edition:
<bibl>
 <author>Lominadze, D. G.</author>
 <title level="m">Cyclotron waves in plasma.</title>
 <respStmt>
  <resp>translated by</resp>
  <name>A. N. Dellis;</name>
 </respStmt>
 <respStmt>
  <resp>edited by</resp>
  <name>S. M. Hamberger.</name>
 </respStmt>
 <edition>1st ed.</edition>
 <pubPlace>Oxford:</pubPlace>
 <publisher>Pergamon Press,</publisher>
 <date>1981.</date>
 <extent>206 p.</extent>
 <title level="s">International series in natural philosophy.</title>
 <note place="inline">Translation of:
 <title xml:lang="ru" level="m">Ciklotronnye volny v plazme.</title>
 </note>
</bibl>
In this example, by contrast, the <respStmt> element applies to the edition, and not to the collection per se (Moser and Tervooren were not responsible for the first thirty-five printings); the elements of the reference have been reordered from their appearance on the title page of the volume in order to ensure the correct relationship of the collection title, the edition statement, and the statement of responsibility.
<biblStruct>
 <monogr xml:lang="de">
  <title>Des Minnesangs Frühling</title>
  <note place="inline">Mit 1 Faksimile</note>
  <edition>36., neugestaltete und erweiterte Auflage</edition>
  <respStmt>
   <resp>Unter Benutzung der Ausgaben von <name>Karl
         Lachmann</name> und <name>Moriz Haupt</name>, <name>Friedrich
         Vogt</name> und <name>Carl von Kraus</name> bearbeitet von</resp>
   <name>Hugo Moser</name>
   <name>Helmut Tervooren</name>
  </respStmt>
  <imprint>
   <biblScope type="volume">I Texte</biblScope>
   <pubPlace>Stuttgart</pubPlace>
   <publisher>S. Hirzel Verlag</publisher>
   <date>1977</date>
  </imprint>
 </monogr>
</biblStruct>
Another form of ‘responsibility’ arises when a work is published as the outcome of a conference, workshop or similar meeting. The <meeting> element may be used to supply this information, as in the following example:
<biblStruct>
 <monogr>
  <title>Proceedings of a workshop on corpus resources</title>
  <respStmt>
   <resp>Programme Organizer</resp>
   <name>Geoffrey Leech</name>
  </respStmt>
  <meeting>DTI Speech and Language Technology Club meeting, 3-4
     January 1990, Wadham College, Oxford</meeting>
 </monogr>
</biblStruct>
3.11.2.3 Imprint, Pagination, and Other Details
By imprint is meant all the information relating to the publication of a work: the person or organization by whose authority and in whose name a bibliographic entity such as a book is made public or distributed (whether a commercial publisher or some other organization), the place of publication, and a date. It may also include a full address for the publisher or organization. Full bibliographic references usually specify either the number of pages in a print publication (or equivalent information for non-print materials), or the specific location of the material being cited within its containing publication. The following elements are provided to hold this information:
  • imprint groups information relating to the publication or distribution of a bibliographic item.
  • address contains a postal address, for example of a publisher, an organization, or an individual.
  • pubPlace (publication place) contains the name of the place where a bibliographic item was published.
  • publisher provides the name of the organization responsible for the publication or distribution of a bibliographic item.
  • date contains a date in any format.
  • idno (identifying number) supplies any standard or non-standard number used to identify a bibliographic item.
  • extent describes the approximate size of a text as stored on some carrier medium, whether digital or non-digital, specified in any convenient units.
  • biblScope (scope of citation) defines the scope of a bibliographic reference, for example as a list of page numbers, or a named subdivision of a larger work.
The elements <biblScope>, <pubPlace> and <publisher> constitute the special class model.imprintPart; members of this class may appear with a date inside an <imprint> element in a specific location within a <biblStruct>, or alternatively, they may appear alongside any other bibliographic component inside a <bibl>.

For bibliographic purposes, usually only the place (or places) of publication is required, possibly including the name of the country, rather than a full address; the element <pubPlace> is provided for this purpose. Where, however, the full postal address is likely to be of importance in identifying or locating the bibliographic item concerned, it may be supplied and tagged using the <address> element described in section 3.5.2 Addresses. Alternatively, if desired, the <rs> or <name> elements described in section 3.5.1 Referring Strings may be used; this involves no claim that the information given is either a full address or the name of a city.
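A minimal sketch of the fuller form follows. The publisher, address, and date are wholly hypothetical, and nesting the <address> inside <pubPlace> is an assumption about placement (one that relies on <pubPlace> accepting phrase-level content), not a prescribed encoding:
<imprint>
 <!-- hypothetical publisher and address, for illustration only -->
 <pubPlace>
  <address>
   <addrLine>1 Example Street</addrLine>
   <addrLine>Oxford</addrLine>
   <addrLine>United Kingdom</addrLine>
  </address>
 </pubPlace>
 <publisher>Example Press</publisher>
 <date>1999</date>
</imprint>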

The name of the publisher of an item should be marked using the <publisher> element even if the item is made public (‘published’) by an organization other than a conventional publisher, as is frequently the case with technical reports:
<biblStruct>
 <monogr>
  <author>Nicholas, Charles K.</author>
  <author>Welsch, Lawrence A.</author>
  <title>On the interchangeability of SGML and ODA</title>
  <imprint>
   <pubPlace>Gaithersburg, MD</pubPlace>
   <publisher> National Institute of Standards and Technology
   </publisher>
   <date when="1992-01">January 1992</date>
  </imprint>
  <extent>19 pp.</extent>
 </monogr>
 <idno type="NIST">NISTIR 4681</idno>
</biblStruct>
and with dissertations:
<biblStruct>
 <monogr>
  <author>Hansen, W.</author>
  <title level="u">Creation of hierarchic text
     with a computer display</title>
  <note place="inline">Ph.D. dissertation</note>
  <imprint>
   <publisher>Dept. of Computer Science, Stanford Univ.</publisher>
   <pubPlace>Stanford, CA</pubPlace>
   <date when="1971-06">June 1971</date>
  </imprint>
 </monogr>
</biblStruct>
When an item has been reprinted, especially reprinted without change from a specific earlier edition, the reprint may appear in a <monogr> element with only the <imprint> and other details of the reprint. In the following example, a microform reprint has been issued without any change in the title or authorship. The series statement here applies only to the second <monogr> element.
<biblStruct>
 <monogr>
  <author>Shirley, James</author>
  <title type="main">The gentlemen of Venice</title>
  <title type="subordinate">a tragi-comedie presented at the private
     house in Salisbury Court by Her Majesties servants</title>
  <note place="inline">[Microform]</note>
  <imprint>
   <pubPlace>London</pubPlace>
   <publisher>H. Moseley</publisher>
   <date>1655</date>
  </imprint>
  <extent>78 p.</extent>
 </monogr>
 <monogr>
  <imprint>
   <pubPlace>New York</pubPlace>
   <publisher>Readex Microprint</publisher>
   <date>1953</date>
  </imprint>
  <extent>1 microprint card, 23 x 15 cm.</extent>
 </monogr>
 <series>
  <title>Three centuries of drama: English, 1642–1700</title>
 </series>
</biblStruct>

An alternative way of handling the above situation would be to use the <relatedItem> element described in section 3.11.2.5 Related items below.

A bibliographic description, particularly for an analytic title, will often include some additional information specifying its location, for example as a volume number, page number, range of page numbers, or name or number of a subdivision of the host work. The element <biblScope> may be used to identify such information if it is present. Where it is desired to distinguish different classes of such information (volume number, page number, chapter number, etc.), the type attribute may be used with any convenient typology.

When the item being cited is a journal article, the <imprint> element describing the issue in which it appeared may contain <biblScope> elements for volume and page numbers, together with a <date> element.

The same approach applies to a contribution to a multi-volume work, as in the following example:
<biblStruct>
 <analytic>
  <author>Wrigley, E. A.</author>
  <title>Parish registers and the historian</title>
 </analytic>
 <monogr>
  <editor>Steel, D. J.</editor>
  <title>National index of parish registers</title>
  <imprint>
   <pubPlace>London</pubPlace>
   <publisher>Society of Genealogists</publisher>
   <date when="1968">1968</date>
   <biblScope type="vol">vol. 1</biblScope>
   <biblScope type="pp">pp. 155–167.</biblScope>
  </imprint>
 </monogr>
</biblStruct>
The type attribute on <biblScope> is optional: both of the following are legal examples:
<biblStruct>
 <analytic>
  <author>Boguraev, Branimir</author>
  <author>Neff, Mary</author>
  <title>Text Representation, Dictionary Structure,
     and Lexical Knowledge</title>
 </analytic>
 <monogr>
  <title level="j">Literary &amp; Linguistic Computing</title>
  <imprint>
   <biblScope type="vol">7</biblScope>
   <biblScope type="issue">2</biblScope>
   <date>1992</date>
   <biblScope type="pp">110-112</biblScope>
  </imprint>
 </monogr>
</biblStruct>
<biblStruct>
 <analytic>
  <author>Chesnutt, David</author>
  <title>Historical Editions in the States</title>
 </analytic>
 <monogr>
  <title level="j">Computers and the Humanities</title>
  <imprint>
   <biblScope>25.6</biblScope>
   <date when="1991-12">(December, 1991):</date>
   <biblScope>377–380</biblScope>
  </imprint>
 </monogr>
</biblStruct>
3.11.2.4 Series Information

Series information may (in <bibl> elements) or must (in <biblStruct> elements) be enclosed in a <series> element or (in a <biblFull> element) a <seriesStmt> element. The title of the series may be tagged <title level="s">, the volume number <biblScope type="vol">, and responsibility statements for the series (e.g. the name and affiliation of the editor, as in the example in section 3.11.2.1 Analytic, Monographic, and Series Levels) may be tagged <editor> or <respStmt>.
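For instance, the microform reprint of Shirley's play discussed in section 3.11.2.3 might be cited in a <bibl> with its series title wrapped in a <series> element, as sketched below. Inside a <bibl> the wrapper is optional, so a bare <title level="s"> would also be acceptable there:
<bibl>
 <author>Shirley, James</author>
 <title level="m">The gentlemen of Venice</title>
 <pubPlace>New York</pubPlace>
 <publisher>Readex Microprint</publisher>
 <date>1953</date>
 <series>
  <title level="s">Three centuries of drama: English, 1642–1700</title>
 </series>
</bibl>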

3.11.2.5 Related items

In bibliographic parlance, a related item is any bibliographic item which, though related to that being defined, is distinct from it. The distinction between analytic and monographic items made above may be thought of as a special case of this kind of ‘related’ item. More usually, however, the term is applied to such items as translations, continuations, original sources, parts, etc.

The element <relatedItem> is provided as a means of documenting such associated items:
  • relatedItem contains or references some other bibliographic item which is related to the present one in some specified manner, for example as a constituent or alternative version of it.
In the following example, the first <biblStruct> describes a facsimile edition, and the second describes the work of which it is a facsimile. The relation between the facsimile and its source is represented by means of a <relatedItem> within the first description, which points to the description of the source.
<biblStruct xml:id="bibl03">
 <monogr>
  <author>Swinburne, Algernon Charles</author>
  <title>Swinburne's <title>Atalanta in Calydon</title>: A Facsimile of the
     First Edition</title>
  <editor>Georges Lafourcade</editor>
  <imprint>
   <pubPlace>London</pubPlace>
   <publisher>Oxford UP</publisher>
   <date>1930</date>
  </imprint>
 </monogr>
 <relatedItem type="original">
  <ref target="#bibl04"/>
 </relatedItem>
</biblStruct>
<biblStruct xml:id="bibl04">
 <monogr>
  <author> Swinburne, Algernon Charles</author>
  <title>Atalanta in Calydon</title>
  <imprint>
   <pubPlace>London</pubPlace>
   <publisher>Edward Moxon</publisher>
   <date>1865</date>
  </imprint>
 </monogr>
</biblStruct>
The <ref> element in the above example could be replaced by the referenced <biblStruct> itself since a <relatedItem> may contain any form of bibliographic reference. For example, one of the examples quoted above might also be encoded as follows:
<biblStruct>
 <monogr>
  <author>Shirley, James</author>
  <title type="main">The gentlemen of Venice</title>
  <imprint>
   <pubPlace>New York</pubPlace>
   <publisher>Readex Microprint</publisher>
   <date>1953</date>
  </imprint>
  <extent>1 microprint card, 23 x 15 cm.</extent>
 </monogr>
 <series>
  <title>Three centuries of drama: English, 1642–1700</title>
 </series>
 <relatedItem type="original">
  <biblStruct>
   <monogr>
    <author>Shirley, James</author>
    <title type="main">The gentlemen of Venice</title>
    <title type="subordinate">a tragi-comedie presented at the private
         house in Salisbury Court by Her Majesties servants</title>
    <imprint>
     <pubPlace>London</pubPlace>
     <publisher>H. Moseley</publisher>
     <date>1655</date>
    </imprint>
    <extent>78 p.</extent>
   </monogr>
  </biblStruct>
 </relatedItem>
</biblStruct>
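The same mechanism could record the translation noted in the Lominadze example in section 3.11.2.2, in place of the inline <note>. In the sketch below, the value translatedFrom for the type attribute is an assumption chosen for illustration, not a prescribed value:
<bibl>
 <author>Lominadze, D. G.</author>
 <title level="m">Cyclotron waves in plasma</title>
 <date>1981</date>
 <relatedItem type="translatedFrom">
  <bibl>
   <title xml:lang="ru" level="m">Ciklotronnye volny v plazme</title>
  </bibl>
 </relatedItem>
</bibl>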
3.11.2.6 Notes and Other Additional Information
Explanatory notes about the publication of unusual items, the form of an item (e.g. [Score] or [Microform]), or its provenance (e.g. translation of ...) may be tagged using the <note> element. The same element may be used for any descriptive annotation of a bibliographic entry in a database.
  • note contains a note or annotation.
For example:
<bibl>
 <author>Coombs, James H., Allen H. Renear,
   and Steven J. DeRose.</author>
 <title level="a">Markup Systems and the Future of Scholarly
   Text Processing.</title>
 <title level="j">Communications of the ACM</title>
 <biblScope>30.11 (November 1987): 933–947.</biblScope>
 <note>Classic polemic supporting descriptive over procedural
   markup in scholarly work.</note>
</bibl>
3.11.2.7 Order of Components within References

The order of elements in <bibl> elements is not constrained.

In <biblStruct> elements, the <analytic> element, if it occurs, must come first, followed by one or more <monogr> and <series> elements, which may appear intermingled (as long as a <monogr> element comes first). Within <analytic>, the title(s), author(s), editor(s), and other statements of responsibility may appear in any order; it is recommended that all forms of the title be given together. Within <monogr>, the author, editor, and statements of responsibility may either come first or else follow the monographic title(s). Following these, the elements must appear in the following order:
  • <note>s on the publication (and <meeting> elements describing the conference, in the case of a proceedings volume)
  • <edition> elements, each followed by any related <editor> or <respStmt> elements
  • <imprint>
  • <biblScope>
Within <imprint>, the elements allowed may appear in any order.

Finally, within the <series> information in a <biblStruct>, the sequence of elements is not constrained.
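The following sketch re-encodes the Lominadze reference from section 3.11.2.2 as a <biblStruct>, with its components placed in the order just described. The comments mark each stage, and the inter-element punctuation of the <bibl> version has been dropped:
<biblStruct>
 <monogr>
  <!-- statements of responsibility before or after the title(s) -->
  <author>Lominadze, D. G.</author>
  <title>Cyclotron waves in plasma</title>
  <respStmt>
   <resp>translated by</resp>
   <name>A. N. Dellis</name>
  </respStmt>
  <respStmt>
   <resp>edited by</resp>
   <name>S. M. Hamberger</name>
  </respStmt>
  <!-- then notes on the publication -->
  <note place="inline">Translation of:
  <title xml:lang="ru" level="m">Ciklotronnye volny v plazme</title>
  </note>
  <!-- then edition statements -->
  <edition>1st ed.</edition>
  <!-- then the imprint, whose contents may appear in any order -->
  <imprint>
   <pubPlace>Oxford</pubPlace>
   <publisher>Pergamon Press</publisher>
   <date>1981</date>
  </imprint>
 </monogr>
 <!-- series information follows the monographic level -->
 <series>
  <title>International series in natural philosophy</title>
 </series>
</biblStruct>
Because the two <respStmt> elements follow the title rather than the <edition>, they are read as applying to the work as a whole, matching the interpretation given for the original example.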

If more detailed structuring of a bibliographic description is required, the <biblFull> element should be used. This is not further described here, as its contents are essentially equivalent to those of the <fileDesc> element in the <teiHeader>, which is fully described in section 2.2 The File Description.
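For orientation only, a minimal <biblFull> for the 1655 edition of Shirley's play cited earlier might look as follows. The <titleStmt> and <publicationStmt> are structured as in the file description, and the other, optional statements are omitted:
<biblFull>
 <titleStmt>
  <title>The gentlemen of Venice</title>
  <author>Shirley, James</author>
 </titleStmt>
 <publicationStmt>
  <publisher>H. Moseley</publisher>
  <pubPlace>London</pubPlace>
  <date>1655</date>
 </publicationStmt>
</biblFull>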

3.11.3 Bibliographic Pointers

References which are pointers to bibliographic items, of whatever kind, should be treated in the same way as other cross-references (see section 3.6 Simple Links and Cross-References). As discussed in that section, cross-referencing within TEI texts is in general represented by means of <ptr> or <ref> elements. A target attribute on these elements is used to supply an identifying value for the target of the cross-reference, which should be, in the case of bibliographic elements, a bibliographic reference of some kind. Where the form of the reference itself is unimportant, or may be reconstructed mechanically, or is not to be encoded, the <ptr> element is used, as in the following example:
As shown above (<ptr target="#NEL80"/>) ...
Where the form of the reference is important, or contains additional qualifying information which is to be kept but distinguished from the surrounding text, the <ref> element should be used, as in the following example:
Nelson claims <ref target="#NEL80">(ibid, passim)</ref> ...
It may be important to distinguish between the short form of a bibliographic reference and some qualifying or additional information. The latter should not appear within the scope of the <ref> element when this is the case, as for example in an application concerned to normalize bibliographic references:
Nelson claims (<ref target="#NEL80">Nelson [1980]</ref> pages 13–37) ...
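Each of these pointers resolves, by means of the fragment identifier #NEL80, to an element elsewhere in the document whose xml:id is NEL80. A minimal sketch of such a target follows; only the author and date implied by the examples are given, and the remaining details are left out rather than invented:
<listBibl>
 <bibl xml:id="NEL80">
  <author>Nelson</author>
  <date>1980</date>
  <!-- remaining details of the item omitted in this sketch -->
 </bibl>
</listBibl>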

3.11.4 Relationship to Other Bibliographic Schemes

The bibliographic tagging defined here can capture the distinctions required by most bibliographic encoding systems; for the benefit of users of some commonly used systems, the following lists of equivalences are offered, showing the relationship of the markup defined here to the fields defined for bibliographic records in the Scribe, BibTeX, and ProCite systems.

Listed below are the equivalences between the various bibliographic fields defined for use in the Scribe and BibTeX systems of bibliographic databases and the elements defined in this module.17 Elements and structures available in the module defined here which have no analogues in Scribe and BibTeX are not noted.
address
tag as <placeName> or <address>
annote
tag as <note>
author
tag as <author>
booktitle
tag as <title level="m"> or <title> within <monogr>
chapter
tag as <biblScope type="chapter">
date
used only to record the date on which the entry was made in the bibliographic database; not supported
edition
tag as <edition>
editor
tag as <editor> or <respStmt>
editors
tag as multiple <editor> or <respStmt> elements
fullauthor
use the <reg> element, possibly inside a <choice> element, inside either an <author> or <name>
fullorganization
use the <reg> element, possibly inside a <choice> element, inside a <name type="org">
howpublished
tag as <note>, possibly using the form <note place="inline">
institution
used only for the issuer of technical reports; tag as <publisher>
journal
tag as <title level="j"> or <title> within <monogr>
key
used to specify an alternate sort key for the bibliographic item, for use instead of the author's or editor's name; not supported
meeting
tag as <meeting> or as <note>
month
use <date>; if the date is not in a trivially parseable form, use the when attribute to provide a normalized equivalent in one of the formats defined by XML Schema Part 2: Datatypes Second Edition
note
tag as <note>
number
tag as <biblScope type="issue"> or <biblScope type="number">; for technical report numbers, use <idno type="docno">
organization
used only for the sponsor of a conference; use <name type="org"> within <respStmt> within a <meeting> element
pages
tag as <biblScope type="pp">
publisher
tag as <publisher>
school
used only for institutions at which thesis work is done; tag as <publisher>
series
tag as <title level="s"> or <title> within <series>
title
tag as <title> in appropriate context or with appropriate level value
volume
tag as <biblScope type="vol">
year
tag as <date>; if the date is not in a trivially parseable form, use the when attribute to provide an ISO-format equivalent
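As a worked illustration of these equivalences, the following sketch gives a BibTeX record for the Coombs, Renear, and DeRose article cited in section 3.11.2.6, followed by one possible <biblStruct> encoding. The BibTeX record is held in an XML comment; its citation key is made up, and its page range uses a single hyphen because a double hyphen may not appear inside an XML comment:
<!--
@article{coombs1987,
  author  = {Coombs, James H. and Renear, Allen H. and DeRose, Steven J.},
  title   = {Markup Systems and the Future of Scholarly Text Processing},
  journal = {Communications of the ACM},
  volume  = {30},
  number  = {11},
  pages   = {933-947},
  month   = nov,
  year    = {1987}
}
-->
<biblStruct>
 <analytic>
  <author>Coombs, James H.</author>
  <author>Renear, Allen H.</author>
  <author>DeRose, Steven J.</author>
  <title>Markup Systems and the Future of Scholarly Text Processing</title>
 </analytic>
 <monogr>
  <title level="j">Communications of the ACM</title>
  <imprint>
   <biblScope type="vol">30</biblScope>
   <biblScope type="issue">11</biblScope>
   <date when="1987-11">November 1987</date>
   <biblScope type="pp">933–947</biblScope>
  </imprint>
 </monogr>
</biblStruct>
Here the BibTeX month and year fields are merged into a single <date>, with the when attribute carrying the normalized form, and the journal title is placed directly within <monogr>, where its level follows from the context.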

3.12 Passages of Verse or Drama

The following elements are included in the core module for the convenience of those encoding texts which include mixtures of prose, verse and drama.
  • l (verse line) contains a single, possibly incomplete, line of verse.
  • lg (line group) contains a group of verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.
  • sp (speech) An individual speech in a performance text, or a passage presented as such in a prose or verse text.
  • speaker A specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment.
  • stage (stage direction) contains any kind of stage direction within a dramatic text or fragment.

Full details of other, more specialized, elements for the encoding of texts which are predominantly verse or drama are described in the appropriate chapter of part three (for verse, see the verse base described in chapter 6 Verse; for performance texts, see the drama base described in chapter 7 Performance Texts). In this section, we describe only the elements listed above, all of which can appear in any text, whichever of the three modes prose, verse, or drama may predominate in it.

3.12.1 Core Tags for Verse

Like other written texts, verse texts or poems may be hierarchically subdivided, for example into books or cantos. These structural subdivisions should be encoded using the general purpose <div> or <div1> (etc.) elements described below in chapters 4 Default Text Structure and 6 Verse. The fundamental unit of a verse text is the verse line rather than the paragraph, however.
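For example, the lines of Paradise Lost quoted below open the poem's first book. A sketch of how that book might be encoded as a division whose contents are verse lines follows; the type and n values are illustrative, and the typographic line breaks discussed below are omitted:
<div type="book" n="1">
 <l>Of Mans First Disobedience, and the Fruit</l>
 <l>Of that Forbidden Tree, whose mortal tast</l>
 <l>Brought Death into the World, and all our woe,</l>
 <l>With loss of Eden, till one greater Man</l>
 <l>Restore us, and regain the blissful Seat</l>
<!-- ... -->
</div>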

The <l> element is used to mark up verse lines, that is metrical rather than typographic lines. In some modern or free verse, it may be hard to decide whether the typographic line is to be regarded as a verse line or not, but the distinction is quite clear for verse following regular metrical patterns. Where a metrical line is interrupted by a typographic line break, the encoder may choose to ignore the fact entirely or to use the empty <lb> (line break) element discussed in 3.10 Reference Systems. By convention, the start of a metrical line implies the start of a typographic line; hence there is no need to introduce an <lb> tag at the start of every <l> element, but only at places where a new typographic line starts within a metrical line, as in the following example:
<l>Of Mans First Disobedience, and<lb/> the Fruit</l>
<l>Of that Forbidden Tree, whose<lb/> mortal tast</l>
<l>Brought Death into the World,<lb/> and all our woe,</l>
<l>With loss of Eden, till one greater Man</l>
<l>Restore us, and regain the blissful Seat...</l>
In the original copy text, the presence of an ornamental capital at the start of the poem means that the measure is not wide enough for the first three metrical lines to be printed on one typographic line each; instead, each of these lines occupies two typographic lines, with a break at the point indicated. Note that this encoding makes no attempt to preserve information about the whitespace or indentation associated with either kind of line; if this information is regarded as essential, it should be recorded using the rend or rendition attributes discussed in 1.3.1.1 Global Attributes.

The <l> element should not be used to represent typographic lines in non-verse materials: if the line-breaking points in a prose text are considered important for analysis, they should be marked with the <lb> element. Alternatively, a neutral segmentation element such as <seg> or <ab> may be used; see further discussion of these elements in chapter 16 Linking, Segmentation, and Alignment. The <l> element is a member of the model.lLike class, which is a subclass of the model.divPart class, along with elements from the model.pLike (paragraph-like) class.
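A minimal sketch, using invented prose, of recording the line breaks of a prose source with <lb> inside an ordinary paragraph rather than with <l>:
<p>
 <lb/>This sentence is invented for
 <lb/>illustration; each empty lb element
 <lb/>marks the point at which a new
 <lb/>typographic line begins in the source.</p>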

In some verse forms, regular groupings of lines are regarded as units of some kind, often identified by a regular verse scheme. In stichic verse and couplets, groups of lines analogous to paragraphs are often indicated by indentation. In other verse forms, lines are grouped into irregular sequences indicated simply by whitespace. The <lg> or line group element may be used to mark any such grouping of elements from the model.lLike class. As a member of the att.typed class, the <lg> element bears the following attributes:
  • att.typed provides attributes which can be used to classify or subclassify elements in any way.
    type characterizes the element in some sense, using any convenient classification scheme or typology.
    subtype provides a sub-categorization of the element, if needed.
which may be used to further categorize the line group where this is felt desirable, as in the following example. This example also demonstrates the use of the rend attribute to indicate whether or not a line is indented.
<lg>
 <l>Come fill up the Glass,</l>
 <l rend="indent">Round, round let it pass,</l>
 <l>'Till our Reason be lost in our Wine:</l>
 <l rend="indent">Leave Conscience's Rules</l>
 <l rend="indent">To Women and Fools,</l>
 <l>This only can make us divine.</l>
</lg>
<lg n="Chorus" type="refrain">
 <l>Then a Mohock, a Mohock I'll be,</l>
 <l>No Laws shall restrain</l>
 <l>Our Libertine Reign,</l>
 <l>We'll riot, drink on, and be free.</l>
</lg>
For some kinds of analysis, it may be useful to identify different kinds of line group within the same piece of verse. Such line groups may self-nest, in much the same way as the un-numbered <div> element described in chapter 4 Default Text Structure. For example:
<lg type="sonnet">
 <lg type="octet">
  <l>Thus speaks the Muse, and bends her brow severe:—</l>
  <l>“Did I, <name>Lætitia</name>, lend my choicest lays,</l>
  <l>And crown thy youthful head with freshest bays,</l>
  <l>That all the' expectance of thy full-grown year</l>
  <l>Should lie inert and fruitless? O revere</l>
  <l>Those sacred gifts whose meed is deathless praise,</l>
  <l>Whose potent charms the' enraptured soul can raise</l>
  <l>Far from the vapours of this earthly sphere!</l>
 </lg>
 <lg type="sestet">
  <l>Seize, seize the lyre! resume the lofty strain!</l>
  <l>'T is time, 't is time! hark how the nations round</l>
  <l>With jocund notes of liberty resound,—</l>
  <l>And thy own <name>Corsica</name> has burst her chain!</l>
  <l>O let the song to <name>Britain's</name> shores rebound,</l>
  <l rend="indent(-1)">Where Freedom's once-loved voice is heard,
     alas! in vain.”</l>
 </lg>
</lg>
It is often the case that verse line boundaries conflict with the boundaries of other structural elements. In the following example, the single verse line ‘A Workeman in't... welcome’ is interrupted by a stage direction:
<l>Thou fumblest <name>Eros</name>, and my Queenes a Squire</l>
<l>More tight at this, then thou: Dispatch. O Loue,</l>
<l>That thou couldst see my Warres to day, and knew'st</l>
<l>The Royall Occupation, thou should'st see</l>
<l part="I">A Workeman in't. <stage>Enter an Armed Soldier.</stage>
</l>
<l part="F">Good morrow to thee, welcome. </l>
In this encoding, the part attribute is used, as with <div>, to indicate that the last two <l> elements should be regarded as the initial and final parts of a single line, rather than as two lines.
The same technique may be used where verse lines are collected together into units such as verse paragraphs:
<lg n="6" type="para">
<!-- ... -->
 <l>Unprofitably travelling toward the grave,</l>
 <l>Like a false steward who hath much received</l>
 <l part="I">And renders nothing back.</l>
</lg>
<lg type="para" n="7">
 <l part="F">Was it for this</l>
 <l>That one, the fairest of all rivers, loved</l>
 <l>To blend his murmurs with my nurse's song,</l>
<!-- ... -->
</lg>
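One way to see what the part values mean is to reassemble the metrical lines they describe. The Python sketch below is a hypothetical helper (`metrical_lines` is not defined by these Guidelines). It assumes un-namespaced fragments and well-formed sequences of I, M, and F fragments. Because it walks <l> elements in document order, it joins fragments whether they are separated by a <stage> inside the line, as in the Antony example, or by an <lg> boundary, as in the verse-paragraph example. Stage directions are left out of the reassembled line text.

```python
import xml.etree.ElementTree as ET


def text_without(elem, skip=("stage",)):
    """Return elem's text, omitting skipped child elements but keeping their tails."""
    pieces = [elem.text or ""]
    for child in elem:
        if child.tag not in skip:
            pieces.append(text_without(child, skip))
        pieces.append(child.tail or "")  # text after the child belongs to elem
    return "".join(pieces)


def metrical_lines(root):
    """Yield one string per metrical line, joining part="I"/"M"/"F" fragments."""
    pending = []
    for line_el in root.iter("l"):  # document order, across <lg> and <sp> boundaries
        text = " ".join(text_without(line_el).split())  # collapse whitespace
        part = line_el.get("part", "N")
        if part in ("I", "M"):
            pending.append(text)  # start or continue an open line
        elif part == "F":
            pending.append(text)
            yield " ".join(pending)
            pending = []
        else:
            # "N", "Y" (incomplete in an unspecified way) or no attribute:
            # no partner fragment is identified, so emit the line on its own.
            yield text
    if pending:  # an initial fragment whose completion lies outside this fragment
        yield " ".join(pending)


SAMPLE = """<lg>
<l>The Royall Occupation, thou should'st see</l>
<l part="I">A Workeman in't. <stage>Enter an Armed Soldier.</stage>
</l>
<l part="F">Good morrow to thee, welcome. </l>
</lg>"""

if __name__ == "__main__":
    for line in metrical_lines(ET.fromstring(SAMPLE)):
        print(line)
```

On the sample, the two fragments come back as the single line "A Workeman in't. Good morrow to thee, welcome."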
The part attribute may also be attached to an <lg> element to indicate that it is incomplete, for example because it forms part of a group that is divided between two speakers, as in the following example:
<sp>
 <speaker>First Voice</speaker>
 <lg type="stanza" part="I">
  <l>But why drives on that ship so fast</l>
  <l>Withouten wave or wind?</l>
 </lg>
</sp>
<sp>
 <speaker>Second Voice</speaker>
 <lg type="stanza" part="F">
  <l>The air is cut away before,</l>
  <l>And closes from behind.</l>
 </lg>
</sp>
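The same reassembly idea applies at the level of <lg>. In the Python sketch below, `join_split_groups` is a hypothetical helper. It assumes un-namespaced fragments and line groups that are not themselves nested, as in the stanza example above. It merges an initial group with any medial groups and the final group that follow it, even when they sit in different <sp> elements.

```python
import xml.etree.ElementTree as ET


def line_texts(lg):
    """Whitespace-normalised text of each <l> directly inside lg."""
    return [" ".join("".join(el.itertext()).split()) for el in lg.findall("l")]


def join_split_groups(root):
    """Reassemble <lg part="I"> ... <lg part="F"> fragments into whole groups."""
    groups, pending = [], None
    for lg in root.iter("lg"):  # document order, across <sp> boundaries
        part = lg.get("part", "N")
        if part == "I":
            pending = line_texts(lg)
        elif part in ("M", "F") and pending is not None:
            pending.extend(line_texts(lg))
            if part == "F":
                groups.append(pending)
                pending = None
        else:
            groups.append(line_texts(lg))  # a complete group, or an unmatched fragment
    return groups


STANZA = """<div>
<sp><speaker>First Voice</speaker>
<lg type="stanza" part="I"><l>But why drives on that ship so fast</l><l>Withouten wave or wind?</l></lg></sp>
<sp><speaker>Second Voice</speaker>
<lg type="stanza" part="F"><l>The air is cut away before,</l><l>And closes from behind.</l></lg></sp>
</div>"""

if __name__ == "__main__":
    for group in join_split_groups(ET.fromstring(STANZA)):
        print("\n".join(group))
```

The two half-stanzas come back as one four-line stanza. Speaker information is deliberately dropped here; a fuller tool might keep it alongside each line.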

For alternative methods of aligning groups of lines which do not form simple hierarchic groups, or which are discontinuous, see the more detailed discussion in chapter 16 Linking, Segmentation, and Alignment. For discussion of other elements and attributes specific to the encoding of verse, see chapter 6 Verse.

3.12.2 Core Tags for Drama

Like other written texts, dramatic and other performance texts such as cinema or TV scripts are often hierarchically organized, for example into acts and scenes. These structural subdivisions should be encoded using the general purpose <div> or <div1> (etc.) elements described below in chapters 4 Default Text Structure and 7 Performance Texts. Within these divisions, the body of a performance text typically consists of speeches, often prefixed by a phrase indicating who is speaking, and occasionally interspersed with stage directions of various kinds.

In the following simple example, each speech consists of a single paragraph:
<div2 n="I.2" type="scene">
 <head>Scene 2.</head>
 <stage type="setting">Peachum, Filch.</stage>
 <sp>
  <speaker>FILCH.</speaker>
  <p>Sir, Black Moll hath sent word her Trial comes on in
     the Afternoon, and she hopes you will order Matters
     so as to bring her off.</p>
 </sp>
 <sp>
  <speaker>PEACHUM.</speaker>
  <p>Why, she may plead her Belly at worst; to my
     Knowledge she hath taken care of that Security.
     But, as the Wench is very active and industrious,
     you may satisfy her that I'll soften the Evidence.</p>
 </sp>
 <sp>
  <speaker>FILCH.</speaker>
  <p>Tom Gagg, sir, is found guilty.</p>
 </sp>
</div2>
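Because each speech is wrapped in <sp> with an optional <speaker>, a processor can list who says what with very little code. The Python sketch below assumes un-namespaced fragments. `speeches` is a hypothetical helper that reads only the <p> and <l> children of each <sp>, so stage directions placed directly inside <sp> are ignored.

```python
import xml.etree.ElementTree as ET


def normalise(elem):
    """All text inside elem, with runs of whitespace collapsed to single spaces."""
    return " ".join("".join(elem.itertext()).split())


def speeches(div):
    """Return (speaker, text) for each <sp>, reading its <p> and <l> children."""
    result = []
    for sp in div.iter("sp"):
        speaker = " ".join(sp.findtext("speaker", default="").split())
        body = " ".join(normalise(child) for child in sp if child.tag in ("p", "l"))
        result.append((speaker, body))
    return result


SCENE = """<div2 n="I.2" type="scene">
<head>Scene 2.</head>
<stage type="setting">Peachum, Filch.</stage>
<sp><speaker>FILCH.</speaker><p>Sir, Black Moll hath sent word her Trial comes on in the Afternoon, and she hopes you will order Matters so as to bring her off.</p></sp>
<sp><speaker>PEACHUM.</speaker><p>Why, she may plead her Belly at worst; to my Knowledge she hath taken care of that Security. But, as the Wench is very active and industrious, you may satisfy her that I'll soften the Evidence.</p></sp>
<sp><speaker>FILCH.</speaker><p>Tom Gagg, sir, is found guilty.</p></sp>
</div2>"""

if __name__ == "__main__":
    for who, text in speeches(ET.fromstring(SCENE)):
        print(f"{who} {text}")
```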
In the following example, each speech consists of a sequence of verse lines, some of them being marked as metrically incomplete:
<div1 n="I" type="Act">
 <head>ACT I</head>
 <div2 n="1" type="Scene">
  <head>SCENE I</head>
  <stage rend="italic">Enter Barnardo and Francisco,
     two Sentinels, at several doors</stage>
  <sp>
   <speaker>Barn</speaker>
   <l part="Y">Who's there?</l>
  </sp>
  <sp>
   <speaker>Fran</speaker>
   <l>Nay, answer me. Stand and unfold yourself.</l>
  </sp>
  <sp>
   <speaker>Barn</speaker>
   <l part="I">Long live the King!</l>
  </sp>
  <sp>
   <speaker>Fran</speaker>
   <l part="M">Barnardo?</l>
  </sp>
  <sp>
   <speaker>Barn</speaker>
   <l part="F">He.</l>
  </sp>
  <sp>
   <speaker>Fran</speaker>
   <l>You come most carefully upon your hour.</l>
  </sp>
  <sp>
   <speaker>Barn</speaker>
   <l>'Tis now struck twelve. Get thee to bed, Francisco.</l>
  </sp>
  <sp>
   <speaker>Fran</speaker>
   <l>For this relief much thanks. 'Tis bitter cold,</l>
   <l part="I">And I am sick at heart.</l>
  </sp>
 </div2>
</div1>
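The I, M, and F values imply a simple sequencing constraint. An initial fragment should be followed by zero or more medial fragments and then a final one, while Y marks a line as incomplete without saying how. The Python sketch below checks that constraint. It is a hypothetical consistency check, not a rule stated by these Guidelines, and it also treats a complete line occurring inside an unfinished one as a problem. It assumes un-namespaced fragments. Applied to an abbreviated version of the example above, it reports only the last fragment, whose completion lies beyond the end of the excerpt.

```python
import xml.etree.ElementTree as ET


def check_parts(root):
    """Return a list of problems in the I/M/F sequence of <l> elements."""
    problems, open_line = [], None  # open_line: number of an unfinished initial fragment
    for i, line_el in enumerate(root.iter("l"), start=1):
        part = line_el.get("part", "N")
        if part == "I":
            if open_line is not None:
                problems.append(f"line {i}: new initial fragment before line {open_line} was finished")
            open_line = i
        elif part in ("M", "F"):
            if open_line is None:
                problems.append(f"line {i}: part={part!r} without a preceding initial fragment")
            elif part == "F":
                open_line = None
        elif open_line is not None:  # "N", "Y" or absent while a line is still open
            problems.append(f"line {i}: complete line inside unfinished line {open_line}")
    if open_line is not None:
        problems.append(f"line {open_line}: initial fragment never completed")
    return problems


# Abbreviated from the Hamlet example above (two speeches omitted).
HAMLET = """<div2 n="1" type="Scene">
<sp><speaker>Barn</speaker><l part="Y">Who's there?</l></sp>
<sp><speaker>Fran</speaker><l>Nay, answer me. Stand and unfold yourself.</l></sp>
<sp><speaker>Barn</speaker><l part="I">Long live the King!</l></sp>
<sp><speaker>Fran</speaker><l part="M">Barnardo?</l></sp>
<sp><speaker>Barn</speaker><l part="F">He.</l></sp>
<sp><speaker>Fran</speaker><l>For this relief much thanks. 'Tis bitter cold,</l>
<l part="I">And I am sick at heart.</l></sp>
</div2>"""

if __name__ == "__main__":
    for problem in check_parts(ET.fromstring(HAMLET)):
        print(problem)
```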
In some cases, as here in the First Quarto of Hamlet, the printed speaker attributions need to be supplemented by use of the who attribute; again, the lines are marked as complete or incomplete:
<stage>Enter two Centinels.
<add place="right">Now call'd <name xml:id="barnardo">Bernardo</name> &amp;
 <name xml:id="francisco">Francesco</name>.</add>
</stage>
<sp who="#francisco">
 <speaker>1.</speaker>
 <l part="Y">Stand: who is that?</l>
</sp>
<sp who="#barnardo">
 <speaker>2.</speaker>
 <l part="Y">Tis I.</l>
</sp>
<sp who="#francisco">
 <speaker>1.</speaker>
 <l>O you come most carefully vpon your watch,</l>
</sp>
<sp who="#barnardo">
 <speaker>2.</speaker>
 <l>And if you meete Marcellus and Horatio,</l>
 <l>The partners of my watch, bid them make haste.</l>
</sp>
<sp who="#francisco">
 <speaker>1.</speaker>
 <l part="Y">I will: See who goes there.</l>
</sp>
<stage>Enter Horatio and Marcellus.</stage>
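Because who holds a pointer such as `#francisco`, a processor can replace the printed label ("1.", "2.") with the name bearing the matching xml:id. The Python sketch below is hypothetical and assumes un-namespaced fragments and a single pointer per who value, although the attribute may in general hold several. ElementTree reports xml:id under the predefined XML namespace URI, which is why the sketch uses the key shown in `XML_ID`.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_speakers(root):
    """Return (printed label, resolved name, first line) for each <sp>."""
    names = {el.get(XML_ID): " ".join("".join(el.itertext()).split())
             for el in root.iter() if el.get(XML_ID) is not None}
    rows = []
    for sp in root.iter("sp"):
        ref = sp.get("who", "")
        key = ref[1:] if ref.startswith("#") else ref  # "#francisco" -> "francisco"
        label = sp.findtext("speaker", default="").strip()
        first = sp.find("l")
        first_text = "".join(first.itertext()).strip() if first is not None else ""
        rows.append((label, names.get(key, ref), first_text))  # fall back to the raw pointer
    return rows


Q1 = """<div>
<stage>Enter two Centinels.
<add place="right">Now call'd <name xml:id="barnardo">Bernardo</name> &amp;
<name xml:id="francisco">Francesco</name>.</add></stage>
<sp who="#francisco"><speaker>1.</speaker><l part="Y">Stand: who is that?</l></sp>
<sp who="#barnardo"><speaker>2.</speaker><l part="Y">Tis I.</l></sp>
</div>"""

if __name__ == "__main__":
    for label, name, first in resolve_speakers(ET.fromstring(Q1)):
        print(f"{label} ({name}): {first}")
```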
By contrast with the preceding examples, the following encodes an early printed edition without making any assumption about which parts are prose or verse:
<div1 n="I" type="act">
 <div2 n="1" type="scene">
  <head rend="italic">Actus primus, Scena prima.</head>
  <stage rend="italic" type="setting">A tempestuous
     noise of Thunder and Lightning heard: Enter
     a Ship-master, and a Boteswaine.</stage>
  <sp>
   <speaker>Master.</speaker>
   <p>Bote-swaine.</p>
  </sp>
  <sp>
   <speaker>Botes.</speaker>
   <p>Heere Master: What cheere?</p>
  </sp>
  <sp>
   <speaker>Mast.</speaker>
   <p>Good: Speake to th' Mariners: fall
       too't, yarely, or we run our selues a ground,
       bestirre, bestirre. <stage type="move">Exit.</stage>
   </p>
  </sp>
  <stage type="move">Enter Mariners.</stage>
  <sp>
   <speaker>Botes.</speaker>
   <p>Heigh my hearts, cheerely, cheerely my harts: yare,
       yare: Take in the toppe-sale: Tend to th' Masters whistle:
       Blow till thou burst thy winde, if roome e-nough.</p>
  </sp>
 </div2>
</div1>
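The type attribute classifies a stage direction wherever it appears, so a processor can gather directions by type even when one sits inside a speech, as "Exit." does here. The Python sketch below is a hypothetical helper. It assumes un-namespaced fragments and files directions without a type under the label "unspecified", a name chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def stage_directions(root):
    """Group the whitespace-normalised text of every <stage> by its type attribute."""
    by_type = defaultdict(list)
    for stage in root.iter("stage"):  # includes directions nested inside <sp>/<p>
        text = " ".join("".join(stage.itertext()).split())
        by_type[stage.get("type", "unspecified")].append(text)
    return dict(by_type)


# Abbreviated from the example above.
TEMPEST = """<div2 n="1" type="scene">
<stage rend="italic" type="setting">A tempestuous
   noise of Thunder and Lightning heard: Enter
   a Ship-master, and a Boteswaine.</stage>
<sp><speaker>Mast.</speaker><p>Good: Speake to th' Mariners: fall
   too't, yarely, or we run our selues a ground,
   bestirre, bestirre. <stage type="move">Exit.</stage></p></sp>
<stage type="move">Enter Mariners.</stage>
</div2>"""

if __name__ == "__main__":
    for kind, texts in stage_directions(ET.fromstring(TEMPEST)).items():
        print(kind, texts)
```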
The <sp> and <stage> elements should also be used to mark parts of a text otherwise in prose which are presented as if they were dialogue in a play. The following example is taken from a 19th-century novel in which passages of narrative and passages of dialogue are mixed within the same chapter:
<sp>
 <speaker>The reverend Doctor Opimiam</speaker>
 <p>I do not think I have named a single unpresentable fish.</p>
</sp>
<sp>
 <speaker>Mr Gryll</speaker>
 <p>Bream, Doctor: there is not much to be said for bream.</p>
</sp>
<sp>
 <speaker>The Reverend Doctor Opimiam</speaker>
 <p>On the contrary, sir, I think there is much to be said for him.
   In the first place ...</p>
 <p>Fish, Miss Gryll — I could discourse to you on fish by the
   hour: but for the present I will forbear ...</p>
</sp>
<sp>
 <speaker>Lord Curryfin</speaker>
 <stage>(after a pause).</stage>
 <p>
  <q>Mass</q> as the second grave-digger says
   in <title>Hamlet</title>, <q>I cannot tell.</q>
 </p>
</sp>
<p>A chorus of laughter dissolved the sitting.</p>
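Because dramatic <sp> elements and narrative <p> elements appear as siblings in such a passage, the two kinds of content can be told apart by element name alone. The Python sketch below is hypothetical. It assumes un-namespaced fragments and a wrapping container element, and it keeps only the <p> content of each speech, so the <stage> direction and the markup of <q> and <title> are not preserved in the output text.

```python
import xml.etree.ElementTree as ET


def flat(elem):
    """All text inside elem, whitespace-normalised."""
    return " ".join("".join(elem.itertext()).split())


def dialogue_and_narrative(container):
    """Classify each child as ("speech", speaker, text) or ("narrative", None, text)."""
    rows = []
    for child in container:
        if child.tag == "sp":
            speaker = " ".join(child.findtext("speaker", default="").split())
            text = " ".join(flat(p) for p in child.findall("p"))
            rows.append(("speech", speaker, text))
        elif child.tag == "p":
            rows.append(("narrative", None, flat(child)))
    return rows


# Abbreviated from the example above, wrapped in a hypothetical <div>.
CHAPTER = """<div>
<sp><speaker>Mr Gryll</speaker><p>Bream, Doctor: there is not much to be said for bream.</p></sp>
<sp><speaker>Lord Curryfin</speaker><stage>(after a pause).</stage>
<p><q>Mass</q> as the second grave-digger says in <title>Hamlet</title>, <q>I cannot tell.</q></p></sp>
<p>A chorus of laughter dissolved the sitting.</p>
</div>"""

if __name__ == "__main__":
    for kind, speaker, text in dialogue_and_narrative(ET.fromstring(CHAPTER)):
        print(kind, speaker or "", text)
```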

3.13 Overview of the Core Module

All the elements described in this chapter are provided by the core module.
Module core: Elements common to all TEI documents
The selection and combination of modules to form a TEI schema is described in 1.2 Defining a TEI Schema.


Notes
9.
Although the way in which a spoken text is performed (for example, its voice quality or loudness) might be regarded as analogous to ‘highlighting’ in this sense, these Guidelines recommend distinct elements for the encoding of such ‘highlighting’ in spoken texts. See further section 8.3.6 Shifts.
10.
The Oxford English Dictionary documents the phrase to come down in the sense ‘to bring or put down; esp. to lay down money; to make a disbursement’ as being in use, mostly in colloquial or humorous contexts, from at least 1700 to the latter half of the 19th century.
11.
In some contexts, the term regularization has a narrower and more specific significance than that proposed here: the <reg> element may be used for any kind of regularization, including normalization, standardization, and modernization.
12.
The datatypes are taken from the W3C Recommendation XML Schema Part 2: Datatypes Second Edition. The permitted datatypes are: There is one exception: these Guidelines permit a time to be expressed as only a number of hours, or as a number of hours and minutes, as allowed by ISO 8601:2004 sections 4.2.2.3 and 4.3.3. The W3C time and dateTime datatypes require that the minutes and seconds be included in the normalized value if they are to be processed correctly, for example when sorting.
13.
Many encoders find it convenient to retain the line breaks of the original during data entry, to simplify proofreading, but this may be done without inserting a tag for each line break of the original.
14.
For example, to distinguish London as an author's name from London as a place of publication or as a component of a title.
15.
Among the bibliographic software systems and subsystems consulted in the design of the <biblStruct> structure were BibTeX, Scribe, and ProCite. The distinctions made by all three may be preserved in <biblStruct> structures, though the nature of their design prevents a simple one-to-one mapping from their data elements to TEI elements. For further information, see section 3.11.4 Relationship to Other Bibliographic Schemes.
16.
The analysis is not wholly unproblematic: as the text of the standard points out, the first subordinate title is subordinate only to the parallel title in French, while the second is subordinate to both the English main title and the French parallel title, without this relationship being made clear, either in the markup given in the example or in the reference structure offered by the standard.
17.
The BibTeX scheme is intentionally compatible with that of Scribe, although it omits some fields used by Scribe. Hence only one list of fields is given here.




Copyright TEI Consortium 2007. Licensed under the GPL. Copying and redistribution is permitted and encouraged.
Version 1.0.1. Last updated on 3rd February 2008. This page generated on 2008-02-03T17:45:08Z