A TEI-conformant electronic text consists of the text itself (transcribed from some source, or created in electronic form), preceded by a TEI header, which identifies the electronic text and can also document the encoding practices used in creating it. The entire thing is enclosed within a tei.2 element, and preceded by an SGML document type declaration identifying the document type to be used in validating the document.
The document type declaration won't be described here. Further below, I'll discuss the TEI header, and the specialized tags for front matter and back matter of the main text. In work with electronic text, however, the vast majority of one's time is spent within the body of the text itself, and so I begin with a description of tags for basic text encoding: paragraphs and other paragraph-like things, character- or phrase-level elements which occur within paragraphs, and so on.
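As a rough orientation before the individual tags, the overall shape described above might look like the following sketch. The element names teiHeader, text, front, body, and back are taken from the full TEI scheme and are assumed here rather than described in this section; the DTD file name is purely hypothetical, and the header contents are elided.

```sgml
<!-- Hypothetical document type declaration; the file name is an assumption -->
<!DOCTYPE tei.2 SYSTEM 'barebones.dtd'>
<tei.2>
  <teiHeader>
    <!-- identifies the electronic text and documents its encoding (contents elided) -->
  </teiHeader>
  <text>
    <front> <!-- front matter, if any --> </front>
    <body>
      <p> ... </p>
    </body>
    <back> <!-- back matter, if any --> </back>
  </text>
</tei.2>
```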
Mark paragraphs with the tag p. Paragraphs do not nest, and neither may p elements. For example:
<p>I call specific attention to the authority given by the 21st Amendment to the Constitution to prohibit transportation or importation of intoxicating liquors into any State in violation of the laws of such State.</p> <p>I ask the wholehearted cooperation of all our citizens to the end that this return of individual freedom shall not be accompanied by the repugnant conditions that obtained prior to the adoption of the 18th Amendment and those that have existed since its adoption. Failure to do this honestly and courageously will be a living reproach to us all.</p> <p>I ask especially that no State shall by law or otherwise authorize the return of the saloon either in its old form or in some modern guise. </p>
Phrases which are highlighted in the source (or should be highlighted in the output), whether by italics, boldface, small caps, or other special treatment, should be tagged with the hi element. The rend attribute may optionally say how the phrase was highlighted. In the example below, the word "Whereas" and the phrase "therefore, I, Franklin D. Roosevelt" are printed in small caps in the source:
<p><hi rend='sc'>Whereas</hi> the Congress of the United States ... </p> <p><hi rend='sc'>Whereas</hi> Section 217(a) of the Act of Congress entitled "An Act ..." ...</p> <p><hi rend='sc'>Whereas</hi> it appears ... </p> <p>Now, <hi rend='sc'>therefore, I, Franklin D. Roosevelt</hi>, President of the United States of America ... do hereby proclaim that the Eighteenth Amendment to the Constitution of the United States was repealed on the fifth day of December, 1933.</p>
The rend attribute may be omitted if the rendering is of no interest, or if all highlighted phrases are rendered the same way. Its values may be chosen freely by the encoder; processing software can then use those values to decide how to display or process the element.
Mark quotations from other works, or dialog spoken by characters in a narrative, as q (quotation) elements:
<p><hi rend='sc'>Whereas</hi> Section 217(a) of the Act of Congress entitled "An Act ..." approved June 16, 1933, provides as follows: <q>Section 217(a) The President shall proclaim the ... </q></p>
Block quotations and inline quotations are distinguished only by the value of their rend attribute; for the former, use the value "block" or "display", for the latter, use "inline".
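A minimal sketch of the distinction, reusing passages quoted elsewhere in this document; only the rend values differ between the two quotations, and the words introducing the second one are invented for illustration:

```sgml
<!-- inline quotation, run into the surrounding sentence -->
<p>The Chicago <title level='J'>Tribune</title> has <q rend='inline'>our poor power.</q></p>
<!-- block (display) quotation, set off from the surrounding text;
     the introductory words are hypothetical -->
<p>The Act provides as follows: <q rend='block'>Section 217(a) The President shall proclaim the ... </q></p>
```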
References to other documents, or to other locations in the current document, should be tagged with the ref tag:
WHEREAS <ref>Section 217(a) of the Act of Congress ... approved June 16, 1933</ref>, provides as follows: ...
For cross references within the same SGML document, the target attribute may be used to indicate which section is being referred to; its value is the id value assigned to some element in the document. For example, the following cross reference:
I there expressed the hope, and asked for united cooperation, that this return of individual freedom would not be accompanied by anti-social conditions, such as the saloon and the other evils of the pre-prohibition era. (See also <ref target='pc1933-10-11'>Press Conference of October 11, 1933, Item 137, this volume</ref>.)
assumes the existence of some element elsewhere in the volume with the identifier given:
<div id='pc1933-10-11'> <head>Press Conference, 11 October 1933</head> <!-- ... --> </div>
The div and head elements used in the example just given are described below.
If the page breaks of the source are of interest, as they generally are for material transcribed from existing printed editions, record them using the pb element. This element is empty: that is, it has neither content nor an end-tag. It does not mark a passage or portion of the text, just a location within the text. The attribute n, defined for all TEI elements, should be used to indicate the page number; if page numbers from more than one edition are transcribed, the attribute ed should be used to distinguish the two paginations:
<p>I ask the wholehearted cooperation of all our citizens to the end that this return of individual freedom shall not be accompanied by the repugnant conditions that obtained prior to the <pb n='512' ed='1938'> adoption of the 18th Amendment and those that have existed since its adoption....</p>
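Where two paginations are recorded, the same passage might carry one pb per edition, as in this sketch. The second edition label and its page number are invented purely for illustration and do not describe any real edition.

```sgml
<!-- ed='reprint' and n='87' are hypothetical values, used only to show two paginations -->
<p>... the repugnant conditions that obtained prior to the
<pb n='512' ed='1938'> adoption of the 18th Amendment and those
<pb n='87' ed='reprint'> that have existed since its adoption....</p>
```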
Individual verse lines should be tagged with l (that's an "L"); stanzas or other verse structures above the level of the line should be tagged lg (line group):
<lg type='quatrain'> <l>Awake! for Morning in the Bowl of Night</l> <l>Has flung the Stone that puts the Stars to Flight:</l> <l>And Lo! the Hunter of the East has caught</l> <l>The Sultan's Turret in a Noose of Light.</l> </lg>
When the indentation of the lines is significant, it can be recorded using the global rend attribute, with some suitable value:
<l rend='indent'>And Lo! the Hunter of the East has caught</l> <l>The Sultan's Turret in a Noose of Light.</l>
Of course, if the verse is quoted from another text, the l elements should be enclosed in a q element.
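For instance, a couplet quoted inside a prose paragraph might be encoded as in the following sketch; the introductory words are invented for illustration:

```sgml
<p>The poem opens: <!-- introductory words are hypothetical -->
<q>
  <l>Awake! for Morning in the Bowl of Night</l>
  <l>Has flung the Stone that puts the Stars to Flight:</l>
</q></p>
```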
Drama should be encoded with the elements sp (speech) and stage (stage direction); the who attribute of sp may be used to identify the speaker:
<sp who='Casca'> <l>Speak, hands, for me!</l></sp> <stage>They stab Caesar.</stage> <sp who='Julius Caesar'> <l>Et tu, Brute? -- then fall, Caesar!</l> <stage>Dies.</stage></sp>
When the precise form of the speaker attribution in the source is important, the speaker may be identified by a separate speaker element at the beginning of the sp element:
<sp><speaker>Cas.</speaker> <l>Speak, hands, for me!</l></sp> <stage>They stab Caesar.</stage> <sp><speaker>Caes.</speaker> <l>Et tu, Brute? -- then fall, Caesar!</l> <stage>Dies.</stage></sp>
These tags may also be used for material not written as drama, but presented using dramatic conventions (e.g. transcriptions of speeches, or of press conferences):
The brave men living and dead who struggled here have consecrated it far above our power to add or detract. <stage>[Applause.]</stage> <!-- ... --> and that Governments of the people, by the people, and for the people, shall not perish from the earth. <stage>[Long-continued applause.] </stage>
As with verse, if the drama is quoted from another text, it should be enclosed in a q element.
Bibliographic references should normally be enclosed in bibl elements; within such elements, or outside them, title may be used to mark titles of articles, books, journals, etc. Its level attribute takes the values A, M, J, S, or U to show whether the title is an analytic (article) title, a monographic (book) title, the title of a journal, that of a series, or that of unpublished material such as a thesis. For example:
<bibl> <title level='A'>Inaugural Address, March 4, 1933</title>, in <title level='M'>The Public Papers and Addresses of Franklin D. Roosevelt </title>, vol. II (New York: Random House, 1938), pp. 11-16. </bibl>
If material has been omitted from an electronic text (e.g. because it is illegible or not of interest to the expected users), the omission should normally be indicated using a gap element at the point of omission. The attributes desc, reason, and extent may optionally be used to describe what was omitted, to explain why, and to give an approximate size for it. For example:
<p> Suppose I see two individuals approaching whose rank I wish to ascertain. They are, we will suppose, a Merchant and a Physician, or in other words, an Equilateral Triangle and a Pentagon: how am I to distinguish them?</p> <p><gap desc='geometric figure' reason='editorial policy' extent='ca. 14 lines'></p> <p>It will be obvious ... </p>
Notes in the text, whether footnotes, endnotes, or inline block notes, should be tagged with the note element. The location may be given, if desired, in the place attribute. Authorial notes may be distinguished from editorial notes by means of the resp attribute, which indicates who is responsible for the note. For example:
<p>IN WITNESS WHEREOF, I have hereunto set my hand and caused the seal of the United States to be affixed.</p> <note resp='ed' place='inline'><p>The 72d Congress, which convened following the 1932 election, passed the Twenty-first Amendment to the Constitution to repeal the Eighteenth Amendment.</p> <p> ... </p> </note>
Footnotes and endnotes should normally be transcribed at their point of attachment. Their number may optionally be given in the n attribute:
... have consecrated it far above our power<note place='foot' n='21'> Philadelphia <title>Inquirer</title> has <q>our poor attempts</q> and Chicago <title level='J'>Tribune</title> has <q>our poor power.</q></note> to add or detract.
Lists should be tagged using the list and item elements; a heading or title for the list should be tagged as a head. Lists may be distinguished as ordered (numbered), unordered (bulleted), etc., by means of the type attribute. For example:
The President shall proclaim the date of <list type='ordered'> <item n='(1)'>the close of the first fiscal year ending June 30 of any year after the year 1933, in which ..., or</item> <item n='(2)'>the repeal of the eighteenth amendment to the Constitution, </item> </list> whichever is the earlier.
The full TEI scheme also defines a label element for use as an alternative to using the n attribute to give item numbers or labels.
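A sketch of that alternative, assuming the full scheme's convention that each label immediately precedes the item it labels (label is not part of the bare-bones tag set described here):

```sgml
<!-- label elements carry the item numbers instead of the n attribute -->
<list type='ordered'>
  <label>(1)</label> <item>the close of the first fiscal year ending June 30 of any year after the year 1933, in which ..., or</item>
  <label>(2)</label> <item>the repeal of the eighteenth amendment to the Constitution,</item>
</list>
```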
Notes in the preceding sections have mentioned some of the elements defined in the full TEI scheme's core tag set but omitted from this bare-bones version. In addition to those already mentioned, tags omitted here include those for proper nouns and other references to people and places, addresses, numbers, units of measure and measured quantities, dates, and times of day.
The full scheme also defines optional tag sets for hypertext linking, analysis or interpretation (including both literary and linguistic analysis) of the text, manuscript transcription, text-critical apparatus, tables, figures, and other specialized interests.