Bare Bones TEI

A Very Very Small Subset

of the TEI Encoding Scheme


C. M. Sperberg-McQueen

Document No. TEI U6
30 Aug 1994, rev. June 1995

    An SGML version of this document may be retrieved from http://www.uic.edu/orgs/tei/intros/teiu6.sgm. An HTML version of this document split into multiple files (for faster retrieval) may be found at http://www.uic.edu/orgs/tei/intros/teiu6.split.html.


    1 Preface


    Mark Olsen
    ARTFL Database
    University of Chicago
    1050 E. 59th St.
    Chicago, Illinois 60637


    Dear Mark,

    A few months ago, when the TEI published its Guidelines, and you saw the 1300 pages, and hefted the seven pounds, of the Guidelines for Electronic Text Encoding and Interchange, you wrote me, as you may remember, words to the effect: "Your bricks landed on my desk today. Is there a Cliff Notes version? A bare-bones TEI, without any of the optional stuff, just the absolute minimal TEI encoding scheme?"

    This is my attempt to provide you what you asked for --- but only half of my effort to provide you with what I think you really need. The half I don't think you were asking for, though other people have, is a sort of `Pocket-sized Guide to the TEI': a version of the TEI encoding scheme which is small enough to be understood without too much trouble, but large enough to do reasonably serious work with, and powerful enough to suffice for most people's work encoding electronic text, most of the time. Lou and I have discussed this at some length, and a `pocket-sized TEI' (aka `TEI Lite') is now documented in a little paper called "An Introduction to TEI Tagging" (document number TEI U5).

    You didn't ask for a TEI Lite, though: you asked for something even smaller and more austere: you asked me to isolate the absolute minimum set of TEI tags, without which it's difficult to imagine making any useful electronic text nowadays at all. That is what I have done in this document.

    Note, however, that what you have in your hands is emphatically not an attempt at a realistic markup scheme for real use in encoding new texts. It is a definition of a toy markup language: the absolute minimum is not necessarily a useful minimum. In particular, although this tag set may conceivably suffice for the translation of ARTFL texts and other pre-existing data into TEI form, still I think that when you set about creating new electronic texts, you would be crazy to limit yourself to the textual features listed here, and I hope that, despite your well publicized antipathy to any rational scheme of text markup, and despite the ample measure of craziness which your friends all know and treasure in you, you won't do such a silly thing.

    The tag set defined here is simple enough that you should be able to get familiar with it in half an hour, become proficient in it in an afternoon or so, and outgrow it completely in a day or a week or two. And it is a clean subset of the full TEI encoding scheme, so that when you do outgrow this bare-bones tag set, and start (as I hope) looking at TEI Lite and the full TEI markup language, you will already have a firm grasp of the basics of TEI encoding, and can easily integrate the additional tags into the mental framework you built while assimilating this bare-bones TEI scheme. In order to encourage you, and other readers who share your predilection for craziness, to move eventually to the full TEI markup scheme, I mention periodically in this outline the tags in the TEI header and the TEI core tag set which are not included here, so that you will know what you're missing.

    I have to confess that Lou is skeptical about the definition of this bare-bones TEI subset. Like me, he thinks that it won't be useful for serious encoding of real data, but he disagrees with my belief that it may nevertheless be useful to those encountering the TEI for the first time. I hope it will be useful, by (a) reducing the clutter so you can see the basic outlines of the TEI scheme more clearly --- the tags included here are the ones everyone is going to need to use --- and (b) demonstrating, by a reductio ad absurdum, how reducing a tag set to this size (it's about the same size as the first version of HTML) forces one to omit too much material which can be useful in the encoding of virtually any text, and which is absolutely essential for dealing rationally with some texts. Lou thinks I am dreaming; time will tell.

    So: here is the bare-bones TEI subset you asked for --- may you read it in good health, and may it prove useful in showing you how to translate your existing data into TEI form, and extending your existing software to handle TEI data. (N.B. maledictions will rain on your head if you implement support for this subset but not for the full TEI DTD. And what's more, you'll deserve every malediction in the book.) Use it to encode some simple exercises in SGML and TEI tagging. A few exercises should suffice to persuade you that you'll need a larger scheme (e.g. the full TEI scheme) for serious encoding of texts you hope anyone will work with. Use TEI Lite instead of standard TEI if you must, but don't limit yourself to the skeletal tag set (perhaps I should say, cadaverous tag set) sketched here. Even you aren't that crazy.


    Best regards,

    Michael


    2 Introduction

    This document describes a bare-bones tag set taken from the Guidelines for Electronic Text Encoding and Interchange published in 1994 by the Text Encoding Initiative (TEI). The tags described have been chosen to serve as a simple introduction to the full markup scheme described in the Guidelines; they may suffice in some cases for the creation of simple electronic texts, but serious work will require a larger selection of the TEI tags. The reader is encouraged to use this document as a first introduction to TEI tagging, and to progress, after reading this document and using its tag set for a while, to a study of other TEI documentation, either the document called "TEI Lite: An Introduction to TEI Tagging" (document number TEI U5), or the full text of the Guidelines themselves (Guidelines for Electronic Text Encoding and Interchange, document number TEI P3).

    This document introduces the tags informally, with examples. As an incentive to learn the full TEI tag set, it mentions, from time to time, tags which are in the full tag set but have been omitted here to keep the bare-bones tag set simple. Such references may be ignored on first reading. Fuller discussion of all tags, and their formal descriptions in terms of the Standard Generalized Markup Language (SGML) may be found in the Guidelines.

    3 Bare-Bones SGML

    SGML, the Standard Generalized Markup Language, is a formal language for representing text in electronic form. The TEI tag set is defined in terms of SGML, and all TEI-conformant documents must also conform to SGML.

    In SGML-based encoding schemes, a document is represented by a combination of content (roughly speaking, the characters of the text, what you see on a printed page when the text is printed out) and markup (roughly speaking, information about the structure of the text, or features important for proper processing of the text, such as its division into chapters and sections, or the fact that a given phrase is a technical term and must be italicized). Non-SGML software, such as proprietary word processors, uses a similar division into content and markup. In sophisticated software, markup is usually invisible to the user unless a reveal-codes function or the like is used to make it visible. SGML differs from proprietary markup systems in several ways:

    There are other differences, but these will do for now.

    SGML markup takes three forms: declarations, entity references and tags. [I cannot tell a lie: actually, there are four forms of markup. The fourth, processing instructions, won't concern us here.]

    Declarations are used to define the tags and entity references which are legal in a document type. Since the tags and entities we are concerned with here have all already been defined by the TEI, there is no need to discuss declarations further in this document. You will need to learn about them if you want to customize the TEI tag sets, but that won't be covered here. The only form of declaration you need to know about, to follow the examples below, is the comment, which is preceded by <!-- and followed by -->:

     
    <!-- this is a comment. -->
    <!-- this is a second comment. -->
    <!-- Comments are ignored by the SGML parser,
    and usually ignored by SGML software
    of all types.  As this comment shows,
    comments can go on for several lines. -->
    

    Entities are named portions of documents, which may be stored separately; entity references show where each entity goes. Among other things, entity references are used to embed special characters in the text when, as often happens, the characters in question are not available on the keyboard. Some entities for special characters are defined in international standards. For example, the entity eacute names the character "e with an acute accent" (é). When the standard entity sets are in use, the following two examples are identical in meaning:

     
    L'&eacute;tat, c'est moi.
    
     
    L'état, c'est moi.
    

    (In case this has been corrupted in transmission, or is being rendered on a device without accented characters, that second one is the same as the first, except that the reference to the entity eacute in characters 3-10 of the first example has been replaced with a real e with an acute accent in the system's native character set in character 3 of the second example.)

    Entities are also used to handle graphics and other material in non-SGML notations, and to divide a document up into sections stored in separate files for purposes of simpler maintenance, but we won't discuss such uses here.

    Tags mark the beginning and ending of parts of the document; the parts themselves are called elements. Normally, tags are marked in the document by angle brackets; end-tags have a slash after the opening angle bracket. In the following example, the sentence is marked as a quotation by the start-tag and end-tag which surround it; quote is an element type defined by the TEI.

     
    <quote>L'&eacute;tat, c'est moi.</quote>
    

    Elements always have a basic type (in the example above, it is quote); they may also have other attributes, which are indicated by special notations inside the start-tag for the element. For example, the TEI defines the attribute lang as applying to every type of element; its value indicates the language of the element's content, using standard two- or three-letter abbreviations (e.g. fra for "French").

     
    <quote lang='fra'>L'&eacute;tat, c'est
    moi.</quote>
    

    Some attributes may be restricted to certain types of values. Attributes of type id, for example, must provide a unique name or identifier for the element on which they appear; this identifier can be referred to by other attributes, of type idref (id reference). The TEI defines a global attribute named id, of type id, for use in cross-references and other kinds of hypertext links. (TEI attributes are called global when they apply to every type of element.)

    Finally, it should be noted that SGML allows some tags to be omitted from documents, in cases when they are logically redundant and their location can be inferred from that of other tags; in the examples given here, we will not exploit this facility, but always give all tags explicitly. Tag omission is generally of interest only to those working without an SGML editor.
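
    As a hedged illustration of what tag omission looks like (it is not used anywhere else in this document): assuming the document type definition in use declares the end-tag of the p (paragraph) element, described in section 4.1, to be omissible, an SGML parser would treat the two fragments below as equivalent, because the start of a new paragraph implies the end of the previous one. Whether a given tag may be omitted depends entirely on the DTD, so this is a sketch rather than a rule.

    ```sgml
    <!-- all tags given explicitly, as in the rest of this document -->
    <p>I ask the wholehearted cooperation ...</p>
    <p>I ask especially that no State ...</p>

    <!-- end-tags of p omitted; legal only if the DTD allows it -->
    <p>I ask the wholehearted cooperation ...
    <p>I ask especially that no State ...
    ```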

    In sum: in SGML, everything is delimited.

    That's all there is to it. If you understand the rules just described, you should have no trouble understanding all the SGML examples in this document.

    4 Basic Text Encoding

    A TEI-conformant electronic text consists of the text itself (transcribed from some source, or created in electronic form), preceded by a TEI header, which identifies the electronic text and can also document the encoding practices used in creating it. The entire thing is enclosed within a tei.2 element, and preceded by a document type declaration, that is, the SGML declaration which identifies the document type to be used in validating the document.
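
    The overall shape just described might be sketched as follows. This is an outline only, not a complete or valid document: the header and body are reduced to comments and a placeholder paragraph, and the file name bare.dtd in the first line is purely hypothetical, standing in for whatever document type definition is actually in use.

    ```sgml
    <!-- declaration naming the document type; "bare.dtd" is a
         hypothetical file name used only for illustration -->
    <!DOCTYPE tei.2 SYSTEM "bare.dtd">
    <tei.2>
    <teiHeader>
    <!-- the TEI header (section 6) goes here -->
    </teiHeader>
    <text>
    <body>
    <p>The text itself goes here ...</p>
    </body>
    </text>
    </tei.2>
    ```

    Everything discussed in the rest of this section belongs inside the text element.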

    The document type declaration won't be described here. Further below, I'll discuss the TEI header, and the specialized tags for front matter and back matter of the main text. In work with electronic text, however, the vast majority of one's time is spent within the body of the text itself, and so I begin with a description of tags for basic text encoding: paragraphs and other paragraph-like things, character- or phrase-level elements which occur within paragraphs, and so on.

    4.1 Paragraphs

    Mark paragraphs with the tag p. Paragraphs do not nest, and neither may p elements. For example:

     
    <p>I call specific attention to
    the authority given by the 21st Amendment
    to the Constitution to prohibit transportation
    or importation of intoxicating liquors into
    any State in violation of the laws of such
    State.</p>
    <p>I ask the wholehearted cooperation of all our
    citizens to the end that this return of individual
    freedom shall not be accompanied by the repugnant
    conditions that obtained prior to the adoption of
    the 18th Amendment and those that have existed
    since its adoption.  Failure to do this honestly
    and courageously will be a living reproach to us
    all.</p>
    <p>I ask especially that no State shall by law
    or otherwise authorize the return of the saloon
    either in its old form or in some modern guise.
    </p>
    
    [This example, like most of the others not otherwise identified, is from Franklin D. Roosevelt's proclamation upon the repeal of Prohibition, in The Public Papers and Addresses of Franklin D. Roosevelt, vol. II (New York: Random House, 1938), pp. 510-514.]

    4.2 Highlighted Phrases

    Phrases which are highlighted in the source (or should be highlighted in the output), whether by italics, boldface, small caps, or other special treatment, should be tagged with the hi element. The rend attribute may optionally say how the phrase was highlighted. In the example below, the word whereas and the phrase therefore, I, Franklin D. Roosevelt are printed in small caps in the source:

     
    <p><hi rend='sc'>Whereas</hi> the
    Congress of the United States ... </p>
    <p><hi rend='sc'>Whereas</hi> Section 217(a) of
    the Act of Congress entitled "An Act ..." ...</p>
    <p><hi rend='sc'>Whereas</hi> it appears ...
    </p>
    <p>Now, <hi rend='sc'>therefore, I, Franklin
    D. Roosevelt</hi>, President of the United
    States of America ... do hereby proclaim that
    the Eighteenth Amendment to the Constitution of
    the United States was repealed on the fifth
    day of December, 1933.</p>
    

    The rend attribute may be omitted if the rendering is of no interest, or if all highlighted phrases are rendered the same way. Its values may be chosen arbitrarily by the encoder --- the values used may then be used in turn to direct processing software to display or process the element correctly.

    [It is normally preferable to mark phrases with element types indicating why they are highlighted, rather than simply indicating that they are highlighted. The full TEI encoding scheme defines elements which allow typographic highlighting to be identified as marking linguistic emphasis (emph), words in foreign languages (foreign), words in non-standard or specialized languages (distinct), technical terms (term), glosses on terms (gloss), and words mentioned rather than used (mentioned). The generic hi element is normally used only when it is economically or intellectually infeasible to supply one of the more informative alternatives.]

    4.3 Quotations

    Mark quotations from other works, or dialog spoken by characters in a narrative, as q (quotation) elements:

     
    <p><hi rend='sc'>Whereas</hi>
    Section 217(a) of the Act of Congress
    entitled "An Act ..." approved June 16,
    1933, provides as follows:
    <q>Section 217(a) The President shall
    proclaim the ... </q></p>
    

    Block quotations and inline quotations are distinguished only by the value of their rend attribute; for the former, use the value "block" or "display", for the latter, use "inline".
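
    A minimal sketch of the distinction, reusing quotations that appear elsewhere in this document; the surrounding text is elided, and the choice of "block" rather than "display" for the first value is arbitrary:

    ```sgml
    <!-- block (display) quotation, set off from the surrounding text -->
    <q rend='block'>Section 217(a) The President shall
    proclaim the ... </q>

    <!-- inline quotation, run into the surrounding sentence -->
    ... Chicago <title level='J'>Tribune</title> has
    <q rend='inline'>our poor power.</q>
    ```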

    [The full TEI scheme also provides a quote element which is restricted to real quotations from external sources, and unlike q may not be used for direct discourse and fictive quotations. Also provided there but missing here are cit, for quotations with attached bibliographic references to their sources, and soCalled, for material printed with `scare quotes' to indicate that the author disclaims full responsibility for it.]

    4.4 Cross References

    References to other documents, or to other locations in the current document, should be tagged with the ref tag:

     
    WHEREAS <ref>Section 217(a) of
    the Act of Congress ... approved June
    16, 1933</ref>, provides as follows: ...
    
    [The full scheme defines an empty element called ptr for use when the actual phrase referring to the other document or section can be generated automatically by software, as is usually done in document production systems.]

    For cross references within the same SGML document, the target attribute may be used to indicate which section is being referred to; its value is the id value assigned to some element in the document. For example, the following cross reference:

     
    I there expressed the hope,
    and asked for united cooperation, that
    this return of individual freedom would
    not be accompanied by anti-social
    conditions, such as the saloon and the
    other evils of the pre-prohibition era.
    (See also <ref target='pc1933-10-11'>Press
    Conference of October 11, 1933, Item 137,
    this volume</ref>.)
    

    assumes the existence of some element elsewhere in the volume with the identifier given:

     
    <div id='pc1933-10-11'>
    <head>Press Conference, 11 October 1933</head>
    <!-- ... -->
    </div>
    
    [This example is from the note in the Public Papers which follows the proclamation of the repeal of Prohibition.]

    The div and head elements used in the example just given are described below.

    4.5 Page Breaks

    If the page breaks of the source are of interest, as they generally are for material transcribed from existing printed editions, record them using the pb element. This element is empty: that is, it has neither content nor an end-tag. It does not mark a passage or portion of the text, just a location within the text. The attribute n, defined for all TEI elements, should be used to indicate the page number; if page numbers from more than one edition are transcribed, the attribute ed should be used to distinguish the two paginations:

     
    <p>I ask the wholehearted cooperation
    of all our citizens to the end that
    this return of individual freedom shall
    not be accompanied by the repugnant
    conditions that obtained prior to the
    <pb n='512' ed='1938'>
    adoption of
    the 18th Amendment and those that
    have existed  since its adoption....</p>
    
    [In addition to page breaks, column and line breaks may be of interest; the full TEI scheme defines cb and lb elements for these, as well as a generic milestone element for boundaries and breaks of unforeseen type. Specialized tags in the TEI header can describe how these milestone elements are used in standard reference schemes for the work.]

    4.6 Verse

    Individual verse lines should be tagged with l (that's an "L"). Stanzas or other verse structures above the level of the line should be tagged lg (line group); the latter's type attribute may optionally be used to identify the formal structure in question, for retrieval or other purposes:

     
    <lg type='quatrain'>
    <l>Awake! for Morning in the Bowl of Night</l>
    <l>Has flung the Stone that puts the Stars to Flight:</l>
    <l>And Lo! the Hunter of the East
    has caught</l>
    <l>The Sultan's Turret in a Noose of Light.</l>
    </lg>
    
    [Example is from Rubáiyát of Omar Khayyám, tr. Edward Fitzgerald (New York: Collier; London: Collier-Macmillan, 1962), first quatrain of the first edition.]

    When the indentation of the lines is significant, it can be recorded using the global rend attribute, with some suitable value:

     
    <l rend='indent'>And Lo! the Hunter
    of the East has caught</l>
    <l>The Sultan's Turret in a Noose of Light.</l>
    

    Of course, if the verse is quoted from another text, the l elements should be enclosed in a q element.
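
    For instance, if the quatrain shown above were quoted inside some other text, the encoding might look like the following sketch. It assumes that the line group as a whole, and not only the individual lines, may be placed inside q:

    ```sgml
    <q>
    <lg type='quatrain'>
    <l>Awake! for Morning in the Bowl of Night</l>
    <l>Has flung the Stone that puts the Stars to Flight:</l>
    <l>And Lo! the Hunter of the East has caught</l>
    <l>The Sultan's Turret in a Noose of Light.</l>
    </lg>
    </q>
    ```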

    4.7 Drama

    Drama should be encoded with the elements sp (speech) and stage (stage direction). Stage directions can occur either within speeches or between them. As may be seen in the example below, the speaker may be indicated with the who attribute on the sp element:

     
    <sp who='Casca'>
    <l>Speak, hands, for me!</l></sp>
    <stage>They stab Caesar.</stage>
    <sp who='Julius Caesar'>
    <l>Et tu, Brute? -- then fall, Caesar!</l>
    <stage>Dies.</stage></sp>
    
    [Example is from a modern student reprint of Julius Caesar, III.i: William Shakespeare, The Tragedy of Julius Caesar (New York: Airmont, 1965).]

    When the precise form of the speaker attribution in the source is important, the speaker may be identified by a separate speaker element at the beginning of the sp element.

     
    <sp><speaker>Cas.</speaker>
    <l>Speak, hands, for me!</l></sp>
    <stage>They stab Caesar.</stage>
    <sp><speaker>Caes.</speaker>
    <l>Et tu, Brute? -- then fall, Caesar!</l>
    <stage>Dies.</stage></sp>
    

    These tags may also be used for material not written as drama, but presented using dramatic conventions (e.g. transcriptions of speeches, or of press conferences):

     
    The brave men living and dead
    who struggled here have consecrated it
    far above our power to add or detract.
    <stage>[Applause.]</stage>
    <!-- ... -->
    and that Governments of the people,
    by the people, and for the people,
    shall not perish from the earth.
    <stage>[Long-continued applause.]
    </stage>
    
    [Newspaper version of Abraham Lincoln, "Address Delivered at the Dedication of the Cemetery at Gettysburg," in The Collected Works of Abraham Lincoln, ed. Roy P. Basler, vol. VII (New Brunswick: Rutgers University Press, 1953), pp. 20-21. Since in this text such stage-directions are always printed in brackets, the encoder might choose to omit the square brackets from the transcription, noting in the header that stage elements are always bracketed.]

    As with verse, if the drama is quoted from another text, it should be enclosed in a q element.

    4.8 Bibliographic References

    Bibliographic references should normally be enclosed in bibl elements; within such elements, or outside them, title may be used to mark titles of articles, books, journals, etc. Its level attribute takes the values A, M, J, S, or U to show whether the title is an analytic (article) title, a monographic (book) title, the title of a journal, that of a series, or that of unpublished material such as a thesis. For example, a reference to: "Inaugural Address, March 4, 1933," in The Public Papers and Addresses of Franklin D. Roosevelt, vol. II (New York: Random House, 1938), pp. 11-16 would be encoded thus:

     
    <bibl>
    <title level='A'>Inaugural Address,
    March 4, 1933</title>, in
    <title level='M'>The Public Papers and
    Addresses of Franklin D. Roosevelt
    </title>, vol. II
    (New York:  Random House, 1938), pp. 11-16.
    </bibl>
    
    [Omitted from this bare-bones tag set are tags for other bibliographic elements, such as author, editor, publisher, and so on. Also omitted are the elements biblStruct and biblFull, which require consistently structured bibliographic entries and are useful when all the items in a bibliography must be structured correctly (e.g., for machine processing).]

    4.9 Omissions

    If material has been omitted from an electronic text (e.g. because it is illegible or not of interest to the expected users), the omission should normally be indicated using a gap element at the point of omission. The attributes desc, reason, and extent may optionally be used to describe what was omitted, to explain why, and to give an approximate size for it. For example:

     
    <p>
    Suppose I see two individuals approaching
    whose rank I wish to ascertain.  They are,
    we will suppose, a Merchant and a Physician,
    or in other words, an Equilateral Triangle
    and a Pentagon:  how am I to distinguish
    them?</p>
    <p><gap desc='geometric figure'
    reason='editorial policy'
    extent='ca. 14 lines'></p>
    <p>It will be obvious ... </p>
    
    [Example is from Edwin A. Abbott, Flatland: A Romance of Many Dimensions (1884; rpt. New York: Dover, 1992), p. 19, extract from chapter 6, "Recognition by Sight."]

    [The bare-bones tag set omits the elements defined by the standard TEI tag set for marking other kinds of editorial interventions or authorial alterations to a text, such as cancellations, insertions, corrections or failure to correct errors, normalized spelling, illegible writing or inaudible speech, and the expansion of abbreviations.]

    4.10 Notes

    Notes in the text, whether footnotes, endnotes, or inline block notes, should be tagged with the note element. The location may be given, if desired, in the place attribute. Authorial notes may be distinguished from editorial notes by means of the resp attribute, which indicates who is responsible for the note. For example:

     
    <p>IN WITNESS WHEREOF,
    I have hereunto set my hand and caused
    the seal of the United States to be
    affixed.</p>
    <note resp='ed' place=inline><p>The 72d
    Congress, which
    convened following the 1932 election,
    passed the Twenty-first Amendment to the
    Constitution to repeal the Eighteenth
    Amendment.</p>
    <p> ... </p>
    </note>
    

    Footnotes and endnotes should normally be transcribed at their point of attachment. Their number may optionally be given in the n attribute:

     
    ... have consecrated it
    far above our power<note place='foot' n=21>
    Philadelphia <title level='J'>Inquirer</title> has
    <q>our poor attempts</q> and Chicago
    <title level='J'>Tribune</title> has
    <q>our poor power.</q></note>
    to add or detract.
    

    4.11 Lists

    Lists should be tagged using the list and item elements; a heading or title for the list should be tagged as a head. Lists may be distinguished as ordered (numbered), unordered (bulleted), etc., by means of the type attribute. For example:

     
    The President shall proclaim
    the date of
    <list type=ordered>
    <item n='(1)'>the close of the first fiscal
    year ending June 30 of any year after the
    year 1933, in which ..., or</item>
    <item n='(2)'>the repeal of the
    eighteenth amendment to the Constitution,
    </item>
    </list>
    whichever is the earlier.
    

    The full TEI scheme also defines a label element for use as an alternative to using the n attribute to give item numbers or labels.
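
    The example above shows neither a list heading nor an unordered list. A small sketch of both, using the three forms of SGML markup named earlier in this document as the items; the type value 'unordered' is an assumption chosen for illustration, following the description at the start of this section:

    ```sgml
    <list type='unordered'>
    <head>Forms of SGML markup</head>
    <item>declarations</item>
    <item>entity references</item>
    <item>tags</item>
    </list>
    ```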

    4.12 What Is Missing?

    Notes in the preceding sections have mentioned some of the elements defined in the full TEI scheme's core tag set but omitted from this bare-bones version. In addition to those already mentioned, tags omitted here include those for proper nouns and other references to people and places, addresses, numbers, units of measure and measured quantities, dates, and times of day.

    The full scheme also defines optional tag sets for hypertext linking, analysis or interpretation (including both literary and linguistic analysis) of the text, manuscript transcription, text-critical apparatus, tables, figures, and other specialized interests.

    5 Overall Structure of a Text

    5.1 Front, Body, and Back Matter

    Overall, texts are divided into front matter, the body, and back matter, tagged respectively front, body, and back. Front and back matter are distinct only by virtue of their location: they can contain exactly the same kinds of material. The overall structure of a typical book, for example, would be something like this:

     
    <text>
    <front> <!-- front matter here:  title
    page, dedication, preface, etc. ... -->
    </front>
    <body>  <!-- main body of edition here ... -->
    </body>
    <back>  <!-- back matter here:  index,
    bibliography, etc.... -->
    </back>
    </text>
    

    5.2 Text Divisions

    Within the body, or within the front and back matter, text may be subdivided into text divisions (parts, chapters, sections; act, scene; canto, stanza; etc.). For such divisions, the single element div should be used; subsections are tagged with nested div elements. The type attribute may be used to indicate that the division has a particular name or type; later divisions will take the same type value unless a different value is specified. Within a text division, paragraphs or paragraph-level elements (e.g. note, list) may occur.

     
    <div type='Section' n=1>
    <p>The eighteenth article of amendment
    to the Constitution of the United States
    is hereby repealed.</p></div>
    <div n=2><p>The transportation or importation
    into any State, Territory, or possession of
    the United States for delivery or use
    therein of intoxicating liquors, in
    violation of the laws thereof, is hereby
    prohibited.</p></div>
    <div n=3><p>This article shall be inoperative
    unless it shall have been ratified as an
    amendment to the Constitution by conventions
    in the several States, as provided in the
    Constitution, within seven years from
    the date of the submission hereof to the
    States by the Congress.</p></div>
    

    In cases where text divisions have no headings, or have only headings consisting of their type value and a number, no heading need be given, as shown above. If desired, however, the heading may be given explicitly:

     
    <div type='Section' n=1>
    <head>Section 1.</head>
    <p>The eighteenth article of amendment
    to the Constitution of the United States
    is hereby repealed.</p></div>
    <div n=2><head>Section 2.</head>
    <p>The transportation ...</p></div>
    <div n=3><head>Section 3.</head>
    <p>This article shall be inoperative
    unless ...</p></div>
    

    The headings in the preceding example are fixed text (the word Section followed by the value of the n attribute), which any moderately intelligent SGML software could generate mechanically. In general, document management is more convenient, and results are more consistent, if such material is not transcribed as part of the text, but is generated by software when the text is displayed or printed. Inconsistency in the source, of course, may be of interest, and if so it should be captured explicitly.
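
    The examples above show only sibling divisions. Nesting, mentioned at the start of this section, might be sketched as follows: the sections become subdivisions of an enclosing division. The outer div, its type value 'Article', its n value, and its heading are illustrative assumptions, not part of the source transcription, and the type is repeated on each inner div simply to keep the sketch unambiguous.

    ```sgml
    <div type='Article' n='21'>
    <head>Article XXI</head>
    <div type='Section' n='1'>
    <p>The eighteenth article of amendment
    to the Constitution of the United States
    is hereby repealed.</p></div>
    <div type='Section' n='2'>
    <p>The transportation ...</p></div>
    </div>
    ```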

    [The full TEI encoding scheme includes specialized elements for anthologies (texts containing other texts), epigraphs, datelines, bylines, salutations, signatures, and groups of headings, datelines, etc. at the beginning or ending of a text division.]

    5.3 Title Pages

    The TEI encoding scheme defines specialized tags for transcribing title pages, in order to ensure that processing software can easily locate and identify the author, title, and date of the document as given on its title page. The title page itself, and its major component parts, are illustrated in this example:

     
    <titlePage>
    <docTitle>
    <titlePart type='main'>The Public Papers
    and Addresses of
    Franklin D. Roosevelt</titlePart>
    <titlePart type='sub'>With a special introduction
    and explanatory notes by
    President Roosevelt</titlePart>
    <titlePart type='vol number'>Volume Two</titlePart>
    <titlePart type='vol title'>
    The Year of Crisis
    1933</titlePart>
    </docTitle>
    <docImprint>
      <publisher>Random House</publisher>
      <pubPlace>New York</pubPlace>
      <docDate>1938</docDate>
    </docImprint>
    </titlePage>
    

    The titlePart element is used both for the different parts of the document title (as shown) and also for miscellaneous parts of the title page which are neither document title, nor document author, nor imprint information.

    [In addition to the tags shown here, the full TEI scheme defines a docEdition element for tagging information like "second revised and expanded edition".]
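
    To suggest what "easily locate and identify" might mean in practice, the following Python sketch pulls the main title, publisher, and date out of an abridged copy of the title page above. It is only a sketch: it uses regular expressions rather than an SGML parser, so it assumes that end-tags are written out and that attribute values are quoted exactly as shown. The helper name element_text is hypothetical.

```python
import re

TITLE_PAGE = """<titlePage>
<docTitle>
<titlePart type='main'>The Public Papers
and Addresses of
Franklin D. Roosevelt</titlePart>
<titlePart type='vol number'>Volume Two</titlePart>
</docTitle>
<docImprint>
  <publisher>Random House</publisher>
  <pubPlace>New York</pubPlace>
  <docDate>1938</docDate>
</docImprint>
</titlePage>"""


def element_text(markup, name, type_value=None):
    """Return the whitespace-normalized content of the first matching element, or None."""
    if type_value is None:
        attrs = r"(?:\s[^>]*)?"  # any attributes, or none at all
    else:
        attrs = rf"\s+type='{re.escape(type_value)}'\s*"
    pattern = rf"<{name}{attrs}>(.*?)</{name}>"
    match = re.search(pattern, markup, flags=re.DOTALL | re.IGNORECASE)
    return " ".join(match.group(1).split()) if match else None


print(element_text(TITLE_PAGE, "titlePart", "main"))
print(element_text(TITLE_PAGE, "publisher"))
print(element_text(TITLE_PAGE, "docDate"))
```

    Real SGML software would work from the parsed document instead, and so would not care about omitted end-tags or unquoted attribute values.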

    6 The TEI Header

    The TEI header allows later users of the etexts you create to find out what the text is, who created the etext (i.e. you), and what source edition(s) you transcribed the etext from. In its full expansion, it also allows a full accounting of your transcription practice (did you correct typos silently? did you expand abbreviations? normalize spelling? etc.) and can also include a detailed characterization of the text itself (demographics of its author and audience, subject matter, genre, etc.) and a full change log, which is important for document management in large projects.

    For bare-bones work, however, it's simplest to copy the following TEI header by rote, and replace the text in square brackets with appropriate information about the text being encoded. If the etext is not transcribed from a pre-existing source, but instead is being created in electronic form, the bibl tags within the sourceDesc element should be changed to p.

     
    <teiHeader>
    <fileDesc><titleStmt><title>
      [Put the title of the electronic text here.]
    </title></titleStmt><publicationStmt><p>
      [Indicate who is publishing this electronic text (i.e. you).]
    </p></publicationStmt><sourceDesc><bibl>
      [Indicate the source from which this etext is transcribed.]
    </bibl></sourceDesc></fileDesc></teiHeader>
    

    For example, the TEI header of the document you are reading looks like this:

     
    <teiHeader>
    <fileDesc><titleStmt><title>
    Bare Bones TEI:  A Very Very Small Subset of the TEI Encoding Scheme
    </title></titleStmt><publicationStmt><p>
    Published electronically by the Text Encoding Initiative, Chicago and
    Oxford, in 1994.
    </p></publicationStmt><sourceDesc><p>
    This text was created in electronic form.
    </p></sourceDesc></fileDesc>
    </teiHeader>
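
    Copying by rote is the kind of task a few lines of program can take over. The Python sketch below fills in the template and chooses bibl or p for the source description, as described above. The function name and its parameters are invented for this illustration; the sketch writes every end-tag out explicitly and assumes the strings passed in contain no markup characters of their own.

```python
def bare_bones_header(title, publication, source, transcribed=True):
    """Fill in the bare-bones TEI header template.

    The source description goes in <bibl> when the etext is transcribed
    from a pre-existing source, and in <p> when it is created in
    electronic form.
    """
    source_tag = "bibl" if transcribed else "p"
    return (
        "<teiHeader>\n"
        "<fileDesc><titleStmt><title>\n"
        f"  {title}\n"
        "</title></titleStmt><publicationStmt><p>\n"
        f"  {publication}\n"
        f"</p></publicationStmt><sourceDesc><{source_tag}>\n"
        f"  {source}\n"
        f"</{source_tag}></sourceDesc></fileDesc></teiHeader>"
    )


header = bare_bones_header(
    "Bare-bones Sample of Bare-bones Tagging",
    "An unpublished document.",
    "This document was created in electronic form.",
    transcribed=False,
)
print(header)
```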
    

    6.1 What You're Missing

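    [Not described here are the further facilities of the TEI header, such as the accounting of transcription practice, the detailed characterization of the text, and the change log mentioned above. These facilities are all present in the full header; they may not all be defined in the TEI Lite tag set.]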

    7 Putting It All Together

    A TEI-encoded electronic text is always encoded as a tei.2 element, which in turn contains a teiHeader element followed by a text element. The overall structure is thus:

     
    <tei.2>
    <teiHeader> <!-- TEI header information ... -->
    </teiHeader>
    <text>
      <front> <!-- ... --> </front>
      <body>  <!-- ... --> </body>
      <back>  <!-- ... --> </back>
    </text>
    </tei.2>
    

    The start-tag of the tei.2 element is preceded by an explicit reference to the external file containing the document type definition to be applied to the text by the SGML parser. The stripped-down DTD described here may be invoked with the following document-type declaration:

     
    <!DOCTYPE tei.2 SYSTEM 'barebone.dtd'>
    

    In some systems, the association of a document with a given document type is handled internally, and no such explicit declaration is visible until the document is `exported' from the system. In such systems, the user will be asked to select a `rules' or `logic' file when the document is first created or imported into the editor.
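
    For those who assemble their documents by program rather than in an editor, the overall structure can be sketched in Python as follows. The function assemble_document is a hypothetical helper, not part of any TEI tool; it assumes the header and the contents of front, body, and back are already tagged, and it leaves out front and back when they are empty.

```python
DOCTYPE = "<!DOCTYPE tei.2 SYSTEM 'barebone.dtd'>"


def assemble_document(header, body, front="", back=""):
    """Wrap a header and text parts in a tei.2 element, preceded by the DOCTYPE."""
    parts = [DOCTYPE, "<tei.2>", header, "<text>"]
    if front:
        parts.append(f"<front>{front}</front>")
    parts.append(f"<body>{body}</body>")
    if back:
        parts.append(f"<back>{back}</back>")
    parts += ["</text>", "</tei.2>"]
    return "\n".join(parts)


document = assemble_document(
    "<teiHeader> <!-- TEI header information ... --> </teiHeader>",
    "<p>The world's shortest TEI document.</p>",
)
print(document)
```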


    8 A Complete Example

    The following is a small but complete document encoded using the tag set declared here:

     
    <tei.2>
    <teiHeader>
    <fileDesc><titleStmt><title>
      Bare-bones Sample of Bare-bones Tagging
    </title></titleStmt><publicationStmt><p>
      An unpublished document.
    </p></publicationStmt><sourceDesc><p>
      This document was created in electronic form.
    </p></sourceDesc></fileDesc></teiHeader>
     
    <text><body>
    <p>The world's shortest TEI document.</p>
    </body></text>
     
    </tei.2>
    

    9 A More Interesting Example

    A slightly more realistic example of bare-bones tagging is provided by the following abridged transcription of Franklin D. Roosevelt's proclamation that Prohibition (i.e. the prohibition of alcohol, imposed in the U.S. by the adoption of the 18th Amendment to the Constitution) had been repealed. In the following example, the overall structure is what would be used if the entire Public Papers of Roosevelt, or a selection of several of them, were being transcribed.

     
    <tei.2>
    

    The header identifies the electronic text and gives the source from which it was made.

     
    <teiHeader>
    <fileDesc><titleStmt><title>
      Proclamation of the 21st Amendment:
      an Electronic Version
    </title></titleStmt>
    <publicationStmt>
    <p>Published by the TEI as a specimen of tagged
    text.</p></publicationStmt>
    <sourceDesc><bibl>
    <title level='M'>The Public Papers and Addresses of
    Franklin D. Roosevelt</title>, vol. II
    (New York:  Random House, 1938).
    <!-- here we transcribe only
         <title level='A'>The President Proclaims
         the Repeal of the Eighteenth Amendment.
         Proclamation No. 2065.
         December 5, 1933</title>, pp. 510-514.  -->
    </bibl>
    </sourceDesc></fileDesc></teiHeader>
    

    The text element contains the actual transcription.

     
    <text><front><titlePage>
    <docTitle>
    <titlePart type='main'>
    The Public Papers and Addresses of Franklin D. Roosevelt</titlePart>
    <titlePart type='sub'>
    With a special introduction and explanatory notes by President
    Roosevelt</titlePart>
    <titlePart type='vol number'>Volume Two</titlePart>
    <titlePart type='vol title'>The Year of Crisis 1933</titlePart>
    </docTitle>
    <docImprint>
      <publisher>Random House</publisher>
      <pubPlace>New York</pubPlace>
      <docDate>1938</docDate>
    </docImprint>
    </titlePage>
    <div type='copyright page'> <!-- ... --> </div>
    <div type='notice'> <!-- ... --> </div>
    <div type='table of contents'> <!-- ... --> </div>
    </front>
    

    The body of the electronic text is a series of documents, each in a div element.

     
    <body>
    <div n=1 type='speech'>
    <head>Inaugural Address.</head>
    <head type='date'>March 4, 1933</head>
    <!-- ... -->
    </div>
     
    <div n=2 type='Proclamation'>
    <head>The President Calls the Congress
    into Extraordinary Session.</head>
    <head type='docno'>Proclamation No. 2038.</head>
    <head type='date'>March 5, 1933</head>
    <!-- ... -->
    </div>
     
    <!-- ... etc. -->
    

    The repeal of the 18th Amendment is item no. 175 in this volume.

     
    <div n=175 type='Proclamation'>
    <head>The President Proclaims the Repeal
    of the Eighteenth Amendment.</head>
    <head type='docno'>Proclamation No. 2065.</head>
    <head type='date'>December 5, 1933</head>
     
    <p><hi rend='sc'>Whereas</hi> the
    Congress of the United States in 2d Session of the 72d Congress, begun
    at Washington on the fifth day of December in the year one thousand nine
    hundred and thirty-two, adopted a resolution in the words and figures
    following, to wit &mdash;</p>
    

    At this point the Congressional resolution is quoted in its entirety. It has its own title and paragraphing, and embeds in its turn the full text of yet another document, which became the 21st Amendment. Since FDR is quoting the resolution, we tag it as a q. Within the q is a text element. The q is rendered as a block quote with quotation marks at the beginning and end, and opening quotation marks at the beginning of each paragraph.

     
    <q rend='display, quoted paras'><text><body>
    <head rend='caps'>Joint Resolution</head>
    <head type='sub'>Proposing an amendment to the
    Constitution of the United States.</head>
    <p>Resolved by the Senate and House of
    Representatives of the United States of America
    in Congress assembled (two-thirds of each House
    concurring therein), That the following article
    is hereby proposed as an amendment to the
    Constitution of the United States, which shall
    be valid to all intents and purposes as part of
    the Constitution when ratified by conventions
    in three-fourths of the several States:
    

    The embedded text of the amendment begins here:

     
    <q><text><body><head rend='caps'>Article</head>
    <div type='Section' n=1>
    <p>The eighteenth article of amendment
    to the Constitution of the United States
    is hereby repealed.</p></div>
    <div n=2><p>The transportation or importation
    into any State, Territory, or possession of
    the United States for delivery or use
    therein of intoxicating liquors, in
    violation of the laws thereof, is hereby
    prohibited.</p></div>
    <div n=3><p>This article shall be inoperative
    unless it shall have been ratified as an
    amendment to the Constitution by conventions
    in the several States, as provided in the
    Constitution, within seven years from
    the date of the submission hereof to the
    States by the Congress.</p></div>
    </body></text>
    </q>
    

    The embedded text of the amendment ends here.

     
    </body></text>
    </q>
    

    And here, the end of the quoted Congressional resolution.

     
    <p><hi rend='sc'>Whereas</hi> Section 217(a) of
    the Act of Congress entitled <title>An Act
    to encourage national industrial recovery,
    to foster competition, and to provide for
    the construction of certain useful
    public works, and for other purposes</title>
    approved June 16, 1933, provides as follows:
    

    Here we have a quotation within a paragraph, which itself contains a paragraph with an embedded list.

     
    <q><p>Section 217(a) The President
    shall proclaim the date of
    <list type=ordered>
    <item n='(1)'>the close of the first fiscal
    year ending June 30 of any year after the
    year 1933, during which the total receipts
    of the United States (excluding
    public-debt receipts) exceed its total
    expenditures (excluding public-debt
    expenditures other than those
    chargeable against such receipts),
    or</item>
    <item n='(2)'>the repeal of the
    eighteenth amendment to the Constitution,
    </item>
    </list>
    whichever is the earlier.</p>
    </q></p>
     
    <p><hi rend='sc'>Whereas</hi> it appears from
    a certificate issued December 5, 1933, by the
    Acting Secretary of State that official notices
    have been received by the Department of State
    that on the fifth day of December, 1933,
    Conventions in thirty-six States of the
    United States, constituting three-fourths of
    the whole number of the States had ratified the
    said repeal amendment:</p>
     
    <p>Now, <hi rend='sc'>therefore, I, Franklin
    D. Roosevelt</hi>, President of the United
    States of America pursuant to the provisions
    of Section 217(a) of the said Act of June 16,
    1933, do hereby proclaim that
    the Eighteenth Amendment to the Constitution of
    the United States was repealed on the fifth
    day of December, 1933.</p>
     
    <p><hi rend='sc'>Furthermore</hi>, I enjoin
    upon all citizens of the United States and
    upon others resident within the jurisdiction
    thereof, to co-operate with the Government
    in its endeavor to restore greater respect
    for law and order, by confining such purchases
    of alcoholic beverages as they may make
    solely to those dealers or agencies which
    have been duly licensed by State or Federal
    license.</p>
     
    <!-- ... -->
     
    <p>I call specific attention to
    the authority given by the 21st Amendment
    to the Constitution to prohibit transportation
    or importation of intoxicating liquors into
    any State in violation of the laws of such
    State.</p>
    <p>I ask the wholehearted cooperation of all our
    citizens to the end that this return of individual
    freedom shall not be accompanied by the repugnant
    conditions that obtained prior to the adoption of
    the 18th Amendment and those that have existed
    since its adoption.  Failure to do this honestly
    and courageously will be a living reproach to us
    all.</p>
    <p>I ask especially that no State shall by law
    or otherwise authorize the return of the saloon
    either in its old form or in some modern guise.
    </p>
     
    <!-- ... -->
     
    <p><hi rend='sc'>In witness whereof</hi>,
    I have hereunto set my hand and caused
    the seal of the United States to be
    affixed.</p>
     
    <note resp='ed' place=inline><p>The 72d
    Congress, which
    convened following the 1932 election,
    passed the Twenty-first Amendment to the
    Constitution to repeal the Eighteenth
    Amendment.</p>
    <p> <!-- ... --> </p>
    </note>
     
    </div>
    

    Here is the end of the repeal proclamation. From here, the transcription continues in the same way, to the end of the volume.

     
    <!-- ... -->
    </body></text>
    </tei.2>
    

    10 What About Software?

    Documents created using the tag set described here can be created:

    Of these, the first and third are most convenient for most users, and the first and second are most likely to produce valid SGML. The main difficulty with the third method is that mechanical translation from a word processor into SGML is usually possible only for very restricted SGML tag sets, and is only reliable if the documents have been created with an unusually disciplined use of the word processor's style-sheet facility. Any user interested enough in SGML to exercise the necessary discipline would probably do better with a full-fledged SGML editor.

    Once created, SGML documents can be processed with a variety of commercial and public-domain tools. No complete listing is possible here; at the time this is written, the most convenient summary of SGML software is the Whirlwind Guide to SGML Tools maintained by Steve Pepper of Oslo, and available on the internet by ftp at ftp.uio.no (if you don't know about ftp, or this whole paragraph appears to be technobabble, consult your local computer center, or one of the numerous recent guides to the Internet for users who lack local computer center support). The most popular public-domain tool is the parser sgmls, written by James Clark on the basis of materials written by Charles Goldfarb. Using sgmls to process SGML documents commonly involves writing programs to read its standard output format, but it can also be used by non-programmers to check the validity of their SGML documents. (If you want to do this, check the TEI file servers for DOS batch files, Unix shell scripts, or the equivalent for your system, which simplify the task of setting up sgmls and running it as a validator. If you run into difficulties, issue a call for help on TEI-L.) An increasing number of SGML tools also use sgmls as a pre-processor, so acquiring a copy of sgmls makes sense even for those who have no intention of writing programs on their own.
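
    To give a flavor of what "writing programs to read its standard output format" involves, here is a small Python sketch that reads line-oriented parser output of the kind sgmls produces. It assumes the usual conventions of that format: a line beginning with ( opens an element, a line beginning with ) closes it, A gives an attribute of the element about to open, and - carries character data. The sample below was written by hand for this illustration, not captured from a real run, and the function name summarize is invented.

```python
from collections import Counter

# Hand-written sample in the style of sgmls output (an assumption, not a
# captured run): "(" opens an element, ")" closes it, "A" gives an
# attribute of the next element, "-" is character data.
SAMPLE = """\
(TEI.2
(TEIHEADER
)TEIHEADER
(TEXT
(BODY
ATYPE CDATA Section
AN CDATA 1
(DIV
(P
-The eighteenth article ...
)P
)DIV
)BODY
)TEXT
)TEI.2
C
"""


def summarize(esis_lines):
    """Count elements by name and report the deepest nesting level seen."""
    counts = Counter()
    depth = 0
    deepest = 0
    for line in esis_lines:
        line = line.rstrip("\n")
        if line.startswith("("):
            counts[line[1:]] += 1
            depth += 1
            deepest = max(deepest, depth)
        elif line.startswith(")"):
            depth -= 1
        # Attribute, data, and other record types are ignored in this sketch.
    return counts, deepest


counts, deepest = summarize(SAMPLE.splitlines())
for name, count in sorted(counts.items()):
    print(name, count)
print("deepest nesting:", deepest)
```

    A real program would also decode the escape sequences used in data lines and attach each attribute to the element that follows it; both are left out here to keep the sketch short.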

    11 Summary of the Bare Bones TEI Subset

    11.1 Elements in the Bare Bones Tag Set

    The tags included in the Bare Bones TEI Subset are:

    11.2 Formal Declarations

    The bare-bones TEI subset is a clean subset of the TEI encoding scheme as published: bare-bones texts conform to the published TEI DTD. The subset is defined exclusively by suppressing elements which are normally available within TEI documents. This suppression is accomplished by a DTD fragment available from the TEI file servers under the name bb.ent (for "bare-bones entities").