4 Default Text Structure


This chapter describes the default high-level structure for TEI documents. A full TEI document combines metadata describing it, represented by a teiHeader element, with the document itself, represented by a text element. This basic pair is represented by a TEI element. The teiHeader element is specified by the header module, which is fully described in chapter 2 The TEI Header. The remainder of the present chapter describes the text element and its high-level constituents.

A variant on this basic form, the teiCorpus, is also defined for the representation of language corpora, or other collections of encoded texts. A teiCorpus consists of a teiHeader for the corpus as a whole, followed by one or more complete TEI elements, each combining its own teiHeader and a text. This permits the encoder to distinguish metadata applicable to the whole collection of encoded texts, which is represented by the outermost teiHeader, from that applicable to each of the individual TEI elements within the corpus. Further information about the organization and encoding of language corpora is given in chapter 15 Language Corpora.
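
As a minimal sketch, with comments standing in for real content, a corpus of two texts might be laid out as follows. The namespace declaration is the standard TEI one; the number of TEI elements shown is arbitrary.

<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- metadata applicable to the whole corpus -->
  </teiHeader>
  <TEI>
    <teiHeader>
      <!-- metadata applicable to the first text only -->
    </teiHeader>
    <text>
      <!-- the first text -->
    </text>
  </TEI>
  <TEI>
    <teiHeader>
      <!-- metadata applicable to the second text only -->
    </teiHeader>
    <text>
      <!-- the second text -->
    </text>
  </TEI>
</teiCorpus>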

In summary, when the default structure module is included in a schema, the following elements are available for the representation of the outermost structure of a TEI document:

TEI (TEI document) contains a single TEI-conformant document, comprising a TEI header and a text, either in isolation or as part of a teiCorpus element.
version specifies the version of the TEI scheme
teiCorpus contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more TEI elements, each containing a single text header and a text.
teiHeader (TEI header) supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text.
text contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample.

As noted above, the teiHeader element is formally declared in the header module (see chapter 2 The TEI Header). A TEI document may also contain elements from the model.resourceLike class (such as a collection of facsimile images, or a feature system declaration) if the appropriate module is included in a schema (see further 11.1 Digital Facsimiles and 18.11 Feature System Declaration respectively). By default, however, this class is not populated and hence only the elements TEI, text, and teiCorpus are available as major parts of a TEI document. These three elements are provided by the textstructure module described by the present chapter.

TEI texts may be regarded either as unitary, that is, forming an organic whole, or as composite, that is, consisting of several components which are in some important sense independent of each other. The distinction is not always entirely obvious: for example a collection of essays might be regarded as a single item in some circumstances, or as a number of distinct items in others. In such borderline cases, the encoder must choose whether to treat the text as unitary or composite; each may have advantages and disadvantages in a given situation.

Whether unitary or composite, the text is marked with the text tag and may contain front matter, a text body, and back matter. In unitary texts, the text body is tagged body; in composite texts, where the text body consists of a series of subordinate texts or groups, it is tagged group. The overall structure of any text, unitary or composite, is thus defined by the following elements:

front (front matter) contains any prefatory matter (headers, title page, prefaces, dedications, etc.) found at the start of a document, before the main body.
body (text body) contains the whole body of a single unitary text, excluding any front or back matter.
group contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc.
back (back matter) contains any appendixes, etc. following the main part of a text.

The overall structure of a unitary text is:
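
As a minimal sketch, with comments standing in for real content and assuming the standard TEI namespace, this might be laid out as follows (front and back may be omitted where a text has no such matter):

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- metadata describing the document -->
  </teiHeader>
  <text>
    <front>
      <!-- front matter -->
    </front>
    <body>
      <!-- body of the text -->
    </body>
    <back>
      <!-- back matter -->
    </back>
  </text>
</TEI>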

The overall structure of a composite text made up of two unitary texts is:
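
As a minimal sketch, again with comments standing in for real content and assuming the standard TEI namespace, such a composite text might be laid out as follows:

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- metadata describing the composite document -->
  </teiHeader>
  <text>
    <front>
      <!-- front matter for the composite text -->
    </front>
    <group>
      <text>
        <front>
          <!-- front matter of the first unitary text -->
        </front>
        <body>
          <!-- body of the first unitary text -->
        </body>
        <back>
          <!-- back matter of the first unitary text -->
        </back>
      </text>
      <text>
        <body>
          <!-- body of the second unitary text -->
        </body>
      </text>
    </group>
    <back>
      <!-- back matter for the composite text -->
    </back>
  </text>
</TEI>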

Finally, a floatingText element is provided for the case where one text is embedded within another but does not contribute to its hierarchical organization, for example because it interrupts it, or is simply quoted within it. This is useful in such common literary contexts as the ‘play within a play’ or the narrative interrupted by other (often deeply nested) multiple narratives.
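
A rough sketch of a narrative interrupted by an embedded text follows. The type values "chapter" and "tale" are illustrative assumptions only, and comments stand in for real content.

<body>
  <div type="chapter">
    <p><!-- the framing narrative begins --></p>
    <floatingText>
      <body>
        <div type="tale">
          <head><!-- title of the embedded text --></head>
          <p><!-- the embedded text, with its own internal structure --></p>
        </div>
      </body>
    </floatingText>
    <p><!-- the framing narrative resumes --></p>
  </div>
</body>

The embedded text has a body of its own, but the divisions inside it are not subdivisions of the surrounding chapter.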

Each of these elements is further described in the remainder of this chapter. Elements front and back are further discussed in sections 4.5 Front Matter and 4.7 Back Matter. The group and floatingText elements, used for more complex or composite text structures, are further discussed in section 4.3 Grouped and Floating Texts. Other textual elements, such as paragraphs, lists or phrases, which nest within these major structural elements, are discussed in chapter 3 Elements Available in All TEI Documents, in the case of elements which can appear in any kind of document, or elsewhere in the case of elements specific to particular kinds of document.

4.1 Divisions of the Body

In some texts, the body consists simply of a sequence of low-level structural items, referred to here as components or component-level elements (see section 1.3 The TEI Class System). Examples in prose texts include paragraphs or lists; in dramatic texts, speeches and stage directions; in dictionaries, dictionary entries. In other cases sequences of such elements will be grouped together hierarchically into textual divisions and subdivisions, such as chapters or sections. The names used for these structural subdivisions of texts vary with the genre and period of the text, or even at the whim of the author, editor, or publisher. For example, a major subdivision of an epic or of the Bible is generally called a ‘book’, that of a report is usually called a ‘part’ or ‘section’, that of a novel a ‘chapter’—unless it is an epistolary novel, in which case it may be called a ‘letter’. Even texts which are not organized as linear prose narratives, or not as narratives at all, will frequently be subdivided in a similar way: a drama into ‘acts’ and ‘scenes’; a reference book into ‘sections’; a diary or day book into ‘entries’; a newspaper into ‘issues’ and ‘sections’, and so forth.

Because of this variety, these Guidelines propose that all such textual divisions be regarded as occurrences of the same neutrally named elements, with an attribute type used to categorize elements independently of their hierarchic level. Two alternative styles are provided for the marking of these neutral divisions: numbered and un-numbered. Numbered divisions are named div1, div2, etc., where the number indicates the depth of this particular division within the hierarchy, the largest such division being ‘div1’, any subdivision within it being ‘div2’, any further sub-sub-division being ‘div3’ and so on. Un-numbered divisions are simply named div, and allowed to nest recursively to indicate their hierarchic depth. The two styles must not be combined within a single front, body, or back element.


4.1.1 Un-numbered Divisions

The following element is used to identify textual subdivisions in the un-numbered style:

div (text division) contains a subdivision of the front, body, or back of a text.

As a member of the class att.typed, this element has the following additional attributes:

att.typed provides attributes which can be used to classify or subclassify elements in any way.
type characterizes the element in some sense, using any convenient classification scheme or typology.
subtype provides a sub-categorization of the element, if needed

Using this style, the body of a text containing two parts, each composed of two chapters, might be represented as follows:
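
A minimal sketch of that arrangement, with comments standing in for the text of each chapter (the type and n values are illustrative):

<body>
  <div type="part" n="1">
    <div type="chapter" n="1">
      <!-- text of part 1, chapter 1 -->
    </div>
    <div type="chapter" n="2">
      <!-- text of part 1, chapter 2 -->
    </div>
  </div>
  <div type="part" n="2">
    <div type="chapter" n="1">
      <!-- text of part 2, chapter 1 -->
    </div>
    <div type="chapter" n="2">
      <!-- text of part 2, chapter 2 -->
    </div>
  </div>
</body>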

4.1.2 Numbered Divisions

The following elements are used to identify textual subdivisions in the numbered style:

div1 (level-1 text division) contains a first-level subdivision of the front, body, or back of a text (the largest, if div0 is not used; the second largest, if it is).
div2 (level-2 text division) contains a second-level subdivision of the front, body, or back of a text.
div3 (level-3 text division) contains a third-level subdivision of the front, body, or back of a text.
div4 (level-4 text division) contains a fourth-level subdivision of the front, body, or back of a text.
div5 (level-5 text division) contains a fifth-level subdivision of the front, body, or back of a text.
div6 (level-6 text division) contains a sixth-level subdivision of the front, body, or back of a text.
div7 (level-7 text division) contains the smallest possible subdivision of the front, body, or back of a text, larger than a paragraph.

As members of the class att.typed these elements all bear the following additional attributes:

att.typed provides attributes which can be used to classify or subclassify elements in any way.
type characterizes the element in some sense, using any convenient classification scheme or typology.
subtype provides a sub-categorization of the element, if needed

The largest possible subdivision of the body is the div1 element and the smallest possible is div7. If numbered divisions are in use, a division at any one level (say, div3) may contain only numbered divisions at the next lower level (in this case, div4).

Using this style, the body of a text containing two parts, each composed of two chapters, might be represented as follows:
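
A minimal sketch of that arrangement, with comments standing in for the text of each chapter (the type and n values are illustrative):

<body>
  <div1 type="part" n="1">
    <div2 type="chapter" n="1">
      <!-- text of part 1, chapter 1 -->
    </div2>
    <div2 type="chapter" n="2">
      <!-- text of part 1, chapter 2 -->
    </div2>
  </div1>
  <div1 type="part" n="2">
    <div2 type="chapter" n="1">
      <!-- text of part 2, chapter 1 -->
    </div2>
    <div2 type="chapter" n="2">
      <!-- text of part 2, chapter 2 -->
    </div2>
  </div1>
</body>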

4.1.3 Numbered or Un-numbered?

Within the same front, body, or back element, all hierarchic subdivisions must be marked using either nested div elements, or div1, div2 etc. elements nested as appropriate; the two styles must not be mixed.

The choice between numbered and un-numbered divisions will depend to some extent on the complexity of the material: un-numbered divisions allow for an arbitrary depth of nesting, while numbered divisions limit the depth of the tree which can be constructed. Where divisions at different levels should be processed differently (for example to ensure that chapters, but not sections, begin on a new page), numbered divisions slightly simplify the task of defining the desired processing for each level, though this distinction could also be made by supplying this information on the type attribute of an un-numbered div. Some software may find numbered divisions easier to process, as there is no need to maintain knowledge of the whole document structure in order to know the level at which a division occurs; such software may, however, find it difficult to cope with some other aspects of the TEI scheme. On the other hand, in a collection of many works it may prove difficult or impossible to ensure that the same numbered division always corresponds with the same type of textual feature: a ‘chapter’ may be at level 1 in one work and level 3 in another.

Whichever style is used, the global n and xml:id attributes (section 1.3.1.1 Global Attributes) may be used to provide reference strings or labels for each division of a text, where appropriate. Such labels should be provided for each section which is regarded as significant for referencing purposes (on reference systems, see further section 3.10 Reference Systems).

As indicated above, the type and subtype attributes provided by the att.typed class may be used to provide a name or description for the division. Typical values might be ‘book’, ‘chapter’, ‘section’, ‘part’, or (for verse texts) ‘book’, ‘canto’, ‘stanza’, or (for dramatic texts) ‘act’, ‘scene’. The following extended example uses numbered divisions to indicate the structure of a novel, and illustrates the use of the attributes discussed above. It also uses some elements discussed in section 4.2 Elements Common to All Divisions and the p element discussed in section 3.1 Paragraphs.

<div1 type="book" n="I" xml:id="JA0100">
<head>Book I.</head>
<div2 type="chapter" n="1" xml:id="JA0101">
  <head>Of writing lives in general, and particularly of Pamela, with a word
     by the bye of Colley Cibber and others.</head>
  <p>It is a trite but true observation, that examples work more forcibly on
     the mind than precepts: ... </p>

</div2>
<div2 type="chapter" n="2" xml:id="JA0102">
  <head>Of Mr. Joseph Andrews, his birth, parentage, education, and great
     endowments; with a word or two concerning ancestors.</head>
  <p>Mr. Joseph Andrews, the hero of our ensuing history, was esteemed to
     be the only son of Gaffar and Gammar Andrews, and brother to the
     illustrious Pamela, whose virtue is at present so famous ... </p>

</div2>

<trailer>The end of the first Book</trailer>
</div1>
<div1 type="book" n="II" xml:id="JA0200">
<head>Book II</head>
<div2 type="chapter" n="1" xml:id="JA0201">
  <head>Of divisions in authors</head>
  <p>There are certain mysteries or secrets in all trades, from the highest
     to the lowest, from that of <term>prime-ministering</term>, to this of
  <term>authoring</term>, which are seldom discovered unless to members of
     the same calling ... </p>
  <p>I will dismiss this chapter with the following observation: that it
     becomes an author generally to divide a book, as it does a butcher to
     joint his meat, for such assistance is of great help to both the reader
     and the carver. And now having indulged myself a little I will endeavour
     to indulge the curiosity of my reader, who is no doubt impatient to know
     what he will find in the subsequent chapters of this book.</p>
</div2>
<div2 type="chapter" n="2" xml:id="JA0202">
  <head>A surprising instance of Mr. Adams's short memory, with the
     unfortunate consequences which it brought on Joseph.
  </head>
  <p>Mr. Adams and Joseph were now ready to depart different ways ... </p>
</div2>
</div1>

type	characterizes the element in some sense, using any convenient classification scheme or typology.
subtype	provides a sub-categorization of the element, if needed

org	(organization) specifies how the content of the division is organized.
sample	indicates whether this division is a sample of the original source and if so, from which part.

model.divTopPart	groups elements which can occur only at the beginning of a text division.
model.divWrapper	groups elements which can appear at either top or bottom of a textual division.

model.divBottomPart	groups elements which can occur only at the end of a text division.
model.divWrapper	groups elements which can appear at either top or bottom of a textual division.

closer	groups together datelines, bylines, salutations, and similar phrases appearing as a final group at the end of a division, especially of a letter.
postscript	contains a postscript, e.g. to a letter.
signed	(signature) contains the closing salutation, etc., appended to a foreword, dedicatory epistle, or other division of a text.
trailer	contains a closing title or footer appearing at the end of a division of a text.

argument	a formal list or prose description of the topics addressed by a subdivision of a text.
byline	contains the primary statement of responsibility given for a work on its title page or at the head or end of the work.
dateline	contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer.
docAuthor	(document author) contains the name of the author of the document, as given on the title page (often but not always contained in a byline).
docDate	(document date) contains the date of a document, as given (usually) on a title page.
epigraph	contains a quotation, anonymous or attributed, appearing at the start of a section or chapter, or on a title page.
meeting	contains the formalized descriptive title for a meeting or conference, for use in a bibliographic description for an item derived from such a meeting, or as a heading or preamble to publications emanating from it.
salute	(salutation) contains a salutation or greeting prefixed to a foreword, dedicatory epistle, or other division of a text, or the salutation in the closing of a letter, preface, etc.

P5: Guidelines for Electronic Text Encoding and Interchange
