The choice of a base tag set determines the basic structure of all the documents with which it is to be used, reflecting the fact that subelements likely to appear within a dictionary (for example) will be entirely different in kind from those likely to appear within a letter or a novel, and even more so from those likely to be found in a transcription of spoken language. To cater for this variety, the constituents of all divisions of a TEI <text> element are not defined explicitly, but in terms of parameter entities. The mechanism used is to provide definitions like the following within the DTD, one of which the user must over-ride by supplying an appropriate declaration in the DTD subset:
<!ENTITY % TEI.prose "IGNORE">
<!ENTITY % TEI.dictionary "IGNORE">
The body of the main DTD contains a series of alternative definitions, each enclosed within an SGML marked section named after the base which it defines, as in this simplified example:
<![ %TEI.prose [
<!-- This definition is in force when the prose base is selected -->
<!-- Its effect is to define component as either paragraph or list -->
<!ENTITY % component "p|list" >
]&null;]>
<![ %TEI.dictionary [
<!-- This definition is in force when the dictionary base is selected -->
<!-- Its effect is to define component as entry alone -->
<!ENTITY % component "entry" >
]&null;]>
<!-- This definition is always in force -->
<!-- Its effect is to define component.seq as one or more of -->
<!-- whatever definition of component is currently in force -->
<!ENTITY % component.seq "(%component)+">
Within the body of the DTD, elements are defined using these parameter entities only, for example:
<!ELEMENT div - - ((%component.seq)+)>
To select a base tag set, a declaration such as the following should be supplied within the DTD subset for the document:
<!ENTITY % TEI.prose "INCLUDE">
This will over-ride the declaration within the TEI DTD itself, because it is given first. If no base is declared, the DTD will not compile.
The value of the parameter entity called component.seq will thus differ in different bases. In this way it is possible for the divisions of a text using the drama base (for example) to consist of speeches and stage directions, while those of a text using the dictionary base will consist of lexical entries.
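As a minimal sketch of how a base selection might sit in a document prologue: the document type name TEI.2 and the system identifier tei2.dtd are assumptions made for illustration and are not given in the text above.

```sgml
<!-- Sketch only: document type name and system identifier are assumed -->
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
  <!-- The DTD subset is read before the external DTD, -->
  <!-- so this declaration is the one that takes effect -->
  <!ENTITY % TEI.prose "INCLUDE">
]>
```

Because the first declaration of an entity is the one that binds, the default value "IGNORE" in the main DTD is disregarded and the marked section for the prose base is processed.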
A type attribute may be used to distinguish amongst
divisions in some respect other than their hierarchic position: the
values for this attribute (as for several others in the TEI scheme) are
not standardized, precisely because no consensus exists, or is likely to
exist, as to a generic typology. A set of legal values should however be
defined for a given application, either in the TEI Header or by a
user-defined modification.
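By way of illustration, a prose text might distinguish its divisions as in the fragment below. The values "chapter" and "section" are hypothetical; a project would document its own set of values. The fragment also assumes that divisions may nest, which the simplified content model shown earlier does not cover.

```sgml
<!-- Hypothetical type values, chosen for this example only -->
<div type="chapter">
  <p>It was a dark and stormy night.</p>
  <div type="section">
    <p>The rain fell in torrents.</p>
  </div>
</div>
```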
In the normal case, the components of all divisions in a particular
base are homogeneous: they all use the same value for
component.seq. However, the scheme also allows for two
kinds of heterogeneity. If the general base is selected,
together with two or more other bases, then different divisions of a
text may have different constituents, though each division must itself
be homogeneous. A mixed base is also defined, in which
components from any selection of bases may be combined promiscuously
across division boundaries.
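A sketch of what the two heterogeneous cases might look like in the DTD subset follows. Only TEI.prose appears in the examples above; the entity names TEI.general, TEI.mixed and TEI.drama are assumed here by analogy with it.

```sgml
<!-- Case 1 (assumed names): general base plus two other bases. -->
<!-- Each division is homogeneous, but divisions may differ.    -->
<!ENTITY % TEI.general "INCLUDE">
<!ENTITY % TEI.prose   "INCLUDE">
<!ENTITY % TEI.drama   "INCLUDE">

<!-- Case 2 (assumed name): use this line in place of TEI.general -->
<!-- to allow components of both bases within a single division.  -->
<!-- <!ENTITY % TEI.mixed "INCLUDE"> -->
```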
This approach applies equally to the encoding of smaller units:
rather than attempt to enumerate all the different analytic units which
particular disciplines might find necessary, the TEI proposes two
generic segmentation elements: one (<s>) for simple end-to-end
segmentation, such as that commonly used in language corpora, roughly
corresponding to the notion of orthographic sentence; the other (<seg>)
for segments which can potentially self-nest. In either case, a type
attribute may be used to distinguish different kinds of segment.
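The difference between the two elements can be sketched as follows. The type values "clause" and "phrase" are hypothetical, and the fragment assumes that <seg> may appear within <s>.

```sgml
<p>
  <!-- s: end-to-end segmentation; one s never contains another -->
  <s>The cat sat on the mat.</s>
  <s>It purred
    <!-- seg: segments may nest within one another -->
    <seg type="clause">when
      <seg type="phrase">the old man</seg>
      stroked it</seg>.</s>
</p>
```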
Members of an attribute class share the same set of attributes. For
example, all elements which represent links or associations between one
element and another do so using a common set of attributes, defined
by the pointer attribute class.
Members of a model class share the same structural properties: that
is, they may appear at the same position within the SGML document
structure. For example, the
class divtop contains all elements (headings, epigraphs
etc.) which can appear at the start of a textual division; all elements used to mark editorial corrections or
omissions are members of the class edit; elements marking
bibliographic citations etc. are all members of the class
bibl and so on.
Elements may of course be members of more than one class. Classes
may have super- and sub-classes, and properties (notably associated
attributes) may be inherited. Classes are defined in the TEI DTD
by means of parameter entities, and used extensively for DTD maintenance,
documentation, and extension.
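An attribute class might be declared along the following lines. This is a sketch only: the entity name a.pointer, the attribute names, and the member elements ptr and ref are all assumptions made for illustration, not declarations quoted from the TEI DTD.

```sgml
<!-- Hypothetical attribute class: the attribute definitions are -->
<!-- written once, inside a parameter entity                     -->
<!ENTITY % a.pointer '
      target    IDREFS    #REQUIRED
      type      CDATA     #IMPLIED'>

<!-- Each member of the class refers to the entity in its       -->
<!-- attribute list, so all members share the same attributes   -->
<!ATTLIST ptr  %a.pointer;>
<!ATTLIST ref  %a.pointer;>
```

Changing the entity in one place changes the attributes of every member, which is what makes the class system useful for maintenance.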
The TEI scheme supports three kinds of user modification: new elements may be added to
existing classes, and existing elements may be renamed or undefined. These operations are carried out in a controlled manner, using the class
system and without any need for extensive revision of the TEI DTD itself.
The process of adding a new element to a class may be illustrated
with the model class divtop mentioned above.
Simplifying somewhat, this class is defined as follows:
Parameter entities are also used to effect the two other kinds of
modification mentioned above:
the ability to undefine elements, and the ability to rename them.
Within the main TEI DTD, each element definition and its associated
attribute list specification is enclosed by a marked section with the same
name as the element, the default value for which is "INCLUDE". Thus, to
undefine the element <mentioned>, all that is needed is a declaration
like the following in the DTD subset:
A similar declaration may be used to rename any element; for example, to
rename <p> as <para>:
All user-defined modifications of this kind are regarded as
forming an additional tag set, which is embedded within the DTD in the
same way as any other tag set, i.e. by enabling the
TEI.extensions
parameter entities. In this way a TEI document can make explicit the
extent and nature of any modification required in the base TEI scheme
for its processing. An auxiliary tag set is also provided for the
documentation of additional SGML elements in a way compatible with that
used for the rest of the scheme.
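A sketch of how such an extension tag set might be attached in the DTD subset is given below. The entity names TEI.extensions.ent and TEI.extensions.dtd, the file names, and the document type name are assumptions made for illustration.

```sgml
<!-- Sketch only: all names below are assumed -->
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
  <!ENTITY % TEI.prose "INCLUDE">
  <!-- File of entity re-definitions (renamings, class additions, -->
  <!-- elements to be undefined)                                  -->
  <!ENTITY % TEI.extensions.ent SYSTEM "project.ent">
  <!-- File of declarations for any new elements -->
  <!ENTITY % TEI.extensions.dtd SYSTEM "project.dtd">
]>
```

Keeping the modifications in separate files means that a recipient of the document can see exactly how the base scheme has been changed.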
This list may be extended: for example,
selecting the additional tag set for analysis will add analytic attributes to
the above list. The id and n attributes allow for
the identification of any element occurrence within a TEI-conformant
text. Elements carrying an id attribute value may be
the object of a link or cross-reference, or any of the other
re-structuring mechanisms proposed by the TEI for circumventing the
rigidly hierarchic structure of a simple SGML DTD. The fact that the
requirement for such links is usually unpredictable is one reason for
making this attribute global.
Values on id attributes must be unique (their
declared value is ID). Values on the n attribute however
need not be; they may be used to carry a TEI canonical reference. A
method for defining the structure of such canonical reference schemes is
also provided, so that documents using it can be processed
automatically.
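The contrast between the two attributes can be sketched as follows. The identifier values are hypothetical, and the element <ref> with its target attribute is assumed here as one example of a linking element.

```sgml
<!-- id values are unique in the document; n values may repeat -->
<div id="ch2" n="2" type="chapter">
  <p id="ch2p1" n="1">First paragraph of chapter two.</p>
</div>
<div id="ch3" n="3" type="chapter">
  <p id="ch3p1" n="1">First paragraph of chapter three.</p>
  <!-- A cross-reference points at an id value (assumed element) -->
  <p id="ch3p2" n="2">Compare <ref target="ch2p1">the opening
     of the previous chapter</ref>.</p>
</div>
```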
The lang attribute indicates both the language and
hence the writing system applicable to the element's content, thus
providing explicit support for polyglot or multiscript texts. If no
value is given, that of the element's direct parent is assumed. (A
number of TEI attributes have this characteristic, which is catered for
by a TEI-defined keyword). The value of this attribute identifies a
special purpose <language> element which documents the language
in use, optionally associating it with an external entity in which a
formal writing system declaration (WSD) may be given.
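The following fragment sketches how the pieces might fit together. The identifiers "eng" and "grc", the wsd attribute, the entity name wsd.grc, and the <foreign> element are assumptions made for illustration.

```sgml
<!-- In the header: one language element per language in use -->
<language id="eng">English</language>
<!-- wsd (assumed attribute) names the external entity holding the WSD -->
<language id="grc" wsd="wsd.grc">Koine Greek, using TLG Beta Code</language>

<!-- In the text: lang points at the id of a language element -->
<p lang="eng">The word
   <foreign lang="grc">lo/gos</foreign>
   has many senses.</p>
```

Any element inside the <p> that carries no lang value of its own is taken to be in English, since the value of the parent is assumed.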
A WSD defines a language/writing system pair
(for example, ``Koine Greek, using TLG Beta Code'')
and is formally defined by an auxiliary DTD which allows each
character to be systematically defined and documented, in terms of
existing international or other standards, public or private entity
sets, ad hoc transliteration schemes or explicit definitions, as well as
combinations of all four.
Finally, the global rend attribute may be used to give
information about the physical presentation of the text in the source,
where this is not otherwise given. A default rendition may be specified
for all elements of a given type. No specific set of values is defined
for this attribute in the current draft, though it is probable that some
suitable set of DSSSL primitives will be proposed in a later version.
It should be stressed that the rend attribute is
not intended for use as a means of specifying the desired
formatting of an element, except insofar as this may be determined by a
desire to mimic the approximate appearance of the original text. Like
other SGML applications, the TEI scheme attempts to provide elements for
the encoding of those textual features deemed essential to a productive
use of the encoded text; however, unlike most other SGML applications,
the TEI scheme recognizes that for some, it is precisely the appearance
of a text which is the object of research.
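For example, an encoder recording the appearance of a source might write something like the fragment below. Since no set of values is defined, the values shown are hypothetical, and the <hi> element is assumed as a generic element for highlighted text.

```sgml
<!-- rend records how the source looks, not how output should look -->
<head rend="centred smallcaps">Chapter the First</head>
<p>He read the notice <hi rend="italic">twice</hi>.</p>
```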
5.1 Textual Divisions
Although the actual components may differ, textual
components may be grouped into higher-level
`divisions' in almost any kind of text. These
higher-level units may variously be called
`chapters', `sections',
`subdivisions', `acts' or
`parts', but all seem to behave in more or less the
same way: they are incomplete in themselves, and nested hierarchically. In
the TEI scheme all such objects are therefore regarded as the same
kind of element, called here a division.
5.2 The TEI Class System and Modification Mechanisms
Textual features, and hence the elements which encode them, may be
categorized or classified in a number of ways. The TEI scheme identifies
two kinds of classification scheme: attribute classes and
model classes; both are used for broadly similar purposes.
<!ENTITY % x.divtop "">
<!ENTITY % m.divtop "%x.divtop head | byline | epigraph">
To add a new element (say, <keywords>) to this class,
enabling it to appear anywhere in the content model that other members
of the class do, all that is needed is to re-define the
`x-entity' within the document type subset:
<!ENTITY % x.divtop "keywords |">
Note the trailing vertical bar, which is required. As it happens, the
element <keywords> is already defined in the TEI scheme (within the
header); if it were not, an element declaration would also be
necessary.
<!ENTITY % mentioned "IGNORE">
<!ENTITY % n.p "para">
This works because all references to the <p> element throughout
the TEI DTD are made indirectly, using the n.p entity.
Furthermore, the original name for an element is recoverable by an SGML
application, because it forms the value of a global attribute
teiform of declared type FIXED.
5.3 The global attributes
One particularly important class is the global
attribute class. By default the following attributes are members of
this class and may therefore be supplied for all elements in the
TEI scheme: id, n, lang, and rend.