Text Encoding Initiative

16. Interpretation and Analysis


It is often said that all markup is a form of interpretation or analysis. While it is certainly difficult, and may be impossible, to distinguish firmly between 'objective' and 'subjective' information in any universal way, it remains true that judgments concerning the latter are typically regarded as more likely to provoke controversy than those concerning the former. Many scholars therefore prefer to record such interpretations only if they can alert the reader that these are considered more open to dispute than the rest of the markup. This section describes some of the elements provided by the TEI scheme to meet this need.

16.1. Orthographic Sentences

Interpretation typically ranges across the whole of a text, without particular regard for other structural units. A useful preliminary to intensive interpretation is therefore to segment the text into discrete and identifiable units, each of which can then bear a label for use as a sort of 'canonical reference'. To facilitate such uses, these units may neither cross nor nest within each other. They may conveniently be represented using the following element:

<s>
identifies an s-unit within a document, for purposes of establishing a simple canonical referencing scheme covering the entire text. Attributes include:

type
categorizes the unit (e.g. as declarative, interrogative, etc.)

As the name suggests, the <s> element is most commonly used (in linguistic applications at least) for marking orthographic sentences, that is, units defined by orthographic features such as punctuation. For example, the passage from Jane Eyre discussed earlier might be divided into s-units as follows:

<pb n="474"/>
<div1 type="chapter" n="38">
<p><s n="001">Reader, I married him.</s>
<s n="002">A quiet wedding we had:</s>
<s n="003">he and I, the parson and clerk, were alone present.</s>
<s n="004">When we got back from church, I went
into the kitchen of the manor-house, where Mary was cooking the dinner,
and John cleaning the knives, and I said &mdash;</s></p>
<p><q><s n="005">Mary, I have been married to Mr Rochester
this morning.</s></q> <!-- ... --></p>
</div1>

Note that <s> elements cannot nest: each <s> element must end before the next one begins. When s-units are tagged as shown above, it is advisable to tag the entire text end-to-end, so that every word in the text being analysed is contained by exactly one <s> element, whose identifier can then serve as a unique reference for it. If the identifiers used are unique within the document, the id attribute may be used in preference to the n attribute used in the above example.
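As a sketch of that alternative, the opening s-units might carry document-unique id values instead of n, with the type attribute classifying each unit. The identifier scheme (JE38.001 and so on) and the type values shown are illustrative assumptions, not values prescribed by the Guidelines.

```xml
<div1 type="chapter" n="38">
<!-- hypothetical id scheme: text abbreviation, chapter number, s-unit number -->
<p><s id="JE38.001" type="declarative">Reader, I married him.</s>
<s id="JE38.002" type="declarative">A quiet wedding we had:</s>
<!-- ... -->
</p>
</div1>
```

Because an id must be unique across the whole document, the sketch builds the chapter number into each identifier so that s-units in different chapters cannot collide.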

16.2. General-Purpose Interpretation Elements

A more general-purpose segmentation element, <seg>, has already been introduced for identifying otherwise unmarked targets of cross references and hypertext links (see section 8. Cross References and Links). It identifies some phrase-level portion of text, to which the encoder may assign a user-specified type as well as a unique identifier; it may thus be used to tag textual features for which the published TEI Guidelines make no provision.

For example, the Guidelines provide no <apostrophe> element to mark parts of a literary text in which the narrator addresses the reader (or hearer) directly. One approach might be to regard these as instances of the <q> element, distinguished from others by an appropriate value for the who attribute. A possibly simpler, and certainly more general, solution, however, would be to use the <seg> element as follows:

<div1 type="chapter" n="38">
<p><seg type="apostrophe">Reader, I married him.</seg>
A quiet wedding we had: <!-- ... --></p>
</div1>

The type attribute on the <seg> element can take any value, and so can be used to record phrase-level phenomena of any kind; it is good practice to record the values used, and their significance, in the header.
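One possible way of recording such values is a prose note in the encoding description of the header. Placing the note in <editorialDecl>, and its wording, are our own assumptions; any header location that a project documents and uses consistently would serve.

```xml
<encodingDesc>
<editorialDecl>
<!-- hypothetical note documenting the seg type values used in this text -->
<p>The type attribute of the seg element is used with the following
values: apostrophe, marking a passage in which the narrator addresses
the reader directly; <!-- ... --></p>
</editorialDecl>
</encodingDesc>
```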

A <seg> element of one type (unlike the <s> element, which it superficially resembles) can be nested within a <seg> element of the same or another type. This enables quite complex structures to be represented; some examples were given in section 8.3. Linking Attributes above. However, because <seg> must respect the requirement that elements nest properly and do not cut across each other, it cannot cope with the common requirement to associate an interpretation with arbitrary segments of a text, which may completely ignore the document hierarchy. It also requires that the interpretation itself be represented by a single coded value in the type attribute.
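The following sketch illustrates both points. Nesting one <seg> inside another is well-formed; a <seg> that begins in one s-unit and ends in the next is not, and so appears below only inside a comment. The type value statement is a hypothetical label chosen for illustration.

```xml
<div1 type="chapter" n="38">
<p><s n="001"><seg type="apostrophe">Reader, <seg type="statement">I
married him.</seg></seg></s>
<s n="002">A quiet wedding we had:</s>
<!-- NOT well-formed, so shown only as a comment: a seg cannot begin
     inside s-unit 001 and end inside s-unit 002, because it would cut
     across the boundary between them:
     <s n="001">Reader, I <seg>married him.</s>
     <s n="002">A quiet</seg> wedding we had:</s>
-->
<!-- ... -->
</p>
</div1>
```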

Neither restriction applies to the <interp> element, which provides powerful features for the encoding of quite complex interpretive information in a relatively straightforward manner.

<interp>
provides for an interpretive annotation which can be linked to a span of text. Attributes include:

value
identifies the specific phenomenon being annotated.
resp
indicates who is responsible for the interpretation.
type
indicates what kind of phenomenon is being noted in the passage. Sample values include image, character, theme, allusion, or the name of a particular discourse type whose instances are being identified.
inst
points to instances of the analysis or interpretation represented by the current element.

<interpGrp>
collects together <interp> elements.

This element allows the encoder to specify both the class of an interpretation and the particular instance of that class which the interpretation involves. Thus, whereas with <seg> one can say simply that something is an apostrophe, with <interp> one can say that it is an instance (apostrophe) of a larger class (rhetorical figures).

Moreover, <interp> is an empty element, which must be linked to the passage to which it applies, either by means of the ana attribute discussed in section 8.3. Linking Attributes above, or by means of its own inst attribute. This means that any kind of analysis can be represented without regard to the document hierarchy. It also makes it easy to group analyses of a particular type together; the special-purpose <interpGrp> element is provided for this.
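For instance, the kitchen scene in the Jane Eyre passage begins partway through the first paragraph and continues into the second, so no single element could enclose it. A minimal sketch, assuming hypothetical id values such as S004 and P2 on the relevant elements, lets one <interp> point at both pieces through its inst attribute:

```xml
<div1 type="chapter" n="38">
<p><s id="S001">Reader, I married him.</s>
<!-- s-units 002 and 003 omitted -->
<s id="S004">When we got back from church, I went
into the kitchen of the manor-house, <!-- ... --></s></p>
<p id="P2"><q><s id="S005">Mary, I have been married to Mr Rochester
this morning.</s></q> <!-- ... --></p>
</div1>

<!-- elsewhere, e.g. in the back matter: the kitchen scene covers S004
     (inside the first paragraph) and the whole of the second paragraph,
     a span that no single element in the hierarchy could enclose -->
<interp id="set-kitch" type="scene-setting" value="kitchen"
   resp="LB, MSM" inst="S004 P2"/>
```

Note that inst here points at elements at different levels of the hierarchy (an <s> and a <p>), which is exactly what a single enclosing <seg> could not express.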

For example, suppose that you wish to mark such diverse aspects of a text as themes or subject matter, rhetorical figures, and the locations of individual scenes of the narrative. Different portions of our sample passage from Jane Eyre might be associated with the rhetorical figures of apostrophe, hyperbole, and metaphor; with subject-matter references to churches, servants, cooking, postal service, and honeymoons; and with scenes located in the church, in the kitchen, and in an unspecified location (drawing room?).

These interpretations could be placed anywhere within the <text> element; it is, however, good practice to put them all in the same place (e.g. a separate section of the front or back matter), as in the following example:

<back>
<div1 type="Interpretations">
<p><interp id="fig-apos"  resp="LB, MSM"
     type="figure of speech" value="apostrophe"/>
<interp id="fig-hyp"   resp="LB, MSM"
     type="figure of speech" value="hyperbole"/>
<!-- ... -->
<interp id="set-church"  resp="LB, MSM"
     type="scene-setting" value="church"/>
<!-- ... -->
<interp id="ref-church"  resp="LB, MSM"
     type="reference" value="church"/>
<interp id="ref-serv"    resp="LB, MSM"
     type="reference" value="servants"/>
<!-- ... -->
</p></div1>
</back>

The evident redundancy of this encoding can be considerably reduced by using the <interpGrp> element to group together all those <interp> elements which share common attribute values, as follows:

<back>
<div1 type="Interpretations">
<p>
<interpGrp type="figure of speech" resp="LB, MSM">
<interp id="fig-apos" value="apostrophe"/>
<interp id="fig-hyp"  value="hyperbole"/>
<interp id="fig-meta" value="metaphor"/>
<!-- ... -->
</interpGrp>
<interpGrp type="scene-setting" resp="LB, MSM">
<interp id="set-church"  value="church"/>
<interp id="set-kitch"   value="kitchen"/>
<interp id="set-unspec"  value="unspecified"/>
<!-- ... -->
</interpGrp>
<interpGrp type="reference" resp="LB, MSM">
<interp id="ref-church" value="church"/>
<interp id="ref-serv"   value="servants"/>
<interp id="ref-cook"   value="cooking"/>
<!-- ... -->
</interpGrp>
</p></div1>
</back>

Once these interpretation elements have been defined, they can be linked with the parts of the text to which they apply in either or both of two ways. The ana attribute can be used on whichever element is appropriate:

<div1 type="chapter" n="38">
<p id="P38.1" ana="set-church set-kitch">
<s id="P38.1.1" ana="fig-apos">Reader, I married him.</s>
<!-- ... -->
</p>
</div1>

Note in this example that since the paragraph has two settings (in the church and in the kitchen), the identifiers of both have been supplied.

Alternatively, the <interp> elements can point to all the parts of the text to which they apply, using their inst attribute:

<interp id="fig-apos" type="figure of speech" resp="LB, MSM"
   value="apostrophe" inst="P38.1.1"/>
<!-- ... -->
<interp id="set-church"  type="scene-setting" value="church"
   inst="P38.1" resp="LB, MSM"/>
<interp id="set-kitch"   type="scene-setting" value="kitchen"
   inst="P38.1" resp="LB, MSM"/>
<!-- ... -->

The <interp> element is not limited to any particular type of analysis. The literary analysis shown above is but one possibility; one could equally well use <interp> to capture a linguistic part-of-speech analysis. For example, the sentence given in section 8.3. Linking Attributes assumes a linguistic analysis which might be represented as follows:

<interp id="NP1" type="pos" value="noun phrase, singular"/>
<interp id="VV1" type="pos" value="inflected verb, present-tense singular"/>
<!-- ... -->
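Since these part-of-speech interpretations share a type, they could be grouped with <interpGrp> in the same way as the literary interpretations above, and individual words then linked to them with ana. In this sketch the word-level <seg type="word"> and its link to NP1 are illustrative assumptions, not taken from section 8.3.

```xml
<!-- grouping the part-of-speech interpretations, as done above for the
     literary ones; type is stated once on the group -->
<interpGrp type="pos">
<interp id="NP1" value="noun phrase, singular"/>
<interp id="VV1" value="inflected verb, present-tense singular"/>
<!-- ... -->
</interpGrp>

<!-- hypothetical use: a word-level seg pointing at its part of speech -->
<s n="001"><seg type="word" ana="NP1">Reader</seg>, I married him.</s>
```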

Date: (revised October 2004) Author: Lou Burnard (revised SPQR).
Copyright TEI 1995