17 Simple Analytic Mechanisms
This chapter describes a module for associating simple analyses and interpretations with text elements. We use the term analysis here to refer to any kind of semantic or syntactic interpretation which an encoder wishes to attach to all or part of a text. Examples discussed in this chapter include familiar linguistic categorizations (such as ‘clause’, ‘morpheme’, ‘part-of-speech’ etc.) and characterizations of narrative structure (such as ‘theme’, ‘reconciliation’ etc.). The mechanisms presented in this chapter are simpler but less powerful than those described in chapter 18 Feature Structures.
Section 17.1 Linguistic Segment Categories introduces a module for characterizing text segments according to the familiar linguistic categories of sentence or s-unit, clause, phrase, word, morpheme, and character. These elements represent special cases of the generic seg element described in section 16.3 Blocks, Segments, and Anchors.
Section 17.2 Global Attributes for Simple Analyses introduces an additional global attribute which allows passages of text to be associated with specialised elements representing their interpretation. These ‘interpretative’ elements (span and interp) are described in detail in section 17.3 Spans and Interpretations. They allow the encoder to specify an analysis as a series of names and associated values, each such pair being linked to one or more stretches of text, either directly, in the case of spans, or indirectly, in the case of interpretations.
Finally, section 17.4 Linguistic Annotation revisits the topic of linguistic analysis, and illustrates how these interpretative mechanisms may be used to associate simple linguistic analyses with text segments.
17.1 Linguistic Segment Categories
- att.segLike
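provides attributes for elements used for arbitrary segmentation.
function characterizes the function of the segment.
- att.typed
provides attributes which can be used to classify elements.
type characterizes the element by assigning it to a category. subtype provides a sub-categorization of the element, if needed.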
These elements are also all members of the model.segLike class, which is a subclass of model.phrase. They may thus appear anywhere that text is permitted within a document, when the module defined by this chapter is included in a schema.
<s>Nineteen fifty-four, when I was eighteen years old,
is held to be a crucial turning point in the history of
the Afro-American — for the U.S.A. as a whole — the
year segregation was outlawed by the U.S. Supreme Court.</s>
<s>It was also a crucial year for me because on June 18,
1954, I began serving a sentence in state prison for
possession of marijuana.</s>
</p>
<s>
<cl>It was about the beginning of September, 1664,
<cl>that I, among the rest of my neighbours,
heard in ordinary discourse
<cl>that the plague was returned again to Holland; </cl>
</cl>
</cl>
<cl>for it had been very violent there, and particularly at
Amsterdam and Rotterdam, in the year 1663, </cl>
<cl>whither, <cl>they say,</cl> it was brought,
<cl>some said</cl> from Italy, others from the Levant, among some goods
<cl>which were brought home by their Turkey fleet;</cl>
</cl>
<cl>others said it was brought from Candia;
others from Cyprus. </cl>
</s>
<s>
<cl>It mattered not <cl>from whence it came;</cl>
</cl>
<cl>but all agreed <cl>it was come into Holland again.</cl>
</cl>
</s>
</p>
Clauses may be further divided into phr elements in the same way. A text may be segmented directly into clauses, or into phrases, with no need to include segmentation at a higher level as well.
<l>
<cl part="I">Tweedledum and Tweedledee</cl>
</l>
<l>
<cl part="F">Agreed to have a battle;</cl>
</l>
<l>
<cl part="I">For Tweedledum said <cl part="I">Tweedledee</cl>
</cl>
</l>
<l>
<cl part="F">
<cl part="F">Had spoiled his nice new rattle.</cl>
</cl>
</l>
</div>
<div type="stanza">
<l>
<cl part="I">Just then flew down a monstrous crow,</cl>
</l>
<l>
<cl part="F">As black as a tar barrel;</cl>
</l>
<l>
<cl part="I">Which frightened both the heroes so,</cl>
</l>
<l>
<cl part="F">
<cl>They quite forgot their quarrel.</cl>
</cl>
</l>
</div>
<cl next="#c5" xml:id="c3" part="I">For Tweedledum said
<cl next="#c6" xml:id="c4" part="I">Tweedledee</cl>
</cl>
</l>
<l>
<cl prev="#c3" xml:id="c5" part="F">
<cl prev="#c4" xml:id="c6" part="F">Had spoiled his nice new rattle.</cl>
</cl>
</l>
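The next and prev pointers in the preceding fragment can be followed mechanically to reassemble a clause that has been split across verse lines. The following is a minimal sketch in Python, using only the standard library; the wrapper element lg, the helper names own_text and reassemble, and the choice to exclude nested cl content from the outer clause are assumptions made for this illustration and are not prescribed by these Guidelines.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical wrapper <lg> added so that the fragment parses on its own.
SAMPLE = """<lg>
<l><cl next="#c5" xml:id="c3" part="I">For Tweedledum said
<cl next="#c6" xml:id="c4" part="I">Tweedledee</cl></cl></l>
<l><cl prev="#c3" xml:id="c5" part="F">
<cl prev="#c4" xml:id="c6" part="F">Had spoiled his nice new rattle.</cl></cl></l>
</lg>"""


def own_text(elem):
    """Text belonging to elem itself, leaving out nested cl elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "cl":
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


def reassemble(root, start_id):
    """Follow next pointers from start_id and join the fragments found."""
    by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID)}
    pieces, seen, current = [], set(), start_id
    while current and current not in seen:  # guard against pointer cycles
        seen.add(current)
        elem = by_id[current]
        text = own_text(elem)
        if text:
            pieces.append(text)
        pointer = elem.get("next")
        current = pointer.lstrip("#") if pointer else None
    return " ".join(pieces)


root = ET.fromstring(SAMPLE)
outer = reassemble(root, "c3")  # the reporting clause
inner = reassemble(root, "c4")  # the reported clause
```

Starting from c4 yields the reported clause as a single string, while starting from c3 yields only the words of the reporting clause; an application wishing to include embedded clauses in the outer clause would need a different policy in own_text.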
The type attribute on linguistic segment categories can be used to provide additional interpretative information about the category. The function attribute on the cl and phr elements can be used to provide additional information about the function of the category. Legal values for these two attributes are not defined by these Guidelines, but should be documented in the segmentation element of the encodingDesc element within the document's header. A general approach to the encoding of linguistic categories for parts of a text is discussed in section 17.4 Linguistic Annotation below.
<cl type="relative" function="clause_modifier">from whence it came;</cl>
</cl>
<phr>was outlawed</phr>
<phr type="PP" function="postmodifier-agent">by the U.S. Supreme Court.</phr>
<s>
<cl type="finite-declarative" function="independent">
<phr type="NP" function="subject">Nineteen fifty-four,
<cl type="finite-relative-declarative" function="appositive">when <phr type="NP" function="subject">I</phr>
<phr type="VP" function="predicate">was eighteen years old</phr>
</cl>
</phr>,
<phr type="VP" function="predicate">
<phr type="V" function="verb-main">is held</phr>
<phr type="NP" function="complement">
<cl type="nonfinite" function="predicate-nom.">
<phr type="V" function="copula">to be</phr>
<phr type="NP" function="predicate-nom.">a crucial turning point
<phr type="PP" function="postmodifier">in
<phr type="NP" function="prep.obj.">the history
<phr type="PP" function="postmodifier">of the Afro-American</phr>
</phr>
</phr>
—
<phr type="PP" function="postmodifier-appositive">for
<phr type="NP" function="prep.obj.">the U.S.A.
<phr type="PP" function="postmodifier">as a whole</phr>
</phr>
</phr>
</phr>
—
<phr type="NP" function="appositive-predicate-nom.">the year
<cl type="finite-relative" function="adjectival">
<phr type="NP" function="subject">segregation</phr>
<phr type="VP" function="predicate">
<phr type="V" function="verb-main">was outlawed</phr>
<phr type="PP" function="postmodifier">by the U.S. Supreme Court</phr>
</phr>
</cl>
</phr>
</cl>
</phr>
</phr>.</cl>
</s>
<s>
<cl type="finite-declarative" function="independent">
<phr type="NP" function="subject">It</phr>
<phr type="VP" function="predicate">
<phr type="V" function="verb-main">was</phr>
also
<phr type="NP" function="predicate-nom.">a crucial year for me</phr>
</phr>
<cl type="declarative-finite" function="dependent-causative">because
<phr type="PP" function="sentence_adverb">on June 18, 1954</phr>,
<phr type="NP" function="subject">I</phr>
<phr type="VP" function="predicate">
<phr type="V" function="verb-main">began serving</phr>
<phr type="NP" function="complement">a sentence in state prison
<phr type="PP" function="complement">for possession of marijuana</phr>
</phr>
</phr>
</cl>
</cl>
</s>.
</p>
This style of markup may introduce spurious new lines and blanks into the text. If the original layout is important, it should be explicitly encoded, using such facilities as the lb element, the global rend or rendition attributes, etc.
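The effect of such introduced white space on processing can be limited by normalizing it when the text is extracted. The following Python sketch (standard library only) assumes that runs of white space carry no significance in the source; the helper name plain_text is hypothetical, and the sample is a shortened version of the second sentence encoded above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<s>
<cl type="finite-declarative" function="independent">
<phr type="NP" function="subject">It</phr>
<phr type="VP" function="predicate">
<phr type="V" function="verb-main">was</phr>
also
<phr type="NP" function="predicate-nom.">a crucial year for me</phr>
</phr>
</cl>
</s>"""


def plain_text(elem):
    """Join all character data and collapse each run of white space."""
    return " ".join("".join(elem.itertext()).split())


sentence = plain_text(ET.fromstring(SAMPLE))
```

Normalization of this kind discards any line breaks that were present in the original, which is why layout that matters should be encoded explicitly rather than inferred from the white space of the transcription.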
The w, m, and c elements are also identical in meaning to the seg element with a type attribute of ‘w’, ‘m’, or ‘c’, and may occur wherever seg is permitted to occur. However, they have more restricted content models than does seg: for example, the w element should only contain w, m, and c elements, or plain text; the m element should contain only c elements or plain text; the c element should contain only plain text, most often only a single character or a sequence of graphemes to be treated as a single character. Consequently, while these more specific elements can be translated directly into typed seg elements, the reverse is not necessarily the case.
<w>grandiloquent</w>
</mentioned>
<phr>grandiloquent speech</phr>
</mentioned>
<mentioned>grandiloquent speech</mentioned>
</phr>
<w lemma="timeo">timeo</w>
<w lemma="danaii">Danaos</w>
<w lemma="et">et</w>
<w lemma="donum">dona</w>
<w lemma="fero">ferentes</w>
</s>
<m type="prefix" baseForm="con">com</m>
<m type="root">fort</m>
<m type="suffix">able</m>
</w>
<w>
<w>did</w>
<m>n't</m>
</w>
<w>do</w>
<w>it</w>
<c>.</c>
This segmentation, crude as it is, succeeds in representing the idea that did occurs as a word inside the word didn't. A further advantage of segmenting the text down to this level is that it becomes relatively simple to associate each such segment with a more detailed formal analysis. This matter is taken up in detail in section 17.4 Linguistic Annotation.
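The equivalence noted above between the w, m, and c elements and typed seg elements can be illustrated by a simple renaming pass. The following Python sketch uses only the standard library; the enclosing s element in the sample and the function name to_typed_seg are assumptions of this example. Any type value already present on an element would be overwritten, so a real conversion would first need a policy for preserving it.

```python
import xml.etree.ElementTree as ET

# Hypothetical <s> wrapper around the segmentation of "didn't do it."
SAMPLE = "<s><w><w>did</w><m>n't</m></w> <w>do</w> <w>it</w><c>.</c></s>"

SPECIFIC = {"w", "m", "c"}


def to_typed_seg(root):
    """Rename each w, m, or c element to seg, keeping the old name in type."""
    for elem in root.iter():
        if elem.tag in SPECIFIC:
            elem.set("type", elem.tag)  # overwrites an existing type value
            elem.tag = "seg"
    return root


converted = to_typed_seg(ET.fromstring(SAMPLE))
seg_types = [seg.get("type") for seg in converted.iter("seg")]
```

The reverse conversion is not attempted here, since a typed seg may have content that the more restricted w, m, and c elements do not permit.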
17.2 Global Attributes for Simple Analyses
- att.global.analytic
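provides global attributes for associating analyses or interpretations with any portion of a text.
ana (analysis) indicates the elements which contain interpretations of the element bearing the ana attribute.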
17.3 Spans and Interpretations
- att.interpLike
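provides attributes for elements which represent a formal analysis or interpretation.
resp (responsible party) indicates who is responsible for the interpretation. type indicates what aspect of the passage is being noted. inst (instances) points to instances of the analysis or interpretation represented by the current element.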
<s xml:id="MaQp1s2p114s1">There was certainly a definite point at which the
thing began.</s>
<s xml:id="MaQp1s2p114s2">It was not; then it was suddenly inescapable,
and nothing could have frightened it away.</s>
<s xml:id="MaQp1s2p114s3">There was a slow integration, during which she,
and the little animals, and the moving grasses, and the sun-warmed
trees, and the slopes of shivering silvery mealies, and the great
dome of blue light overhead, and the stones of earth under her feet,
became one, shuddering together in a dissolution of dancing
atoms.</s>
<s xml:id="MaQp1s2p114s4">She felt the rivers under the ground forcing
themselves painfully along her veins, swelling them out in an
unbearable pressure; her flesh was the earth, and suffered growth
like a ferment; and her eyes stared, fixed like the eye of the
sun.</s>
<s xml:id="MaQp1s2p114s5">Not for one second longer (if the terms for time
apply) could she have borne it; but then, with a sudden movement
forwards and out, the whole process stopped; and <emph rend="italic">that</emph> was <soCalled rend="dquo">the
moment</soCalled> which it was impossible to remember
afterwards.</s>
<span from="#MaQp1s2p114s3" to="#MaQp1s2p114s5">the moment</span>
<s xml:id="MaQp1s2p114s6">For during that space of time (which was
timeless) she understood quite finally her smallness, the
unimportance of humanity.</s>
</p>
<span from="#MaQp1s2p114s3" to="#MaQp1s2p114s5">the moment</span>
<!-- other spans identified by DTL here -->
</spanGrp>
Sigmund, the son of Volsung, was a king in Frankish country. Sinfiotli was the eldest of his sons, the second was Helgi, the third Hamund. Borghild, Sigmund's wife, had a brother named — But Sinfiotli, her stepson, and — both wooed the same woman and Sinfiotli killed him over it. And when he came home, Borghild asked him to go away, but Sigmund offered her weregild, and she was obliged to accept it. At the funeral feast Borghild was serving beer. She took poison, a big drinking horn full, and brought it to Sinfiotli. When Sinfiotli looked into the horn, he saw that poison was in it, and said to Sigmund ‘This drink is cloudy, old man.’ Sigmund took the horn and drank it off. It is said that Sigmund was hardy and that poison did him no harm, inside or out. And all his sons could tolerate poison on their skin. Borghild brought another horn to Sinfiotli, and asked him to drink, and everything happened as before. And a third time she brought him a horn, and reproachful words as well, if he didn't drink from it. He spoke again to Sigmund as before. He said ‘Filter it through your mustache, son!’ Sinfiotli drank it off and at once fell dead.
Sigmund carried him a long way in his arms and came to a long, narrow fjord, and there was a small boat there and a man in it. He offered to ferry Sigmund over the fjord. But when Sigmund carried the body out to the boat, it was fully laden. The man said Sigmund should go around the fjord inland. The man pushed the boat out and then suddenly vanished.
King Sigmund lived a long time in Denmark in the kingdom of Borghild, after he married her. Then he went south to Frankish lands, to the kingdom he had there. Then he married Hiordis, the daughter of King Eylimi. Their son was Sigurd. King Sigmund fell in a battle with the sons of Hunding. And then Hiordis married Alf, the son of King Hialprec. Sigurd grew up there as a boy.
Sigmund and all his sons were tall and outstanding in their strength, their growth, their intelligence, and their accomplishments. But Sigurd was the most outstanding of all, and everyone who knows about the old days says he was the most outstanding of men and the noblest of all the warrior kings.
<s xml:id="S1">Sigmund ... was a king in Frankish country.</s>
<s xml:id="S2">Sinfiotli was the eldest of his sons.</s>
<s xml:id="S3">Borghild, Sigmund's wife, had a brother ...</s>
<s xml:id="S4A">But Sinfiotli ... wooed the same woman</s>
<s xml:id="S4B">and Sinfiotli killed him over it.</s>
<s xml:id="S5">And when he came home, ... she was obliged to accept it.</s>
<s xml:id="S6">At the funeral feast Borghild was serving beer.</s>
<s xml:id="S7">She took poison ... and brought it to Sinfiotli.</s>
<s xml:id="S17">Sinfiotli drank it off and at once fell dead.</s>
<anchor xml:id="EOS17"/>
</p>
<p xml:id="P2">Sigmund carried him a long way in his arms ... </p>
<p xml:id="P3">King Sigmund lived a long time in Denmark ... </p>
<p xml:id="P4">Sigmund and all his sons were tall ... </p>
<spanGrp resp="#TMA" type="narrative-structure">
<span from="#S1" to="#S3">introduction</span>
<span from="#S4A">conflict</span>
<span from="#S4B">climax</span>
<span from="#S5" to="#S17">revenge</span>
<span from="#EOS17">reconciliation</span>
<span from="#P2" to="#P4">aftermath</span>
</spanGrp>
Note the use of an empty anchor element to provide a target for the ‘reconciliation’ unit which is normally part of the narrative pattern but which is not realized in the text shown.
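The passage covered by each span in the preceding example can be recovered by resolving its from and to pointers. The following Python sketch (standard library only) is offered as one possible reading: it treats a span as covering every identified element from the start point to the end point in document order, and takes a span without a to attribute to cover only the element named by from. The body wrapper, the shortened sample, and the function name resolve_spans are assumptions of this example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Shortened sample; <body> is a hypothetical wrapper.
SAMPLE = """<body>
<p>
<s xml:id="S1">Sigmund ... was a king in Frankish country.</s>
<s xml:id="S2">Sinfiotli was the eldest of his sons.</s>
<s xml:id="S3">Borghild, Sigmund's wife, had a brother ...</s>
<s xml:id="S4A">But Sinfiotli ... wooed the same woman</s>
<s xml:id="S4B">and Sinfiotli killed him over it.</s>
<anchor xml:id="EOS17"/>
</p>
<spanGrp type="narrative-structure">
<span from="#S1" to="#S3">introduction</span>
<span from="#S4A">conflict</span>
<span from="#S4B">climax</span>
<span from="#EOS17">reconciliation</span>
</spanGrp>
</body>"""


def resolve_spans(root):
    """Map each span label to the xml:id values it covers."""
    ids = [e.get(XML_ID) for e in root.iter() if e.get(XML_ID)]
    position = {xml_id: index for index, xml_id in enumerate(ids)}
    covered = {}
    for span in root.iter("span"):
        start = span.get("from").lstrip("#")
        # Assumption: a missing "to" means the span ends where it starts.
        end = span.get("to", span.get("from")).lstrip("#")
        covered[span.text] = ids[position[start]:position[end] + 1]
    return covered


spans = resolve_spans(ET.fromstring(SAMPLE))
```

In this sketch the ‘reconciliation’ span resolves to the empty anchor alone, which reflects the fact that the unit has no textual realization.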
The same analysis may be expressed with the interp element instead of the span element. This element provides attributes for recording an interpretive category and its value, as well as the identity of the interpreter, but does not itself indicate which passage of text is being interpreted; the same interpretive structures can thus be associated with many passages of the text. The association between text passages and interp elements must be made either by pointing from the text to the interp element with the ana attribute defined in section 17.2 Global Attributes for Simple Analyses, or by pointing at both text and interpretation from a link element, as described in chapter 16 Linking, Segmentation, and Alignment.
<s xml:id="MarQp1s2p114s1">There was certainly a definite point ... </s>
<s xml:id="MarQp1s2p114s2">It was not; then it was suddenly inescapable ... </s>
<seg xml:id="MarQp1s2p114s3-5" ana="#moment">
<s xml:id="MarQp1s2p114s3">There was a slow integration ... </s>
<s xml:id="MarQp1s2p114s4">She felt the rivers under the ground ... </s>
<s xml:id="MarQp1s2p114s5">Not for one second longer ... </s>
</seg>
<s xml:id="MarQp1s2p114s6">For during that space of time ... </s>
</p>
<interp xml:id="moment">the moment</interp>
<interp xml:id="INTRO">introduction</interp>
<interp xml:id="CONFLICT">conflict</interp>
<interp xml:id="CLIMAX">climax</interp>
<interp xml:id="REVENGE">revenge</interp>
<interp xml:id="RECONCIL">reconciliation</interp>
<interp xml:id="AFTERM">aftermath</interp>
</interpGrp>
<seg xml:id="SS1-SS3" ana="#INTRO">
<s xml:id="SS1">Sigmund ... was a king in Frankish country.</s>
<s xml:id="SS2">Sinfiotli was the eldest of his sons.</s>
<s xml:id="SS3">Borghild, Sigmund's wife, had a brother ... </s>
</seg>
<s xml:id="SS4A" ana="#CONFLICT">But Sinfiotli ... wooed the same woman</s>
<s xml:id="SS4B" ana="#CLIMAX">and Sinfiotli killed him over it.</s>
<seg xml:id="SS5-SS17" ana="#REVENGE">
<s xml:id="SS5">And when he came home, ... she was obliged to accept it.</s>
<s xml:id="SS6">At the funeral feast Borghild was serving beer.</s>
<s xml:id="SS17">Sinfiotli drank it off and at once fell dead.</s>
</seg>
</p>
<anchor xml:id="NIL1" ana="#RECONCIL"/>
<p xml:id="PP2">Sigmund carried him a long way in his arms ... </p>
<p xml:id="PP3">King Sigmund lived a long time in Denmark ... </p>
<p xml:id="PP4">Sigmund and all his sons were tall ... </p>
<join xml:id="PP2-PP4" targets="#PP2 #PP3 #PP4" ana="#AFTERM"/>
<link targets="#INTRO #SS1-SS3"/>
<link targets="#CONFLICT #SS4A"/>
<link targets="#CLIMAX #SS4B"/>
<link targets="#REVENGE #SS5-SS17"/>
<link targets="#RECONCIL #NIL1"/>
<link targets="#AFTERM #PP2-PP4"/>
</linkGrp>
One obvious advantage of using interp rather than span elements for the Sigmund text is that the interp elements can be reused for marking up other texts in the same document, whereas the span elements cannot. Another is that the interp element can be used to provide interpretations for discontinuous text elements (represented by join elements). On the other hand, the use of interp elements may require the creation of special text elements not otherwise needed (e.g. the seg and the join in the revised encoding of the text), whereas the use of span elements does not.
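The two ways of associating text with interp elements, by ana pointers or by link elements, carry the same information. The following Python sketch (standard library only) recovers the pairings by each route and compares them. The sample is a reduced version of the Sigmund encoding in which both mechanisms are present only for the purpose of comparison; the body wrapper, the bare interpGrp and linkGrp start-tags, and the function names are assumptions of this example. The targets attribute is used as it appears in the encoding above.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<body>
<interpGrp>
<interp xml:id="CONFLICT">conflict</interp>
<interp xml:id="CLIMAX">climax</interp>
</interpGrp>
<p>
<s xml:id="SS4A" ana="#CONFLICT">But Sinfiotli ... wooed the same woman</s>
<s xml:id="SS4B" ana="#CLIMAX">and Sinfiotli killed him over it.</s>
</p>
<linkGrp>
<link targets="#CONFLICT #SS4A"/>
<link targets="#CLIMAX #SS4B"/>
</linkGrp>
</body>"""


def interps_by_id(root):
    """Index the interp elements by their xml:id."""
    return {e.get(XML_ID): e.text for e in root.iter("interp")}


def via_ana(root):
    """Pairs of (text element id, interpretation) found through ana."""
    labels = interps_by_id(root)
    pairs = set()
    for elem in root.iter():
        for pointer in elem.get("ana", "").split():
            pairs.add((elem.get(XML_ID), labels[pointer.lstrip("#")]))
    return pairs


def via_link(root):
    """The same pairs, recovered from link elements instead."""
    labels = interps_by_id(root)
    pairs = set()
    for link in root.iter("link"):
        ids = [target.lstrip("#") for target in link.get("targets").split()]
        interp_ids = [i for i in ids if i in labels]
        text_ids = [i for i in ids if i not in labels]
        pairs.update((t, labels[i]) for t in text_ids for i in interp_ids)
    return pairs


root = ET.fromstring(SAMPLE)
ana_pairs = via_ana(root)
link_pairs = via_link(root)
```

Because via_link never consults the text elements themselves, the link route leaves the transcription untouched, at the cost of requiring an identifier on every passage to be interpreted.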
17.4 Linguistic Annotation
By linguistic annotation we mean here any annotation determined by an analysis of linguistic features of the text, excluding as borderline cases both the formal structural properties of the text (e.g. its division into chapters or paragraphs) and descriptive information about its context (the circumstances of its production, its genre or medium). The structural properties of any TEI-conformant text should be represented using the structural elements discussed elsewhere in this chapter and in chapters 3 Elements Available in All TEI Documents, 4 Default Text Structure, and the various chapters of Part III. The contextual properties of a TEI text are fully documented in the TEI Header, which is discussed in chapter 2 The TEI Header, and in section 15.2 Contextual Information.
Other forms of linguistic annotation may be applied at a number of levels in a text. A code (such as a word-class or part-of-speech code) may be associated with each word or token, or with groups of such tokens, which may be continuous, discontinuous, or nested. A code may also be associated with relationships (such as cohesion) perceived as existing between distinct parts of a text. The codes themselves may stand for discrete and non-decomposable categories, or they may represent highly articulated bundles of textual features. Their function may be to place the annotated part of the text somewhere within a narrowly linguistic or discoursal domain of analysis, or within a more general semantic field, or any combination drawn from these and other domains.
The manner by which such annotations are generated and attached to the text may be entirely automatic, entirely manual or a mixture. The ease and accuracy with which analysis may be automated may vary with the level at which the annotation is attached. The method employed should be documented in the interpretation element within the encoding description of the TEI Header, as described in section 2.3.3 The Editorial Practices Declaration. Where different parts of a language corpus have used different annotation methods, the decls attribute may be used to indicate the fact, as further discussed in section 15.3 Associating Contextual Information with a Text.
The victim's friends told police that Kruger drove into the quarry and never surfaced.
<w ana="#AT0">The </w>
<w ana="#NN1">victim</w>
<w ana="#POS">'s</w>
<w ana="#NN2">friends </w>
<w ana="#VVD">told </w>
<w ana="#NN2">police </w>
<w ana="#CJT">that </w>
<w ana="#NP0">Kruger </w>
<w ana="#VVD">drove </w>
<w ana="#PRP">into </w>
<w ana="#AT0">the </w>
<w ana="#NN1">quarry </w>
<w ana="#CJC">and </w>
<w ana="#AV0">never </w>
<w ana="#VVD">surfaced</w>
</s>
<interp xml:id="AT0">Definite article</interp>
<interp xml:id="AV0">Adverb</interp>
<interp xml:id="CJC">Conjunction</interp>
<interp xml:id="CJT">Relative that</interp>
<interp xml:id="NN1">Noun singular</interp>
<interp xml:id="NN2">Noun plural</interp>
<interp xml:id="NP0">Proper noun</interp>
<interp xml:id="POS">Genitive marker</interp>
<interp xml:id="PRP">Preposition</interp>
<interp xml:id="VVD">Verb past tense</interp>
</interpGrp>
<phr ana="#n">
<phr ana="#gn">
<w ana="#AT0">The</w>
<w ana="#NN1">victim</w>
<m ana="#POS">'s</m>
</phr>
<w ana="#NN2">friends</w>
</phr>
<phr ana="#v">
<w ana="#VVD">told</w>
<phr ana="#n">
<w ana="#NN2">police</w>
</phr>
<cl ana="#fn">
<w ana="#CJT">that</w>
<phr ana="#n">
<w ana="#NP0">Kruger</w>
</phr>
<phr ana="#v">
<phr ana="#v1">
<w ana="#VVD">drove</w>
<phr ana="#pr">
<w ana="#PRP">into</w>
<phr ana="#n">
<w ana="#AT0">the</w>
<w ana="#NN1">quarry</w>
</phr>
</phr>
</phr>
<w ana="#CJC">and</w>
<phr ana="#v2">
<w ana="#AV0">never</w>
<w ana="#VVD">surfaced</w>
</phr>
</phr>
</cl>
</phr>
<c ana="#pun">.</c>
</s>
<interp xml:id="v2">coordinate continuation</interp>
<interp xml:id="v">verbal</interp>
<interp xml:id="n">nominal</interp>
<interp xml:id="gn">genitive</interp>
<interp xml:id="fn">finite clause</interp>
<interp xml:id="pr">prepositional</interp>
<interp xml:id="v1">coordinate start</interp>
</interpGrp>
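A nested encoding of this kind can be turned into a conventional labelled bracketing by walking the element tree and looking up each ana pointer in the associated interp elements. The following Python sketch (standard library only) does this for the first noun phrase of the sentence; the body wrapper, the bare interpGrp start-tag, and the function name bracket are assumptions of this example, and text occurring directly between child elements is ignored for simplicity.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Reduced sample; <body> is a hypothetical wrapper.
SAMPLE = """<body>
<s><phr ana="#n"><phr ana="#gn"><w ana="#AT0">The</w><w ana="#NN1">victim</w><m ana="#POS">'s</m></phr><w ana="#NN2">friends</w></phr></s>
<interpGrp>
<interp xml:id="n">nominal</interp>
<interp xml:id="gn">genitive</interp>
</interpGrp>
</body>"""


def bracket(elem, labels):
    """Render elem as a labelled bracketing; leaves become plain tokens."""
    if len(elem) == 0:
        return (elem.text or "").strip()
    inner = " ".join(bracket(child, labels) for child in elem)
    label = labels.get(elem.get("ana", "").lstrip("#"))
    # Elements without a known interpretation contribute no bracket.
    return f"[{label} {inner}]" if label else inner


root = ET.fromstring(SAMPLE)
labels = {e.get(XML_ID): e.text for e in root.iter("interp")}
tree = bracket(root.find("s"), labels)
```

Here tree holds the string "[nominal [genitive The victim 's] friends]"; the word-class codes on the leaves are not shown because the sample supplies no interp elements for them.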
<w xml:id="word-1">The</w>
<w xml:id="word-2">victim</w>
<w xml:id="word-3">'s</w>
<w xml:id="word-4">friends</w>
<w xml:id="word-5">told</w>
<w xml:id="word-6">police</w>
<w xml:id="word-7">that</w>
<w xml:id="word-8">Kruger</w>
<w xml:id="word-9">drove</w>
<w xml:id="word10">into</w>
<w xml:id="word11">the</w>
<w xml:id="word12">quarry</w>
<w xml:id="word13">and</w>
<w xml:id="word14">never</w>
<w xml:id="word15">surfaced</w>
</s>
<link targets="#word-1 #AT0"/>
<link targets="#word-2 #NN1"/>
<link targets="#word-3 #POS"/>
<link targets="#word-4 #NN2"/>
<link targets="#word-5 #VVD"/>
<link targets="#word-6 #NN2"/>
<!--... -->
</linkGrp>
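The in-line encoding, in which each w carries an ana attribute, can be converted mechanically into the stand-off form shown above, in which each w carries only an identifier and the associations are held in link elements. The following Python sketch (standard library only) is one way of doing so; it assumes that every w has an ana attribute, and the identifier prefix, the function name ana_to_links, and the four-word sample are choices made for this example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<s>
<w ana="#AT0">The</w>
<w ana="#NN1">victim</w>
<w ana="#POS">'s</w>
<w ana="#NN2">friends</w>
</s>"""


def ana_to_links(sentence, prefix="word-"):
    """Replace each w element's ana pointer with a separate link element."""
    link_grp = ET.Element("linkGrp")
    for number, word in enumerate(sentence.iter("w"), start=1):
        word_id = f"{prefix}{number}"
        word.set(XML_ID, word_id)
        code = word.attrib.pop("ana")  # raises KeyError if ana is absent
        ET.SubElement(link_grp, "link", targets=f"#{word_id} {code}")
    return link_grp


sentence = ET.fromstring(SAMPLE)
links = ana_to_links(sentence)
targets = [link.get("targets") for link in links]
```

After the conversion the sentence contains no analytic information at all, so several competing analyses could be supplied as separate groups of links over the same words.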
Each linguistic segment so far discussed has been well-behaved with respect to the basic document hierarchy, having only a single parent. Moreover, the segmentation has been complete, in that each part of the text is accounted for by some segment at each level of analysis, without discontinuities or overlap. This state of affairs does not of course apply in all types of analysis, and these Guidelines provide a number of mechanisms to support the representation of discontinuities or multiple analyses. A brief overview of these facilities is provided in chapter 20 Non-hierarchical Structures; also see 16 Linking, Segmentation, and Alignment. These mechanisms all depend to a greater or lesser degree on the use of pointing elements of various kinds.
<u xml:id="u2">Yes, anything else?</u>
<u xml:id="u3">No thanks.</u>
<u xml:id="u4">That'll be dollar forty.</u>
<u xml:id="u5">Two dollars</u>
<u xml:id="u6">Sixty, eighty, two dollars. Thank you.</u>
<spanGrp type="transactions">
<span from="#u1">sale request</span>
<span from="#u2" to="#u3">sale compliance</span>
<span from="#u4">sale</span>
<span from="#u5">purchase</span>
<span from="#u6">purchase closure</span>
</spanGrp>
17.5 Module for Analysis and Interpretation