21 Certainty, Precision, and Responsibility
Encoders of text often find it useful to indicate that some aspects of the encoded text are problematic or uncertain, and to indicate who is responsible for various aspects of the markup of the electronic text. These Guidelines provide several methods of recording uncertainty about the text or its markup:
- the note element defined in section 3.8 Notes, Annotation, and Indexing may be used with a value of certainty for its type attribute.
- the certainty element defined in this chapter may be used to record the nature and degree of the uncertainty in a more structured way.
- the precision element defined in this chapter may be used to record the accuracy with which some numerical value (such as a date or quantity) is provided by some other element or attribute.
- the alt element defined in the module for linking and segmentation may be used to provide alternative encodings for parts of a text, as described in section 16.8 Alternation.
There are three methods of indicating responsibility for different aspects of the electronic text:
- the TEI header records who is responsible for an electronic text by means of the respStmt element and other more specific elements (author, sponsor, funder, principal, etc.) used within the titleStmt, editionStmt, and revisionDesc elements.
- the note element may be used with a value of resp or responsibility in its type attribute.
- the respons element defined in this chapter may be used to record fine-grained structured information about responsibility for individual tags in the text.
No special steps are needed to use the note and respStmt elements, since they are defined in the core module and header respectively. The alt element is only available when the module for linking has been selected, as described in chapter 16 Linking, Segmentation, and Alignment. To use the certainty, precision or respons elements, the module for certainty and responsibility must be selected.
These three elements are all members of an attribute class called att.scoping from which they inherit the following attributes:
- att.scoping provides attributes for selecting particular elements within a document by means of XPath.
target points at one or several elements or sets of elements by means of one or more data pointers, using the URI syntax.
match supplies an arbitrary XPath expression using the syntax defined in Kay (ed.) (2007) which identifies a set of nodes, selected within the context identified by the target attribute if this is supplied, or within the context of the element bearing this attribute if it is not.
These attributes enable statements about certainty, precision, or responsibility to be made with respect to the whole of a document, or any part or parts of it which can be identified using standard XML location methods. Several examples are given in the discussion of the certainty element below; the same mechanisms are available for all three elements discussed in this chapter.
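As a minimal sketch of how the two scoping attributes can be used (the xml:id values, the text content, and the degree value are hypothetical, chosen only for illustration), the three certainty elements below are alternative ways of making the same kind of statement, not statements meant to be combined:

```xml
<div xml:id="ex-div1">
  <!-- match alone: no target, so the context is fixed by where the
       certainty element itself sits -->
  <certainty match=".//persName" locus="name" degree="0.7"/>
  <p>A letter from <persName xml:id="ex-pn1">Essex</persName>
    to <persName>Cecil</persName>.</p>
</div>

<!-- target alone: a URI pointer to a single element -->
<certainty target="#ex-pn1" locus="name" degree="0.7"/>

<!-- target plus match: the XPath is evaluated within the context
     that target identifies -->
<certainty target="#ex-div1" match=".//persName" locus="name" degree="0.7"/>
```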
21.1 Levels of Certainty
Many types of uncertainty may be distinguished. The certainty element is designed to encode the following sorts:
- a given tag may or may not correctly apply (e.g. a given word may be a personal name, or perhaps not)
- the precise point at which an element begins or ends is uncertain
- the value given for an attribute is uncertain
- the content given for an element is unreliable for any reason.
The following types of uncertainty are not indicated with the certainty element:
- the numerical precision associated with a number or date (for this use the precision element discussed in 21.2 Indications of Precision)
- the content of the document being transcribed is identifiable, but may be read or understood in different ways (for this use the transcriptional elements such as unclear, discussed in chapter 11 Representation of Primary Sources)
- a transcriber, editor, or author wishes to indicate a level of confidence in a factual assertion made in the text (for this use the interpretative mechanisms discussed in 17 Simple Analytic Mechanisms and 18 Feature Structures)
21.1.1 Using Notes to Record Uncertainty
<note type="certainty" resp="#MSM">It is not
clear here whether <mentioned>Essex</mentioned>
refers to the place or to the nobleman. -MSM</note>
She had always liked <placeName xml:id="CE-p1b">Essex</placeName>.
<note type="certainty" resp="#MSM"
target="#CE-p1a #CE-p1b">It
is not clear here whether <mentioned>Essex</mentioned>
refers to the place or to the nobleman. If the latter,
it should be tagged as a personal name. -<name xml:id="MSM">Michael</name>
</note>
The advantage of this technique is its relative simplicity. Its disadvantage is that the nature and degree of uncertainty are not conveyed in any systematic way and thus are not susceptible to any sort of automatic processing.
21.1.2 Structured Indications of Uncertainty
To record uncertainty in a more structured way, susceptible to at least simple automatic processing, the certainty element may be used:
- certainty indicates the degree of certainty associated with some aspect of the text markup.
locus indicates more exactly the aspect concerning which certainty is being expressed: specifically, whether the markup is correctly located, whether the correct element or attribute name has been used, or whether the content of the element or attribute is correct, etc.
degree indicates the degree of confidence assigned to the aspect of the markup named by the locus attribute.
<placeName xml:id="CE-pl1">Essex</placeName>.
<!-- ... elsewhere in the document ... -->
<certainty target="#CE-pl1" locus="name">
<desc>possibly not a placename</desc>
</certainty>
<!-- ... --><certainty target="#CE-pl1" locus="name"
degree="0.6"/>
<!-- ... --><certainty target="#CE-pl1" locus="name"
degree="0.6">
<desc>probably a placename, but possibly not</desc>
</certainty>
<certainty target="#CE-pl1" locus="name"
degree="0.4" assertedValue="persName">
<desc>may refer to the Earl of Essex</desc>
</certainty>
<placeName>Essex
<certainty locus="name" degree="0.6"/>
</placeName>.
21.1.2.1 Contingent Conditions
She had always liked <placeName xml:id="CE-PL2">Essex</placeName>.
<!-- ... -->
<!-- 60% chance that P1 is a placename, 40% chance a personal name. -->
<certainty xml:id="cert-1" target="#CE-PL1"
locus="name" degree="0.6">
<desc>probably a placename, but possibly not</desc>
</certainty>
<certainty xml:id="cert-2" target="#CE-PL1"
locus="name" assertedValue="persName" degree="0.4">
<desc>may refer to the Earl of Essex</desc>
</certainty>
<!-- 60% chance that P2 is a placename, 40% chance a personal name. 100% chance that it agrees with P1. -->
<certainty target="#CE-PL2" locus="name"
given="#cert-1" degree="1.0">
<desc>if CE-PL1 is a placename, CE-PL2 certainly is</desc>
</certainty>
<certainty target="#CE-PL2" locus="name"
assertedValue="persName" degree="1.0" given="#cert-2">
<desc>if CE-PL1 is a personal name, then so is CE-PL2</desc>
</certainty>
<certainty xml:id="cert1" target="#CE-p2"
locus="name" degree="0.6"/>
<certainty target="#CE-p2" locus="start"
given="#cert1" degree="0.9"/>
<certainty xml:id="cert2" target="#CE-p2"
locus="name" assertedValue="placeName" degree="0.4"/>
<certainty target="#CE-p2" locus="start"
given="#cert2" degree="0.5"/>
<certainty xml:id="cert3" target="#CE-p2"
locus="start" assertedValue="#CE-a1" given="#cert1"
degree="0.1"/>
<certainty xml:id="cert4" target="#CE-p2"
locus="start" assertedValue="#CE-a1" given="#cert2"
degree="0.5"/>
Ernest went to old <placeName>Saybrook</placeName>. (0.4 * 0.5, or 0.20)
Ernest went to <placeName>old Saybrook</placeName>. (0.4 * 0.5, or 0.20)
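The two figures above follow from multiplying each conditional degree by the degree of the statement named in its given attribute. As a worked check, the degree values of the six certainty elements in this example can be read as probabilities (an interpretive assumption). The event labels are introduced here purely for illustration: N is the element name as tagged, L the asserted placeName, S a start at the tagged position, and A a start at the anchor CE-a1.

```latex
\begin{align*}
\Pr(N \wedge S) &= 0.6 \times 0.9 = 0.54 \\
\Pr(N \wedge A) &= 0.6 \times 0.1 = 0.06 \\
\Pr(L \wedge S) &= 0.4 \times 0.5 = 0.20 \\
\Pr(L \wedge A) &= 0.4 \times 0.5 = 0.20 \\
\text{total}    &= 0.54 + 0.06 + 0.20 + 0.20 = 1.00
\end{align*}
```

On this reading the four combinations are exhaustive, which is why the products sum to 1.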
21.1.2.2 Pervasive Conditions
match="//persName"/>
<p>.....</p>
</div>
<div>
<certainty locus="name" degree="0.3"
match=".//persName"/>
</div>
checked
: match="//div[@type='checked']//persName"/>
<!-- ... -->
<certainty match=".//my:*" locus="value"
degree="0.9"/>
</div>
The match expression here uses the namespace prefix my. This namespace prefix must be associated with an appropriate namespace definition, either on the certainty element itself, or on one of its ancestor elements.
21.1.2.3 Content Uncertainty
<certainty target="#CE-p3" locus="value"
degree="0.5"/>
<choice>
<expan xml:id="CE-e1">Standard
Generalized Markup Language</expan>
<expan xml:id="CE-e40">Some Grandiose Methodology for Losers</expan>
<abbr>SGML</abbr>
</choice> ...
<!-- ... -->
<certainty target="#CE-e1" locus="value"
degree="0.9"/>
<certainty target="#CE-e40" locus="value"
degree="0.5"/>
<certainty target="#CE-P3" locus="value"
assertedValue="gun" degree="0.8">
<desc>a gun makes more sense in a holdup</desc>
</certainty>
21.1.2.4 Target or Match?
As noted in 16 Linking, Segmentation, and Alignment, the target attribute may take any general data.pointer as its value and may thus also contain an XPath expression of arbitrary complexity. Because full support for XPath is not provided by current processors, this is not generally recommended TEI practice. In simple cases the plain URI syntax is to be preferred, notably those in which the xml:id attribute is used to identify a single element occurrence. The usage #A (to indicate the element whose xml:id attribute has the value A) is syntactically much simpler than the equivalent XPath expression //*[@xml:id='A'] and is hence preferred throughout these Guidelines.
For similar reasons, the certainty element may specify both a target value (expressed as a URI) and a match value (expressed as an XPath expression). The former defines the context within which the latter is to be evaluated. As previously noted, if no value is supplied for target, the context within which the value of match should be evaluated is the parent element of the certainty element itself.
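A minimal sketch of these options follows; the xml:id value A, the who values, the utterance text, and the degree values are hypothetical. The first two certainty elements are intended to select the same u element, once by URI pointer and once by the XPath expression given above; the third has no target and relies on its position to fix the context for match.

```xml
<u xml:id="A" who="#ex-speaker1">Good morning.</u>

<!-- preferred: URI pointer to the element whose xml:id is A -->
<certainty target="#A" locus="name" degree="0.8"/>

<!-- the same selection expressed as an XPath in match -->
<certainty match="//*[@xml:id='A']" locus="name" degree="0.8"/>

<u who="#ex-speaker2">
  <!-- no target: match is evaluated relative to the enclosing u -->
  <certainty match="@who" locus="value" degree="0.5"/>
  Good morning to you.
</u>
```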
<certainty target="#CE-u1" match="@who"
locus="value" degree="0.5"/>
degree="0.5"/>
</u>
degree="0.2"/>
locus="location" degree="0.2"/>
<certainty match="@resp" locus="value"
degree="0.2"/>
</persName>
degree="0.2"/>
locus="value" degree="0.2"/>
locus="value" degree="0.2"/>
The certainty element and the other TEI mechanisms for indicating uncertainty provide a range of methods of graduated complexity. Simple expressions of uncertainty may be made by using the note element. This is simple and convenient, and can accommodate either a discursive and unstructured indication of uncertainty, or a complex and structured but probably project-specific expression of uncertainty. In general, however, unless special steps are taken, the note element does not provide as much expressive power as the certainty element, and in cases where highly structured certainty information must be given, it is recommended that the certainty element be preferred.
21.2 Indications of Precision
As noted above, certainty about the accuracy of an encoding or its content is not the same thing as the precision with which a value is specified. In the case of a date or a quantity, for example, we might be certain that the value given is imprecise, or uncertain about whether or not the value given is correct. The latter possibility would be represented by the certainty element discussed in the previous section; the former by the precision element discussed in this section.
The elements concerning which statements of precision are to be made are identified using the same target and match attributes inherited from the att.scoping class discussed in the previous section and in the same way. Other aspects are provided by other attributes as further discussed below.
- precision indicates the numerical accuracy or precision associated with some aspect of the text markup.
degree indicates the degree of precision to be assigned as a value between 0 (none) and 1 (optimally precise).
stdDeviation supplies a standard deviation associated with the value in question.
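As a minimal sketch of these two attributes (the xml:id values, dates, quantities, and attribute values are hypothetical), the first precision element below states that the upper bound of a date range is only loosely known, and the second attaches a standard deviation to a measured quantity:

```xml
<date xml:id="ex-d1" notBefore="1850-01-01" notAfter="1850-12-31">some
  time in 1850</date>
<!-- the upper bound of the range is only moderately precise -->
<precision target="#ex-d1" match="@notAfter" degree="0.4"/>

<measure xml:id="ex-m1" quantity="12.5" unit="cm">about twelve and a
  half centimetres</measure>
<!-- the quantity carries a standard deviation of 0.5
     (assumed here to be in the same unit) -->
<precision target="#ex-m1" match="@quantity" stdDeviation="0.5"/>
```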
scope="all"/>
Suppose however that the precision with which the value of such an attribute can be specified is variable. For example, suppose an event is dated ‘about fifty years after the death of Augustus’. In this case, the precision of one end of the range (the death of Augustus) is higher than the other, assuming we know when Augustus died. We can say that the latest possible date is probably 50 years after that, but with less confidence than we can attach to the earliest possible date.
<date xml:id="d001" notBefore="0014"
notAfter="0064">About 50
years after the death of Augustus</date>
<precision target="#d001" match="@notAfter"
degree="0.3"/>
<precision target="#d001"
match="@notBefore" degree="0.9"/>
<residence notBefore="1857-03-01"
notAfter="1857-04-30">From the 1st of March to
some time in April of 1857.
<precision match="@notAfter" degree="0.5"/>
</residence>
atMost="30" unit="cm" scope="all"/>
<precision target="#w00t" match="@atMost"
degree="0.3"/>
unit="chars" quantity="62.4"/>
<precision target="#dim1" stdDeviation="4"/>
21.3 Attribution of Responsibility
In general, attribution of responsibility for the transcription and markup of an electronic text is made by respStmt elements within the header: specifically, within the title statement, the edition statement(s), and the revision history.
In some cases, however, more detailed element-by-element information may be desired. For example, an encoder may wish to distinguish between the individuals responsible for transcribing the content and those responsible for determining that a given word or phrase constitutes a proper noun. Where such fine-grained attribution of responsibility is required, the respons element can be used.
- respons (responsibility) identifies the individual(s) responsible for some aspect of the content or markup of particular element(s).
locus indicates the specific aspect of the encoding (markup or content) for which responsibility is being assigned.
This element allows one or more aspects of the markup to be attributed to a given individual. This element inherits the target and match attributes from the att.scoping class, in the same way as the certainty and precision elements. Its locus attribute functions in the same way as that on the certainty element (see 21.1 Levels of Certainty). It inherits the resp and cert attributes from the att.responsibility class.
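As a minimal sketch of the match attribute on respons (the xml:id values, the text content, and the pointers #ex-ed1 and #ex-ed2 are hypothetical; the two pointers are assumed to identify people declared elsewhere, for example in the header), responsibility for a whole set of elements might be stated once rather than element by element:

```xml
<div xml:id="ex-div2">
  <p>Ernest met <persName>Saybrook</persName> and
    <persName>Essex</persName> in town.</p>
</div>

<!-- ex-ed1 decided that these words should be tagged as persName -->
<respons target="#ex-div2" match=".//persName" locus="name" resp="#ex-ed1"/>

<!-- ex-ed2 transcribed the content of the same elements -->
<respons target="#ex-div2" match=".//persName" locus="value" resp="#ex-ed2"/>
```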
<persName xml:id="CE-p5" rend="it">Saybrook</persName>.
<!-- ... -->
<respons target="#CE-p5" locus="value"
resp="#RC"/>
<respons target="#CE-p5"
locus="name location" resp="#PMWR"/>
<list type="encoders">
<item xml:id="PMWR"/>
<item xml:id="RC"/>
</list>
locus="value" resp="#RC"/>
Some elements bear specialized resp or agent attributes, which have specific meanings that vary from element to element; the respons element should be reserved for the general aspects of responsibility common to all text transcription and markup, and should not be confused with the more specific attributes on individual elements.
21.4 The Certainty Module
The module described in this chapter makes available the following additional elements: certainty, precision, and respons.
The selection and combination of modules to form a TEI schema is described in 1.2 Defining a TEI Schema.
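As a minimal sketch of such a selection, assuming an ODD customization is used to build the schema, the following schemaSpec adds the certainty module (and the linking module, for alt) to four modules commonly included in a TEI schema. The ident value is hypothetical.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="ex-certainty" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- provides alt -->
  <moduleRef key="linking"/>
  <!-- provides certainty, precision, and respons -->
  <moduleRef key="certainty"/>
</schemaSpec>
```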