Text Encoding Initiative
8. Cross References and Links
Explicit cross references or links from one point in a text to another in the same SGML document may be encoded using the elements described in section 8.1. Simple Cross References. References or links to elements of some other SGML document, or to parts of non-SGML documents, may be encoded using the TEI extended pointers described in section 8.2. Extended Pointers. Implicit links (such as the association between two parallel texts, or that between a text and its interpretation) may be encoded using the linking attributes discussed in section 8.3. Linking Attributes.
A cross reference from one point within a single document to another can be encoded using either of the following elements:
These elements share the following attributes:
The difference between these two elements is that <ptr> is an empty element, simply marking a point from which a link is to be made, whereas <ref> may contain some text as well — typically the text of the cross-reference itself. The <ptr> element would be used for a cross reference which is to be indicated by some non-verbal means such as a symbol or icon, or in an electronic text by a button. It is also useful in document production systems, where the formatter can generate the correct verbal form of the cross reference.
See especially <ref target="SEC12">section 12 on page 34</ref>.
See especially <ptr target="SEC12"/>.
The value of the target attribute must have been used as the identifier of some other element within the current document. This implies that the passage or phrase being pointed at must bear an identifier, and must therefore be tagged as an element of some kind. In the following example, the cross reference is to a <div1> element:
... see especially <ptr target="SEC12"/>. ... <div1 id="SEC12"><head>Concerning Identifiers... ...
... this is discussed in <ref target="pspec">the paragraph on links</ref> ... <p id="pspec">Links may be made to any kind of element ...
... this is discussed in <ref target="dspec" targType="div1 div2"> the section on links</ref>
This reference should fail if the element with identifier dspec is neither a <div1> nor a <div2>. Note however that this additional check cannot be carried out by an SGML or XML parser alone, since such parsers can only check that some element dspec exists.
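The additional check described above can be sketched in a few lines of Python. This is only an illustration, not part of any TEI software: the function name check_pointers and the sample document are hypothetical, the input is assumed to be well-formed XML (so identifiers are compared case-sensitively), and the standard xml.etree.ElementTree module stands in for whatever processor is actually in use.

```python
import xml.etree.ElementTree as ET

def check_pointers(root):
    """Return a list of problems found in <ptr> and <ref> elements (hypothetical helper)."""
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    problems = []
    for el in root.iter():
        if el.tag not in ("ptr", "ref"):
            continue
        target = el.get("target")
        if target is None:
            continue
        dest = by_id.get(target)
        if dest is None:
            # This is the only check a parser would make on its own.
            problems.append(f"{target}: no such identifier")
            continue
        # targType, if present, lists the element types the target may have.
        allowed = (el.get("targType") or "").split()
        if allowed and dest.tag not in allowed:
            problems.append(f"{target}: is <{dest.tag}>, expected one of {allowed}")
    return problems

# Invented sample: one valid reference, one with the wrong target type,
# and one pointing at an identifier that does not exist.
SAMPLE = """<text>
  <p>this is discussed in <ref target="dspec" targType="div1 div2">the section on links</ref>
     and in <ref target="pspec" targType="div1 div2">the paragraph on links</ref>;
     see also <ptr target="missing"/></p>
  <div1 id="dspec"><head>Links</head>
    <p id="pspec">Links may be made to any kind of element</p>
  </div1>
</text>"""

problems = check_pointers(ET.fromstring(SAMPLE))
for line in problems:
    print(line)
```

The reference to pspec is reported because its target is a <p> rather than a <div1> or <div2>, even though the identifier itself exists.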
The type attribute can be used to categorize the link represented by the pointer in any convenient way. The resp and crDate attributes may also be used to represent the person or agency responsible for making the link, and its date of creation, as in the following example:
... this is discussed in <ref type="xref" resp="auto" crDate="950521" target="dspec" targType="div1 div2"> the section on links</ref>
These attributes are most likely to be of use in hypertext systems containing very many pointers used for a variety of purposes and created by a variety of means.
Sometimes the target of a cross reference does not correspond with any particular feature of a text, and so may not be tagged as an element of some kind. If the desired target is simply a point in the current document, the easiest way to mark it is by introducing an <anchor> element at the appropriate spot. If the target is some sequence of words not otherwise tagged, the <seg> element may be introduced to mark them. These two elements are described as follows:
In the following (imaginary) example, <ref> elements have been used to represent points in this text which are to be linked in some way to other parts of it; in the first case to a point, and in the second, to a sequence of words:
Returning to <ref target="ABCD">the point where I dozed off</ref>, I noticed that <ref target="EFGH">three words</ref> had been circled in red by a previous reader
This encoding requires that elements with the specified identifiers (ABCD and EFGH in this example) are to be found somewhere else in the current document. Assuming that no element already exists to carry these identifiers, the <anchor> and <seg> elements may be used:
.... <anchor type="bookmark" id="ABCD"/> .... ....<seg type="target" id="EFGH"> ... </seg> ...
The type attribute should be used (as above) to distinguish amongst different purposes for which these general purpose elements might be used in a text. Some other uses are discussed in section 8.3. Linking Attributes below.
The elements <ptr> and <ref> can only be used for cross-references or links whose targets occur within the same document as their source. They can also refer only to elements explicitly tagged in the document. The elements discussed in this section are not restricted in this way.
In addition to the pointer attributes already discussed in section 8.1. Simple Cross References above, these elements share the following additional attributes, which are used to specify the target of the cross reference or link in place of the target attribute:
A full specification of the language used to express the target of TEI extended pointers is beyond the scope of this document; we list here only a few of its more generally useful features. The full Guidelines should be consulted for more detail.
see <xref doc="P3">The TEI Guidelines, passim</xref>
This example assumes that some system or public entity with the name P3 has been declared. This declaration has to be included within the DTD in force when the document is parsed; the manner of doing so is specific to the authoring software in use (as further discussed in section 15. Figures and Graphics).
The from attribute is used to specify some location within whatever document is specified by the doc attribute. The specification uses a special language, called the TEI extended pointer syntax, only some details of which are given here. In this language, locations are defined as a series of steps, each one identifying some part of the document, often in terms of the locations identified by the previous step. For example, you would point to the third sentence of the second paragraph of chapter two by selecting chapter two in the first step, the second paragraph in the second step, and the third sentence in the last step. A step can be defined in terms of the document tree itself, using such concepts as parent, descendant, preceding, etc., or, more loosely, in terms of text patterns or of word or character positions. You can also use a foreign (non-SGML) notation, or specify a location within a graphic in terms of its co-ordinate system.
The from and to attributes use the same notation. Each points to some portion of the target document; the extended pointer as a whole points to the section beginning at the start of the from and running to the end of the to.
The first step in a location path will often be to specify the identifier of some element within the target document, as in this example:
<xptr doc="P3" from="id (SA)"/>
This selects the whole of whatever element bears the identifier SA within the entity P3. If a finer-grained target is required, other steps might follow. The following keywords are available for you to select other elements in terms of their relationship to this one:
Each of these keywords implies a particular set of elements (the set of children, the set of ancestors, the set of previous siblings, etc.); to specify which element in the set we are pointing at, the keyword may optionally be followed by a parenthesized list containing:
<xptr doc="P3" from="id (SA) child (3 p)"/>
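To make the step-by-step reading of such a location concrete, here is a minimal Python sketch that resolves a pointer such as id (SA) child (3 p) against a parsed document. It is not an implementation of the full TEI extended pointer syntax: it handles only the id and child keywords, and only a parenthesized list consisting of a number followed by an element name. The names parse_steps and resolve and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# One step is a keyword followed by a parenthesized, space-separated list.
STEP = re.compile(r"(\w+)\s*\(([^)]*)\)")

def parse_steps(pointer):
    """Split 'id (SA) child (3 p)' into (keyword, arguments) pairs."""
    return [(kw, args.split()) for kw, args in STEP.findall(pointer)]

def resolve(root, pointer):
    """Resolve a pointer built from id and child steps; return None on failure."""
    current = root  # assumption: steps start from the root of the document
    for keyword, args in parse_steps(pointer):
        if keyword == "id":
            wanted = args[0]
            current = next((el for el in root.iter() if el.get("id") == wanted), None)
        elif keyword == "child":
            index, name = int(args[0]), args[1]
            # Count only the children of the requested element type.
            matches = [c for c in current if c.tag == name]
            current = matches[index - 1] if 0 < index <= len(matches) else None
        else:
            raise ValueError(f"unsupported keyword: {keyword}")
        if current is None:
            return None
    return current

# Invented sample: the <list> is skipped when counting <p> children.
SAMPLE = """<text>
  <div1 id="SA">
    <head>Linking</head>
    <p>first</p>
    <list><item>not a paragraph</item></list>
    <p>second</p>
    <p>third</p>
  </div1>
</text>"""

root = ET.fromstring(SAMPLE)
steps = parse_steps("id (SA) child (3 p)")
target = resolve(root, "id (SA) child (3 p)")
print(steps)
print(target.text)
```

Each step narrows the location produced by the previous one, which is why the child step is evaluated against the element found by the id step rather than against the whole document.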
Similarly, assuming that the entity P3 is in fact a reference to the XML form of the TEI Guidelines, then the following reference will select section 14.2.2 of that publication in which (as it happens) the extended pointer syntax is formally defined:
For full details, see <xref doc="P3" from="id (SA) child (2 div2) child (2 div3)"> TEI Extended pointer syntax definition </xref>
Normally, the scope of a cross reference will be adequately defined by the from attribute. For some documents, however, it may be more convenient to define both a starting and an ending scope. As noted above, the to attribute is provided for this purpose. For example,
<xptr doc="P1" from="id (xyz)" to="id (abc)"/>
is an extended pointer whose target is the sequence starting at the beginning of whatever element in document P1 has identifier XYZ and ending at the end of whatever element in the same document has identifier ABC. Any elements in between are also included, irrespective of structure; the pointer is erroneous if the end of ABC precedes the start of XYZ.
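The ordering condition on from and to can be tested mechanically once every element has been given a start and an end position in document order. The following Python sketch does this under simplifying assumptions: both attributes are taken to be plain id steps, identifiers are compared exactly as written, and the helper names spans and range_is_valid are hypothetical.

```python
import xml.etree.ElementTree as ET

def spans(root):
    """Map each identifier to (start, end) positions in document order."""
    positions = {}
    counter = 0

    def walk(el):
        nonlocal counter
        start = counter          # position of the start-tag
        counter += 1
        for child in el:
            walk(child)
        end = counter            # position of the end-tag
        counter += 1
        if el.get("id"):
            positions[el.get("id")] = (start, end)

    walk(root)
    return positions

def range_is_valid(root, from_id, to_id):
    """True unless the end of the 'to' element precedes the start of 'from'."""
    pos = spans(root)
    if from_id not in pos or to_id not in pos:
        return False
    return pos[to_id][1] >= pos[from_id][0]

# Invented sample document with the two identifiers used in the example.
root = ET.fromstring('<text><p id="xyz">one</p><p>two</p><p id="abc">three</p></text>')
forward = range_is_valid(root, "xyz", "abc")
backward = range_is_valid(root, "abc", "xyz")
print(forward, backward)
```

The untagged middle paragraph falls inside the forward range, which illustrates the remark that intervening elements are included irrespective of structure.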
Very complex specifications are easily built using this syntax. For example, the following reference will select the most recent <head> element which carries an attribute lang with the value LAT, and which occurs before the start of the element with identifier SA:
<xptr doc="P3" from="id (SA) preceding (1 head lang lat)"/>
If no value is supplied for the doc attribute, the current document is assumed. Thus, the following references are semantically equivalent. They both indicate the element with identifier X1 within the current document:
<ptr target="X1"/> <xptr from="id (X1)"/>
The TEI Extended Pointer Syntax was defined before the more recent XLink specifications, which are however to some extent derived from it. Work is currently under way to harmonize the two specification languages.
The following special purpose linking attributes are defined for every element in the TEI Lite DTD:
The ana (analysis) attribute is intended for use where a set of abstract analyses or interpretations have been defined somewhere within a document, as further discussed in section 16. Interpretation and Analysis. For example, a linguistic analysis of the sentence ‘John loves Nancy’ might be encoded as follows:
<seg type="sentence" ana="SVO"> <seg type="lex" ana="NP1">John</seg> <seg type="lex" ana="VVI">loves</seg> <seg type="lex" ana="NP1">Nancy</seg> </seg>
This encoding implies the existence elsewhere in the document of elements with identifiers SVO, NP1, and VVI where the significance of these particular codes is explained. Note the use of the <seg> element to mark particular components of the analysis, distinguished by the type attribute.
The corresp (corresponding) attribute provides a simple way of representing some form of correspondence between two elements in a text. For example, in a multilingual text, it may be used to link translation equivalents, as in the following example:
<seg lang="FRA" id="FR1" corresp="EN1">Jean aime Nancy</seg> <seg lang="ENG" id="EN1" corresp="FR1">John loves Nancy</seg>
The same mechanism may be used for a variety of purposes. In the following example, it has been used to represent anaphoric correspondences between ‘the show’ and ‘Shirley’, and between ‘NBC’ and ‘the network’:
<p><title id="shirley">Shirley</title>, which made its Friday night debut only a month ago, was not listed on <name id="nbc">NBC</name>'s new schedule, although <seg id="network" corresp="nbc">the network</seg> says <seg id="show" corresp="shirley">the show</seg> still is being considered.</p>
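The two examples above use corresp differently: the translation pair points in both directions, while the anaphoric links point one way only. A processor might want to tell these cases apart. The Python sketch below shows one possible approach; the function name corresp_links is hypothetical, the sample is an abbreviated version of the examples above, and the code assumes that corresp may hold several space-separated identifiers.

```python
import xml.etree.ElementTree as ET

def corresp_links(root):
    """Return (source_id, target_id, reciprocal) for each corresp value."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    links = []
    for el in root.iter():
        for target in (el.get("corresp") or "").split():
            dest = by_id.get(target)
            if dest is None:
                continue  # dangling link; a fuller checker would report it
            # The link is reciprocal if the target points back at the source.
            back = (dest.get("corresp") or "").split()
            links.append((el.get("id"), target, el.get("id") in back))
    return links

SAMPLE = """<text>
  <seg lang="FRA" id="FR1" corresp="EN1">Jean aime Nancy</seg>
  <seg lang="ENG" id="EN1" corresp="FR1">John loves Nancy</seg>
  <p><title id="shirley">Shirley</title> was not listed on
     <name id="nbc">NBC</name>'s new schedule, although
     <seg id="network" corresp="nbc">the network</seg> says
     <seg id="show" corresp="shirley">the show</seg> still is being considered.</p>
</text>"""

links = corresp_links(ET.fromstring(SAMPLE))
for link in links:
    print(link)
```

Whether a one-way link should be treated as implying the reverse correspondence is a matter for the application; the sketch only records what the markup states.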
<q id="Q1a" next="Q1b">Who-e debel you?</q> &mdash; he at last said &mdash; <q id="Q1b" prev="Q1a">you no speak-e, damme, I kill-e.</q> And so saying, the lighted tomahawk began flourishing about me in the dark.
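Assuming, as this example suggests, that next and prev hold the identifiers of the following and preceding parts of a single discontinuous element, the parts can be reassembled by following the next values from the first part. The Python sketch below is illustrative only: the function name joined_text is hypothetical, and the sample omits the dash entities so that it parses without a DTD.

```python
import xml.etree.ElementTree as ET

def joined_text(root, first_id):
    """Follow next attributes from first_id and join the text of each part."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    parts, seen = [], set()
    current = by_id.get(first_id)
    # The seen set guards against a chain that loops back on itself.
    while current is not None and current.get("id") not in seen:
        seen.add(current.get("id"))
        parts.append("".join(current.itertext()).strip())
        current = by_id.get(current.get("next"))
    return " ".join(parts)

SAMPLE = """<p><q id="Q1a" next="Q1b">Who-e debel you?</q> he at last said
<q id="Q1b" prev="Q1a">you no speak-e, damme, I kill-e.</q> And so saying,
the lighted tomahawk began flourishing about me in the dark.</p>"""

speech = joined_text(ET.fromstring(SAMPLE), "Q1a")
print(speech)
```

The narrator's interruption is left out of the result because it lies outside both <q> elements.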