Text Encoding Initiative

8. Cross References and Links


Explicit cross references or links from one point in a text to another in the same SGML document may be encoded using the elements described in section 8.1. Simple Cross References. References or links to elements of some other SGML document, or to parts of non-SGML documents, may be encoded using the TEI extended pointers described in section 8.2. Extended Pointers. Implicit links (such as the association between two parallel texts, or that between a text and its interpretation) may be encoded using the linking attributes discussed in section 8.3. Linking Attributes.

8.1. Simple Cross References

A cross reference from one point within a single document to another can be encoded using either of the following elements:

<ref>
a reference to another location in the current document, in terms of one or more identifiable elements, possibly modified by additional text or comment.
<ptr>
a pointer to another location in the current document in terms of one or more identifiable elements.

These elements share the following attributes:

target
specifies the destination of the pointer as one or more SGML identifiers.
type
categorizes the pointer in some respect, using any convenient set of categories.
targType
specifies the type (or types) of element to which this pointer may point.
crDate
specifies when this pointer was made.
resp
specifies the creator of the pointer.

The difference between these two elements is that <ptr> is an empty element, simply marking a point from which a link is to be made, whereas <ref> may contain some text as well — typically the text of the cross-reference itself. The <ptr> element would be used for a cross reference which is to be indicated by some non-verbal means such as a symbol or icon, or in an electronic text by a button. It is also useful in document production systems, where the formatter can generate the correct verbal form of the cross reference.

The following two forms, for example, are logically equivalent (assuming we have documented somewhere the exact verbal form of cross references represented by <ptr> elements):

See especially <ref target="SEC12">section 12 on page 34</ref>.
See especially <ptr target="SEC12"/>.

The value of the target attribute must have been used as the identifier of some other element within the current document. This implies that the passage or phrase being pointed at must bear an identifier, and must therefore be tagged as an element of some kind. In the following example, the cross reference is to a <div1> element:
    ...
    see especially <ptr target="SEC12"/>.
    ...
    <div1 id="SEC12"><head>Concerning Identifiers...
    ...

Because the id attribute is global, any element in a document may be pointed to in this way. In the following example, a paragraph has been given an identifier so that it may be pointed at:

    ...
    this is discussed in <ref target="pspec">the paragraph on links</ref>
    ...
    <p id="pspec">Links may be made to any kind of element
    ...

The targType attribute can be used to specify that the element pointed to must be of a particular type, as in the following example:

    ...
    this is discussed in <ref target="dspec" targType="div1 div2">
    the section on links</ref>

This reference should fail if the element with identifier dspec is neither a <div1> nor a <div2>. Note however that this additional check cannot be carried out by an SGML or XML parser alone, since such parsers can only check that some element dspec exists.
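The additional check described above has to be made by application software. The following is a minimal sketch of such a check in Python, assuming the document is available as well-formed XML and is read with the standard library's non-validating ElementTree parser; the sample text and the function name targtype_errors are invented for illustration and are not part of the TEI scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "dspec" identifies a paragraph, not a <div1> or <div2>.
SAMPLE = """
<text>
  <p>this is discussed in
    <ref target="dspec" targType="div1 div2">the section on links</ref></p>
  <p>see especially <ptr target="SEC12" targType="div1"/>.</p>
  <div1 id="SEC12"><head>Concerning Identifiers</head>
    <p id="dspec">Links may be made to any kind of element</p>
  </div1>
</text>
"""

def targtype_errors(root):
    """List (identifier, actual element type) pairs that violate targType."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    errors = []
    for el in root.iter():
        if el.tag not in ("ptr", "ref"):
            continue
        allowed = el.get("targType", "").split()
        # target may hold one or more whitespace-separated identifiers
        for name in el.get("target", "").split():
            found = by_id.get(name)
            if found is None:
                errors.append((name, None))  # no element bears this identifier
            elif allowed and found.tag not in allowed:
                errors.append((name, found.tag))
    return errors

print(targtype_errors(ET.fromstring(SAMPLE)))  # [('dspec', 'p')]
```

The pointer to SEC12 passes because its target is a <div1>; the reference to dspec is reported because its target is a <p>.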

The type attribute can be used to categorize the link represented by the pointer in any convenient way. The resp and crDate attributes may also be used to represent the person or agency responsible for making the link, and its date of creation, as in the following example:

    ...
    this is discussed in
    <ref type="xref" resp="auto" crDate="950521" target="dspec" targType="div1 div2">
    the section on links</ref>

These attributes are most likely to be of use in hypertext systems containing very many pointers used for a variety of purposes and created by a variety of means.

Sometimes the target of a cross reference does not correspond with any particular feature of a text, and so may not be tagged as an element of some kind. If the desired target is simply a point in the current document, the easiest way to mark it is by introducing an <anchor> element at the appropriate spot. If the target is some sequence of words not otherwise tagged, the <seg> element may be introduced to mark them. These two elements are described as follows:

<anchor>
specifies a location or point within a document so that it may be pointed to.
<seg>
identifies a span or segment of text within a document so that it may be pointed to. Attributes include

type
categorizes the segment.

In the following (imaginary) example, <ref> elements have been used to represent points in this text which are to be linked in some way to other parts of it; in the first case to a point, and in the second, to a sequence of words:

  Returning to <ref target="ABCD">the point where I dozed
  off</ref>, I noticed that <ref target="EFGH">three
  words</ref> had been circled in red by a previous reader

This encoding requires that elements with the specified identifiers (ABCD and EFGH in this example) are to be found somewhere else in the current document. Assuming that no element already exists to carry these identifiers, the <anchor> and <seg> elements may be used:

  .... <anchor type="bookmark" id="ABCD"/> ....
   ....<seg type="target" id="EFGH"> ... </seg> ...

The type attribute should be used (as above) to distinguish amongst different purposes for which these general purpose elements might be used in a text. Some other uses are discussed in section 8.3. Linking Attributes below.

8.2. Extended Pointers

The elements <ptr> and <ref> can only be used for cross-references or links whose targets occur within the same document as their source. They can also refer only to elements explicitly tagged in the document. The elements discussed in this section are not restricted in this way.

<xptr>
defines a pointer to another location in the current document or an external document.
<xref>
defines a pointer to another location in the current document or an external document, possibly modified by additional text or comment.

In addition to the pointer attributes already discussed in section 8.1. Simple Cross References above, these elements share the following additional attributes, which are used to specify the target of the cross reference or link in place of the target attribute:

doc
specifies the document within which the required location is to be found, by default the current document.
from
specifies the start of the destination of the pointer as an expression in the TEI extended pointer syntax, by default the whole of the document indicated by the doc attribute.
to
specifies the endpoint of the destination of the pointer as an expression in the TEI extended pointer syntax; may only be specified if the from attribute has been.

A full specification of the language used to express the target of TEI extended pointers is beyond the scope of this document; we list here only a few of its more generally useful features. The full Guidelines should be consulted for more detail.

An <xptr> (or <xref>) may point to the whole of some other document simply by supplying an entity name as the value of the doc attribute, as in this example:

  see <xref doc="P3">The TEI Guidelines, passim</xref>

This example assumes that some system or public entity with the name P3 has been declared. This declaration has to be included within the DTD in force when the document is parsed; the manner of doing so is specific to the authoring software in use (as further discussed in section 15. Figures and Graphics).

The from attribute is used to specify some location within whatever document is specified by the doc attribute. The specification uses a special language, called the TEI extended pointer syntax, only some details of which are given here. In this language, locations are defined as a series of steps, each one identifying some part of the document, often in terms of the locations identified by the previous step. For example, you would point to the third sentence of the second paragraph of chapter two by selecting chapter two in the first step, the second paragraph in the second step, and the third sentence in the last step. A step can be defined in terms of the document tree itself, using such concepts as parent, descendant, preceding, etc., or, more loosely, in terms of text patterns or of word or character positions. You can also use a foreign (non-SGML) notation, or specify a location within a graphic in terms of its co-ordinate system.

The from and to attributes use the same notation. Each points to some portion of the target document; the extended pointer as a whole points to the section beginning at the start of the from and running to the end of the to.

The first step in a location path will often be to specify the identifier of some element within the target document, as in this example:

<xptr doc="P3" from="id (SA)"/>

This selects the whole of whatever element bears the identifier SA within the entity P3. If a finer-grained target is required, other steps might follow. The following keywords are available for you to select other elements in terms of their relationship to this one:

child
elements contained by this one.
ancestor
elements which contain this one, directly or indirectly.
previous
elements with the same parent as this one but preceding it in the document.
next
elements with the same parent as this one and following it in the document.
preceding
elements in the document which start before this one does, irrespective of their parents.
following
elements in the document which start after this one does, irrespective of their parents.

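Each of these keywords implies a particular set of elements (the set of children, the set of ancestors, the set of previous siblings, etc.). To specify which element in the set we are pointing at, the keyword may optionally be followed by a parenthesized list. In the examples given in this section, the list contains a number selecting one member of the set, optionally followed by an element type and by an attribute name and value which the selected element must match.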

Continuing the above example, the following reference will select the third <p> element directly contained by whatever element has the identifier SA:

<xptr doc="P3" from="id (SA) child (3 p)"/>
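The way such a location is resolved step by step can be sketched in Python. The sketch below is a deliberately small illustration, not an implementation of the TEI extended pointer syntax: it assumes the target document is well-formed XML, handles only the id step and a child step of the form "child (N TYPE)", and uses invented names (STEP, resolve) and an invented sample document.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical target document containing an element with identifier SA.
SAMPLE = """
<text>
  <div1 id="SA">
    <head>Heading</head>
    <p>first</p>
    <p>second</p>
    <note>aside</note>
    <p>third</p>
  </div1>
</text>
"""

# One step: a keyword followed by a parenthesized argument list.
STEP = re.compile(r"(\w+)\s*\(([^)]*)\)")

def resolve(root, pointer):
    """Resolve a pointer made only of `id (NAME)` and `child (N TYPE)` steps."""
    current = root
    for keyword, args in STEP.findall(pointer):
        parts = args.split()
        if keyword == "id":
            matches = [el for el in root.iter() if el.get("id") == parts[0]]
            if not matches:
                raise LookupError("no element with identifier " + parts[0])
            current = matches[0]
        elif keyword == "child":
            n, gi = int(parts[0]), parts[1]
            # direct children of the requested type, counted from 1
            candidates = [c for c in current if c.tag == gi]
            if n < 1 or n > len(candidates):
                raise LookupError("no child number %d of type %s" % (n, gi))
            current = candidates[n - 1]
        else:
            raise NotImplementedError("step not handled in this sketch: " + keyword)
    return current

print(resolve(ET.fromstring(SAMPLE), "id (SA) child (3 p)").text)  # third
```

Each step starts from the element selected by the previous one, which is why the child step counts only the <p> elements directly inside SA and passes over the intervening <note>.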

Similarly, assuming that the entity P3 is in fact a reference to the XML form of the TEI Guidelines, then the following reference will select section 14.2.2 of that publication in which (as it happens) the extended pointer syntax is formally defined:

For full details, see
<xref doc="P3" from="id (SA) child (2 div2) child (2 div3)">
  TEI Extended pointer syntax definition
</xref>

Normally, the scope of a cross reference will be adequately defined by the from attribute. For some documents, however, it may be more convenient to define both a starting and an ending scope. As noted above, the to attribute is provided for this purpose. For example,

  <xptr doc="P1" from="id (xyz)" to="id (abc)"/>

is an extended pointer whose target is the sequence starting at the beginning of whatever element in document P1 has identifier XYZ and ending at the end of whatever element in the same document has identifier ABC. Any elements in between are also included, irrespective of structure; the pointer is erroneous if the end of ABC precedes the start of XYZ.
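One way to picture such a span is to number every start-tag and end-tag in document order and compare positions. The Python sketch below does this for a small invented document, assuming well-formed XML; the function names and the sample identifiers other than xyz and abc are hypothetical, and the sketch reports only elements lying wholly inside the span.

```python
import xml.etree.ElementTree as ET

# Hypothetical document standing in for P1.
SAMPLE = """
<text>
  <p id="a">one</p>
  <p id="xyz">two</p>
  <p id="b">three</p>
  <div id="abc"><p id="c">four</p></div>
  <p id="d">five</p>
</text>
"""

def tag_positions(root):
    """Return (element, start, end) triples; positions count tags in document order."""
    out, clock = [], 0
    def walk(el):
        nonlocal clock
        start = clock          # position of the start-tag
        clock += 1
        for child in el:
            walk(child)
        out.append((el, start, clock))  # clock is now the end-tag position
        clock += 1
    walk(root)
    return out

def range_ids(root, from_id, to_id):
    """Identifiers of elements wholly between the start of from_id and the end of to_id."""
    triples = tag_positions(root)
    pos = {el.get("id"): (s, e) for el, s, e in triples if el.get("id")}
    start, end = pos[from_id][0], pos[to_id][1]
    if end < start:
        raise ValueError("the end of 'to' precedes the start of 'from'")
    inside = [(s, el.get("id")) for el, s, e in triples
              if el.get("id") and s >= start and e <= end]
    return [name for _, name in sorted(inside)]

print(range_ids(ET.fromstring(SAMPLE), "xyz", "abc"))  # ['xyz', 'b', 'abc', 'c']
```

Reversing the two identifiers in this sample raises the error, since the element abc starts after the element xyz has ended.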

Very complex specifications are easily built using this syntax. For example, the following reference will select the most recent <head> element which carries an attribute lang with the value LAT, and which occurs before the start of the element with identifier SA:

<xptr doc="P3" from="id (SA) preceding (1 head lang lat)"/>

If no value is supplied for the doc attribute, the current document is assumed. Thus, the following references are semantically equivalent. They both indicate the element with identifier X1 within the current document:

<ptr target="X1"/>
<xptr from="id (X1)"/>

The TEI Extended Pointer Syntax was defined before the more recent XLink specifications, which are, however, to some extent derived from it. Work is currently going on to harmonize the two specification languages.

8.3. Linking Attributes

The following special purpose linking attributes are defined for every element in the TEI Lite DTD:

ana
links an element with its interpretation.
corresp
links an element with one or more other corresponding elements.
next
links an element to the next element in an aggregate.
prev
links an element to the previous element in an aggregate.

The ana (analysis) attribute is intended for use where a set of abstract analyses or interpretations have been defined somewhere within a document, as further discussed in section 16. Interpretation and Analysis. For example, a linguistic analysis of the sentence ‘John loves Nancy’ might be encoded as follows:

<seg type="sentence" ana="SVO">
  <seg type="lex" ana="NP1">John</seg>
  <seg type="lex" ana="VVI">loves</seg>
  <seg type="lex" ana="NP1">Nancy</seg>
</seg>

This encoding implies the existence elsewhere in the document of elements with identifiers SVO, NP1, and VVI where the significance of these particular codes is explained. Note the use of the <seg> element to mark particular components of the analysis, distinguished by the type attribute.
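Whether every code used on an ana attribute is in fact explained somewhere can be checked mechanically. The following Python sketch assumes well-formed XML; the <interp> element name, its placeholder content, and the function name undefined_analyses are assumptions made for this illustration. In the sample, one code has deliberately been left without a definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: SVO and NP1 are defined, VVI deliberately is not.
SAMPLE = """
<text>
  <interp id="SVO">(explanation of the code SVO)</interp>
  <interp id="NP1">(explanation of the code NP1)</interp>
  <seg type="sentence" ana="SVO">
    <seg type="lex" ana="NP1">John</seg>
    <seg type="lex" ana="VVI">loves</seg>
    <seg type="lex" ana="NP1">Nancy</seg>
  </seg>
</text>
"""

def undefined_analyses(root):
    """Return the ana codes that match no identifier in the document."""
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    missing = set()
    for el in root.iter():
        for code in el.get("ana", "").split():
            if code not in ids:
                missing.add(code)
    return sorted(missing)

print(undefined_analyses(ET.fromstring(SAMPLE)))  # ['VVI']
```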

The corresp (corresponding) attribute provides a simple way of representing some form of correspondence between two elements in a text. For example, in a multilingual text, it may be used to link translation equivalents, as in the following example:

<seg lang="FRA" id="FR1" corresp="EN1">Jean aime Nancy</seg>
<seg lang="ENG" id="EN1" corresp="FR1">John loves Nancy</seg>

The same mechanism may be used for a variety of purposes. In the following example, it has been used to represent anaphoric correspondences between ‘the show’ and ‘Shirley’, and between ‘NBC’ and ‘the network’:

<p><title id="shirley">Shirley</title>, which made
its Friday night debut only a month ago, was
not listed on <name id="nbc">NBC</name>'s new schedule,
although <seg id="network" corresp="nbc">the network</seg>
says <seg id="show" corresp="shirley">the show</seg>
still is being considered.</p>
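A program processing such a text can follow each corresp value to the element it names. The Python sketch below, which assumes well-formed XML and uses an invented function name, lists each phrase together with the text of the element it corresponds to.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p><title id="shirley">Shirley</title>, which made
its Friday night debut only a month ago, was
not listed on <name id="nbc">NBC</name>'s new schedule,
although <seg id="network" corresp="nbc">the network</seg>
says <seg id="show" corresp="shirley">the show</seg>
still is being considered.</p>"""

def correspondences(root):
    """Pair the text of each element bearing corresp with the text of its target."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    pairs = []
    for el in root.iter():
        # corresp may name one or more other elements
        for name in el.get("corresp", "").split():
            target = by_id.get(name)
            if target is not None:
                pairs.append(("".join(el.itertext()), "".join(target.itertext())))
    return pairs

print(correspondences(ET.fromstring(SAMPLE)))
# [('the network', 'NBC'), ('the show', 'Shirley')]
```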

The next and prev attributes provide a simple way of linking together the components of a discontinuous element, as in the following example:

<q id="Q1a" next="Q1b">Who-e debel you?</q>
&mdash; he at last said &mdash;
<q id="Q1b" prev="Q1a">you no speak-e,
damme, I kill-e.</q>  And so saying,
the lighted tomahawk began flourishing
about me in the dark.
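The parts of a discontinuous element can be brought back together by following the next attribute from the first part. The Python sketch below assumes well-formed XML, wraps the passage in a <p> for the purpose, writes the dashes as numeric character references because no entity declarations are available to the parser, and uses an invented function name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p><q id="Q1a" next="Q1b">Who-e debel you?</q>
&#8212; he at last said &#8212;
<q id="Q1b" prev="Q1a">you no speak-e,
damme, I kill-e.</q>  And so saying,
the lighted tomahawk began flourishing
about me in the dark.</p>"""

def assemble(root, first_id):
    """Join the text of the element first_id and of every element reached via next."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    parts, seen = [], set()
    current = by_id.get(first_id)
    # stop at the end of the chain, or if the chain loops back on itself
    while current is not None and current.get("id") not in seen:
        seen.add(current.get("id"))
        # normalize line breaks and runs of spaces inside each part
        parts.append(" ".join("".join(current.itertext()).split()))
        current = by_id.get(current.get("next"))
    return " ".join(parts)

print(assemble(ET.fromstring(SAMPLE), "Q1a"))
# Who-e debel you? you no speak-e, damme, I kill-e.
```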




Date: (revised October 2004) Author: Lou Burnard (revised SPQR).
Copyright TEI 1995