11 Representation of Primary Sources

This chapter defines a module intended for use in the representation of primary sources, such as manuscripts or other written materials. Section 11.1 Digital Facsimiles provides elements for the encoding of digital facsimiles or images of such materials, while the remainder of the chapter discusses ways of encoding detailed transcriptions of such materials. It is expected that this module will also be useful in the preparation of critical editions, but the module defined here is distinct from that defined in chapter 12 Critical Apparatus, and may be used independently of it. Detailed metadata relating to primary sources of any kind may be recorded using the elements defined by the manuscript description module discussed in chapter 10 Manuscript Description, but again the present module may be used independently if such data is not required.

It should be noted that, as elsewhere in these Guidelines, this chapter places more emphasis on the problems of representing the textual components of a document than on those relating to the description of the document's physical characteristics such as the carrier medium or physical construction. These aspects, of particular importance in codicology and the bibliographic study of incunables, are touched on in the chapter on Manuscript Description (10 Manuscript Description) and also form the subject of ongoing work in the TEI Physical Bibliography workgroup.

Although this chapter discusses manuscript materials more frequently than other forms of written text, most of the recommendations presented are equally applicable mutatis mutandis in the encoding of printed matter or indeed any form of written source, including monumental inscriptions. Similarly, where in the following descriptions terms such as ‘scribe’, ‘author’, ‘editor’, ‘annotator’ or ‘corrector’ are used, these may be re-interpreted in terms more appropriate to the medium being transcribed. In printed material, for example, the ‘compositor’ plays a role analogous to the ‘scribe’, while in an authorial manuscript, the author and the scribe are the same person.


11.1 Digital Facsimiles

These Guidelines are mostly concerned with the preparation of digital texts, in which a pre-existing text is transcribed or otherwise converted into character form, and marked up in XML. However, it is also very common practice to make a different form of ‘digital text’, which is instead composed of digital images of the original source, typically one per page, or other written surface. We call such a resource a digital facsimile. A digital facsimile may, in the simplest case, just consist of a collection of images, with some metadata to identify them and the source materials portrayed. It may sometimes contain a variety of images of the same source pages, for example of different resolutions, or of different kinds. Such a collection may form part of any kind of document, for example a commentary of a codicological or palaeographic nature, where there is a need to align explanatory text with image data. And it may also be complemented by a transcribed or encoded version of the original source, which may be linked to the page images. In this section we present elements designed to support these various possibilities and discuss the associated mechanisms provided by these Guidelines.

When this module is included in a schema, the class att.global is extended to include a new pointer attribute facs:

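att.global.facs groups elements that correspond with an image or written surface contained in a facsimile element.

facs	(facsimile) points to an image, or part of an image, in a facsimile element which corresponds with the content of the element.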

This attribute may be used to associate any element in a transcribed text with an image of it, by means of the usual URI pointing mechanism.

If a digital text contains one image per page or column (or similar unit), and no more complex mapping between text and image is envisaged, then the facs attribute may be used to point directly to a graphic resource:
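A minimal sketch of this approach follows. The file names page1.png and page2.png are placeholders assumed for illustration and do not come from any real edition.

```xml
<!-- Illustrative only: page1.png and page2.png are assumed file names -->
<pb facs="page1.png"/>
<!-- text contained on page 1 is transcribed here -->
<pb facs="page2.png"/>
<!-- text contained on page 2 is transcribed here -->
```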

By convention, this encoding indicates that the image indicated by the facs attribute represents the whole of the text following the pb (pagebreak) element, up to the next pb element. Any convenient milestone element (see further 3.10.3 Milestone Elements) could be used in the same way; for example if the images represent individual columns, the cb element might be used. Though simple, this method has some drawbacks. It does not scale well to more complex cases where, for example, the images do not correspond exactly with transcribed pages, or where the intention is to align specific marked up elements with detailed images, or parts of images. And it makes the management of the information about the images more difficult by scattering references to them through the file. Nevertheless, this solution may be adequate for many straightforward ‘digital library’ applications.

The recommended approach to encoding facsimiles is instead to use the facs attribute in conjunction with the elements facsimile, surface, and zone, which are also provided by this module. These elements make it possible to accommodate multiple images of each page, as well as to record arbitrary planar coordinates of textual elements on any kind of written surface and to link such elements with digital facsimile images of them. Typical applications include the provision of full text search in ‘digital facsimile editions’, and ways of annotating graphics, for example so as to identify individuals appearing in a group portrait and link them to data about the person represented.

The following elements are used to represent components of a digital facsimile:

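facsimile contains a representation of some written source in the form of image data rather than as transcribed or encoded text.
surface defines a written surface in terms of rectangular coordinates, optionally grouping one or more graphic representations of that space or of rectangular zones within it.
zone defines a rectangular area on the surface described by a surface element.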

The facsimile element is used to represent a digital facsimile. It appears within a TEI document along with, or instead of, the text element introduced in section 4 Default Text Structure. When this module is selected, therefore, a legal TEI document may comprise any of the following:

a TEI Header and a text element
a TEI Header and a facsimile element
a TEI Header, a facsimile element, and a text element

Like the text element, a facsimile element may also contain an optional front or back element, used in the same way as described in sections 4.5 Front Matter and 4.7 Back Matter.

In the simplest case, a facsimile just contains a series of graphic elements, each of which identifies an image file:
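A minimal sketch of such a facsimile, assuming four page images; the file names are illustrative placeholders, not names used by any real project.

```xml
<!-- Illustrative only: the four file names are assumed placeholders -->
<facsimile>
  <graphic url="page1.png"/>
  <graphic url="page2.png"/>
  <graphic url="page3.png"/>
  <graphic url="page4.png"/>
</facsimile>
```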

If desired, the binaryObject element described in 3.9 Graphics and other non-textual components (or any other element from the model.graphicLike class) can be used instead of a graphic.

In this simple case, the four page images are understood to represent the complete facsimile, and are to be read in the sequence given. Suppose, however, that the second page of this particular work is available both as an ordinary photograph and as an infra-red image, or in two different resolutions. The surface element may be used to indicate that there are two image files corresponding with the same area of the work:
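One way this might look is sketched below, assuming that the second page exists in a high and a low resolution version. All file names are hypothetical.

```xml
<!-- Illustrative only: file names are assumed placeholders -->
<facsimile>
  <graphic url="page1.png"/>
  <!-- both images below depict the same physical surface (page 2) -->
  <surface>
    <graphic url="page2-highRes.png"/>
    <graphic url="page2-lowRes.png"/>
  </surface>
  <graphic url="page3.png"/>
  <graphic url="page4.png"/>
</facsimile>
```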

The surface element provides a way of indicating that the two images of page2 represent the same physical surface within the source material. A surface might be a sheet of paper or parchment, a face of a monument, a billboard, a membrane of a scroll, or indeed any two-dimensional surface, of any size.

The actual dimensions of the object represented are not documented by the surface element; instead, the surface is located within an abstract coordinate space, which is defined by the following attributes, supplied by the att.coordinated class:

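att.coordinated elements which can be positioned within a two-dimensional coordinate system.

ulx	gives the x coordinate value for the upper left corner of the rectangle.
uly	gives the y coordinate value for the upper left corner of the rectangle.
lrx	gives the x coordinate value for the lower right corner of the rectangle.
lry	gives the y coordinate value for the lower right corner of the rectangle.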

The same coordinate space is used for a surface and for all of its child elements. ³⁴ It may be most convenient to derive a coordinate space from a digital image of the surface in question such that each pixel in the image corresponds with a whole number of units (typically 1) in the coordinate space. In other cases it may be more convenient to use units such as millimetres; in neither case is any specific mapping to the physical dimensions of the object represented implied.

Each surface can contain one or more zone elements, each of which represents a rectangular region or bounding box defined in terms of the same coordinate space as that of its parent surface element. This provides a unit of analysis which may be used to define any rectangular region of interest, such as a detail or illustration, or some part of the surface which is to be aligned with a particular text element. The att.coordinated attributes listed above are also used to supply the coordinates of a zone.

As we have seen, a surface will usually correspond with the whole of a written surface. A zone, by contrast, defines any arbitrary rectangular area of interest using the same coordinate system. It might be bigger or smaller than its parent surface, or might overlap its boundaries. The only constraint is that it must be defined using the same coordinate system.

When an image of some kind is supplied within either a zone or a surface, the implication is that the whole of the image represents the zone or surface containing it. In the simple case therefore, we might imagine a surface defining a page, within which there is a graphic representing the whole of that page, and a number of zones defining parts of the page, each with its own graphic, each representing a part of the page. If however one of those graphics actually represents an area larger than the page (for example to include a binding or the surface of a desk on which the page rests), then it will be enclosed by a zone with coordinates larger than those of the parent surface.

Note that this mechanism does not provide any way of addressing a non-rectangular area, nor of coping with distortions introduced by perspective or parallax; if this is needed, the more powerful mechanisms provided by the Scalable Vector Graphics (SVG) language should be used to define an overlay, as further discussed in 16.4.3 A Three-way Alignment.

For example, consider the following figure:

Figure 2. Relation between page, surface, and zone

This is an image of a two page spread from a manuscript in the Badische Landesbibliothek, Karlsruhe. We have no information as to the dimensions of the original object, but the low resolution image displayed here contains 500 pixels horizontally and 321 pixels vertically. For convenience, we might map each pixel to one cell of the coordinate space. ³⁵

The coordinates of the surface (that is, the area of the image which represents the written two page spread) can then be specified in terms of this coordinate space, simply by counting pixels in the image. The upper left corner of the two page spread appears 50 units from the left of the image and 20 units from the top, while the bottom right corner of the spread appears 400 units from the left of the image, and 280 units from the top. We therefore define the written surface within this image as follows:
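Using the pixel counts just given, the surface might be declared along the following lines. This is a sketch only; the contents of the element are left out.

```xml
<!-- Coordinates taken from the measurements stated in the text;
     the contents of the surface are omitted in this sketch -->
<surface ulx="50" uly="20" lrx="400" lry="280">
  <!-- zones and graphics for the two page spread go here -->
</surface>
```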

To describe the whole image, we will also need to define a zone of interest which represents an area larger than this surface. Using the same coordinate system as that defined for the surface, its coordinates are 0,0,500,321. This zone of interest can be defined by a zone element, within which we can place the uncropped graphic:
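A sketch of this arrangement, reusing the surface coordinates given earlier and the image URL that appears in the later examples of this section:

```xml
<surface ulx="50" uly="20" lrx="400" lry="280">
  <!-- this zone is larger than its parent surface: it covers the whole
       500 x 321 image, including the area around the spread -->
  <zone ulx="0" uly="0" lrx="500" lry="321">
    <graphic
      url="http://upload.wikimedia.org/wikipedia/commons/5/50/Handschrift.karlsruhe.blb.jpg"/>
  </zone>
</surface>
```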

If desired, the binaryObject element described in 3.9 Graphics and other non-textual components (or any other element from the model.graphicLike class) may be used instead of a graphic element.

The desc element may also be used within either surface or zone to provide some further information about the area being defined. For example, since the image in this example contains two pages, it might be preferable to define two distinct surfaces, one for each page, including its illuminated margins. In this case, each surface must specify a bounding box which encloses the appropriate page, as well as defining the zone for the graphic itself:

<facsimile>
<surface
   ulx="50"
   uly="20"
   lrx="210"
   lry="280">
  <desc>left hand page</desc>
  <zone
    ulx="0"
    uly="0"
    lrx="500"
    lry="321">
   <graphic
     url="http://upload.wikimedia.org/wikipedia/commons/5/50/Handschrift.karlsruhe.blb.jpg"/>
  </zone>
</surface>
<surface
   ulx="240"
   uly="25"
   lrx="400"
   lry="280">
  <desc>right hand page</desc>
  <zone
    ulx="0"
    uly="0"
    lrx="500"
    lry="321">
   <graphic
     url="http://upload.wikimedia.org/wikipedia/commons/5/50/Handschrift.karlsruhe.blb.jpg"/>
  </zone>
</surface>
</facsimile>

In addition to acting as a container for graphic elements, zone elements may also be used to select parts of each surface for analytical purposes. For example, to define the written part of the left hand page:

<facsimile>
<surface
   ulx="50"
   uly="20"
   lrx="210"
   lry="280">
  <desc>Left hand page</desc>
  <zone
    ulx="0"
    uly="0"
    lrx="500"
    lry="321">
   <graphic
     url="http://upload.wikimedia.org/wikipedia/commons/5/50/Handschrift.karlsruhe.blb.jpg"/>
  </zone>
  <zone
    ulx="90"
    uly="40"
    lrx="200"
    lry="225">
   <desc>Written part of left hand page</desc>
  </zone>
</surface>
</facsimile>

In the following example, we discuss a hypothetical digital edition of an early 16th century French work, Charles de Bovelles' Géometrie Pratique. ³⁶ In this edition, each page has been digitized as a separate file: for example, recto page 49 is stored in a file called Bovelles-49r.png. In the facsimile element used to contain the whole set of pages, we define a surface element for this page, which we situate within a coordinate scale running from 0 to 200 in the x (horizontal) axis, and 0 to 300 in the y (vertical) axis. The surface element contains a graphic element which represents the whole of this surface:
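A sketch of this starting point, using only the file name and coordinate ranges stated above:

```xml
<facsimile>
  <!-- coordinate space: x from 0 to 200, y from 0 to 300 -->
  <surface ulx="0" uly="0" lrx="200" lry="300">
    <!-- the image represents the whole of the surface -->
    <graphic url="Bovelles-49r.png"/>
  </surface>
</facsimile>
```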

We can now identify distinct zones within the page image using the coordinate scale defined for the surface. In Figure 3 (Zones within a surface) we show the upper part of the page, with boxes indicating four such zones. Each of these will be represented by a zone element, given within the surface element already defined, and specified in terms of the same coordinate system.

Figure 3. Zones within a surface

The following encoding defines each of the four zones identified in the figure.

<facsimile>
<surface
   ulx="0"
   uly="0"
   lrx="200"
   lry="300">
  <graphic url="Bovelles-49r.png"/>
  <zone
    ulx="25"
    uly="25"
    lrx="180"
    lry="60">
   <desc>contains the title</desc>
  </zone>
  <zone
    ulx="28"
    uly="75"
    lrx="175"
    lry="178"/>

  <zone
    ulx="105"
    uly="76"
    lrx="175"
    lry="160"/>

  <zone
    ulx="45"
    uly="125"
    lrx="60"
    lry="130"/>

</surface>
</facsimile>

Note that the location of each zone is defined independently but using the same coordinate system, so that they may overlap freely. Zones need not nest within each other; they must, however, be rectangular. As noted earlier, a zone may also fall outside the area of the surface which defines its coordinate space.

In this example a single graphic element has been associated directly with the surface of the page rather than nesting it within a zone. However, it is also possible to include multiple zone elements which contain a graphic element, if for example a detailed image is available. Since all zone elements use the same coordinate system (that defined by their parent surface), there is no need to demonstrate enclosure of one zone within another by means of nesting. To continue the current example, supposing that we have an additional image called Bovelles49r-detail.png containing an additional image of the figure in the third zone above, we might encode that zone as follows:
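One possible encoding is sketched below. It assumes that the "third zone" is the one with coordinates 105, 76, 175, 160 in the encoding above; the other zones are omitted for brevity.

```xml
<surface ulx="0" uly="0" lrx="200" lry="300">
  <!-- image of the whole page -->
  <graphic url="Bovelles-49r.png"/>
  <!-- third zone (the figure), now carrying its own detailed image -->
  <zone ulx="105" uly="76" lrx="175" lry="160">
    <graphic url="Bovelles49r-detail.png"/>
  </zone>
</surface>
```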

Now suppose that we wish to align a transcription of this page with the zones identified above. The first step is to give each relevant part of the facsimile an identifier:
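A sketch of the facsimile with identifiers added. The identifier values are those referenced by the facs attributes in the transcription of this page; the pairing of each identifier with a particular zone is inferred from the zone sizes and is an assumption.

```xml
<facsimile>
  <surface xml:id="B49r" ulx="0" uly="0" lrx="200" lry="300">
    <graphic url="Bovelles-49r.png"/>
    <!-- title -->
    <zone xml:id="B49rHead" ulx="25" uly="25" lrx="180" lry="60"/>
    <!-- second paragraph (assumed pairing) -->
    <zone xml:id="B49rPara2" ulx="28" uly="75" lrx="175" lry="178"/>
    <!-- figure (assumed pairing) -->
    <zone xml:id="B49rFig1" ulx="105" uly="76" lrx="175" lry="160"/>
    <!-- a single word (assumed pairing) -->
    <zone xml:id="B49rW457" ulx="45" uly="125" lrx="60" lry="130"/>
  </surface>
</facsimile>
```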

The alignment between transcription and image is made, as usual, by means of the facs attribute:

<pb facs="#B49r"/>
<fw>De Geometrie 49</fw>
<head facs="#B49rHead">DU SON ET ACCORD DES CLOCHES ET <lb/> des alleures des chevaulx,
chariotz &amp; charges, des fontaines:&amp; <lb/> encyclie du monde,
&amp; de la dimension du corps humain.</head>
<head>Chapitre septiesme</head>
<div n="1">
<p>Le son &amp; accord des cloches pendans en ung mesme <lb/> axe, est
   faict en contraires parties.</p>
<p rend="it" facs="#B49rPara2">LEs cloches ont quasi fi<lb/>gures de rondes
   pyra<lb/>mides imperfaictes &amp; <lb/> irregulieres: &amp; leur
   accord se <lb/> fait par reigle geometrique. Com<lb/>me si les deux
   cloches C &amp; D <lb/> sont <w facs="#B49rW457">pendans</w> à ung
   mesme axe <lb/> ou essieu A B: je dis que leur ac<lb/>cord se fera en
   co<ex>n</ex>traires parties<lb/> co<ex>m</ex>me voyez icy
   figuré. Car qua<ex>n</ex>d <lb/> lune sera en hault, laultre
   declinera embas. Aultrement si elles decli<lb/>nent toutes deux
   ensembles en une mesme partie, elles seront discord, <lb/> &amp; sera
   leur sonnerie mal plaisante à oyr.<figure facs="#B49rFig1">
   <graphic url="Bovelles49r-detail.png"/>
  </figure>
</p>
</div>

Further discussion of the encoding choices made in the above transcription is provided in the remainder of this chapter.

It is also possible to point in the other direction, from a surface or zone to the corresponding text. This is the function of the start attribute, which supplies the identifier of the element containing the transcribed text found within the surface or zone concerned. Thus, another way of linking this page with its transcription would be simply:

<facsimile>
<surface start="#PB49R">
<graphic url="Bovelles-49r.png"/>
</surface>
</facsimile>
<text>

<pb xml:id="PB49R"/>
<fw>De Geometrie 49</fw>

</text>

11.2 Scope of Transcriptions

When transcribing a primary source, scholars may wish to record information concerning individual readings of letters, words, or larger units, whether the object is simply a ‘neutral’ transcription or a critical edition. In either case they may also wish to include other editorial material, such as comments on the status or possible origin of particular readings, corrections, or text supplied to fill lacunae. Further, it is customary in transcriptions to register certain features of the source, such as ornamentation, underlining, deletion, areas of damage and lacunae. This chapter provides ways of encoding such information:

first, methods of recording editorial or other alterations to the text, such as expansion of abbreviations, corrections, conjectures, etc. (section 11.3 Altered, Corrected, and Erroneous Texts)
then, methods of describing important extra-linguistic phenomena in the source: unusual spaces, lines, page and line breaks, change of manuscript hand, etc. (section 11.4 Hands and Responsibility)
finally, a method of recording material such as running heads, catch-words, and the like (section 11.7 Headers, Footers, and Similar Matter)

These recommendations are not intended to meet every transcriptional circumstance likely to be faced by any scholar. Rather, they should be regarded as a base which can be elaborated if necessary by different scholars in different disciplines.

As a rule, all elements which may be used in the course of a transcription of a single witness may also be used in a critical apparatus, i.e. within the elements proposed in chapter 12 Critical Apparatus. This can generally be achieved by nesting a particular reading containing tagged elements from a particular witness within the rdg element in an app structure.
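The following sketch illustrates the idea. The witness sigla #A and #B and the wording of the readings are invented for illustration; only the elements themselves (app, rdg, del, add) come from these Guidelines.

```xml
<!-- Illustrative only: witnesses #A and #B and the text are invented -->
<app>
  <!-- witness A shows a deletion followed by an addition -->
  <rdg wit="#A">the <del>olde</del> <add>newe</add> ladder</rdg>
  <!-- witness B has the later wording without alteration -->
  <rdg wit="#B">the newe ladder</rdg>
</app>
```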

Just as a critical apparatus may contain transcriptional elements within its record of variant readings in various witnesses, one may record variant readings in an individual witness by use of the apparatus mechanisms app and rdg. This is discussed in section 12.3 Using Apparatus Elements in Transcriptions.

11.3 Altered, Corrected, and Erroneous Texts

In the detailed transcription of any source, it may prove necessary to record various types of actual or potential alteration of the text: expansion of abbreviations, correction of the text (either by author, scribe, or later hand, or by previous or current editors or scholars), addition, deletion, or substitution of material, and the like. The sections below describe how such phenomena may be encoded using either elements defined in the core module (defined in chapter 3 Elements Available in All TEI Documents) or specialized elements available only when the module described in this chapter is available.

11.3.1 Core elements for Transcriptional Work

In transcribing individual sources of any type, encoders may record corrections, normalizations, expansions of abbreviations, additions, and omissions using the elements described in section 3.4 Simple Editorial Changes. Those particularly relevant to this chapter include:

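abbr (abbreviation) contains an abbreviation of any sort.
add (addition) contains letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector.
choice groups a number of alternative encodings for the same point in a text.
corr (correction) contains the correct form of a passage apparently erroneous in the source.
del (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or erroneous by an author, scribe, annotator, or corrector.
expan (expansion) contains the expansion of an abbreviation.
gap (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header or because the material is illegible or inaudible.
sic (Latin for thus or so) contains text reproduced although apparently incorrect or inaccurate.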

Several of these elements bear additional attributes for specifying who is responsible for the interpretation represented by the markup, and the certainty associated with it. In addition, some of them bear an attribute allowing the markup to be categorised by type and source.

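att.editLike provides attributes describing the nature of a scholarly intervention or interpretation.

evidence	indicates the nature of the evidence supporting the reliability or accuracy of the intervention or interpretation.
source	contains one or more pointers indicating the sources which support the given reading.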

att.responsibility provides attributes indicating who is responsible for something asserted by the markup and the degree of certainty associated with it.

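cert	(certainty) signifies the degree of certainty associated with the intervention or interpretation.
resp	(responsible party) indicates the agency responsible for the intervention or interpretation, for example an editor or translator.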

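att.typed provides attributes which can be used to classify elements.

type	characterizes the element by assigning it to a class.
subtype	provides a sub-categorization of the element, if needed.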

The specific aspect of the markup described by these attributes differs on different elements; for further discussion, see the relevant sections below, especially section 11.4.2 Hand, Responsibility, and Certainty Attributes.

The following sections describe how the core elements just named may be used in the transcription of primary source materials.

11.3.2 Abbreviation and Expansion

The writing of manuscripts by hand lends itself to the use of abbreviation to shorten scribal labour. Commonly occurring letters, groups of letters, words, or even whole phrases, may be represented by significant marks. This phenomenon of manuscript abbreviation is so widespread and so various that no taxonomy of it is here attempted. Instead, methods are shown which allow abbreviations to be encoded using the core elements mentioned above.

A manuscript abbreviation may be viewed in two ways. One may transcribe it as a particular sequence of letters or marks upon the page: thus, a ‘p with a bar through the descender’, a ‘superscript hook’, a ‘macron’. One may also interpret the abbreviation in terms of the letter or letters it is seen as standing for: thus, ‘per’, ‘re’, ‘n’. Both of these views are supported by these Guidelines.
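As a hedged sketch, the two views can be recorded side by side with the core elements choice, abbr, and expan. The example assumes that the ‘p with a bar through the descender’ is transcribed with the Unicode character U+A751 (ꝑ); the word chosen is illustrative.

```xml
<!-- Illustrative only: assumes U+A751 (ꝑ) is an acceptable
     transcription of the abbreviation mark for 'per' -->
<choice>
  <!-- the marks as they appear on the page -->
  <abbr>ꝑsone</abbr>
  <!-- the letters the abbreviation is taken to stand for -->
  <expan>persone</expan>
</choice>
```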

In many cases the glyph found in the manuscript source also exists in the Unicode character set: for example the common Latin brevigraph ⁊, standing for et and often known as the ‘Tironian et’ can be directly represented in any XML document as the Unicode character with code point U+204A (see further Character References and vi.1. Language identification). In cases where it does not, these Guidelines recommend use of the g element provided by the gaiji module described in chapter 5 Representation of Non-standard Characters and Glyphs. This module allows the encoder great flexibility both in processing and in documenting non-standard characters or glyphs, including the ability to provide detailed documentation and images for them.

These two methods of coding abbreviation may also be combined. An encoder may record, for any abbreviation, both the sequence of letters or marks which constitutes it, and its sense, that is, the letter or letters for which it is believed to stand. For example, in the following fragment the phrase euery persone is represented by a sequence of characters which may be transcribed directly, using the g element to indicate the two brevigraphs it contains as follows:

eu<g ref="#b-er">er</g>y <g ref="#b-per">per</g>sone that loketh after heuen hath a place in this
ladder


<charDecl>
<char xml:id="b-er">

</char>
<char xml:id="b-per">

</char>
</charDecl>

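seq	(sequence) assigns a sequence number indicating the order in which the features carrying this attribute are believed to have occurred.
status	indicates the effect of the intervention, for example in the case of a deletion, strikeouts which include too much or too little text, or in the case of an addition, an insertion which duplicates some of the text already present.
hand	identifies the hand of the agent which made the intervention.

reason	gives the reason for omission, for example sampling, inaudible, irrelevant, cancelled, or cancelled and illegible.
hand	in the case of a deliberate deletion by an identifiable hand, indicates that hand in the transcription.
agent	in the case of text omitted because of damage, categorizes the cause of the damage, if it can be identified.

scribe	gives a standard name or other identifier for the scribe believed to be responsible for this hand.
script	characterizes the particular script or writing style used by this hand, for example secretary, copperplate, Chancery, or Italian.
medium	describes the tint or type of ink, for example brown, or the writing medium, for example pencil.
scope	specifies how widely this hand is used in the manuscript.

hand	where the hand responsible for the damaged portion can be identified, indicates that hand.
agent	categorizes the cause of the damage.
degree	indicates the degree of damage. The degree attribute of the damage element should be used only where the text of the damaged portion can still be read; text supplied from other sources should be marked with the supplied element.
group	assigns an arbitrary number to each stretch of damage, indicating its physical context.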

extent	indicates the size of the object concerned using a project-specific vocabulary combining quantity and units in a single string of words.
unit	names the unit used for the measurement.
quantity	specifies the size in the units specified.

min	where the measurement summarizes more than one observation or a range, supplies the minimum value observed.
max	where the measurement summarizes more than one observation or a range, supplies the maximum value observed.
atLeast	gives a minimum estimated value for the approximate measurement.
atMost	gives a maximum estimated value for the approximate measurement.

P5: TEI Guidelines