3 Elements Available in All TEI Documents


This chapter describes elements which may appear in any kind of text and the tags used to mark them in all TEI documents. Most of these elements are freely floating phrases, which can appear at any point within the textual structure, although they must generally be contained by a higher-level element of some kind (such as a paragraph). A few of the elements described in this chapter (for example, bibliographic citations and lists) have a comparatively well-defined internal structure, but most of them have no consistent inner structure of their own. In the general case, they contain only a few words, and are often identifiable in a conventionally printed text by the use of typographic conventions such as shifts of font, use of quotation or other punctuation marks, or other changes in layout.

This chapter begins by describing the p tag used to mark paragraphs, the prototypical formal unit for running text in many TEI modules. This is followed, in section 3.2 Treatment of Punctuation, by a discussion of some specific problems associated with the interpretation of conventional punctuation, and the methods proposed by the Guidelines for resolving ambiguities therein.

The next section (section 3.3 Highlighting and Quotation) describes a number of phrase-level elements commonly marked by typographic features (and thus well-represented in conventional markup languages). These include features commonly marked by font shifts (section 3.3.2 Emphasis, Foreign Words, and Unusual Language) and features commonly marked by quotation marks (section 3.3.3 Quotation) as well as such features as terms, cited words, and glosses (section 3.3.4 Terms, Glosses, Equivalents, and Descriptions).

Section 3.4 Simple Editorial Changes introduces some phrase-level elements which may be used to record simple editorial interventions, such as emendation or correction of the encoded text. The elements described here constitute a simple subset of the full mechanisms for encoding such information (described in full in chapter 11 Representation of Primary Sources), which should be adequate to most commonly encountered situations.

The next section (section 3.5 Names, Numbers, Dates, Abbreviations, and Addresses) describes several phrase-level and inter-level elements which, although often of interest for analysis or processing, are rarely explicitly identified in conventional printing. These include names (section 3.5.1 Referring Strings), numbers and measures (section 3.5.3 Numbers and Measures), dates and times (section 3.5.4 Dates and Times), abbreviations (section 3.5.5 Abbreviations and Their Expansions), and addresses (section 3.5.2 Addresses).

In the same way, the following section (section 3.6 Simple Links and Cross-References) presents only a subset of the facilities available for the encoding of cross-references or text-linkage. The full story may be found in chapter 16 Linking, Segmentation, and Alignment; the tags presented here are intended to be usable for a wide variety of simple applications.

Sections 3.7 Lists, and 3.8 Notes, Annotation, and Indexing, describe two kinds of quasi-structural elements: lists and notes. These may appear either within chunk-level elements such as paragraphs, or between them. Several kinds of lists are catered for, of an arbitrary complexity. The section on notes discusses both notes found in the source and simple mechanisms for adding annotations of an interpretive nature during the encoding; again, only a subset of the facilities described in full elsewhere (specifically, in chapter 17 Simple Analytic Mechanisms) is discussed.

Section 3.9 Graphics and other non-textual components introduces some simple ways of representing graphic or other non-textual content found in a text. A fuller discussion of the multimedia facilities supported by these Guidelines may be found in chapters 14 Tables, Formulæ, Graphics and Notated Music and 16 Linking, Segmentation, and Alignment.

Next, section 3.10 Reference Systems describes methods of encoding within a text the conventional system or systems used when making references to the text. Some reference systems have attained canonical authority and must be recorded to make the text usable in normal work; in other cases, a convenient reference system must be devised by the creator or analyst of an electronic text.

Like lists and notes, the bibliographic citations discussed in section 3.11 Bibliographic Citations and References, may be regarded as structural elements in their own right. A range of possibilities is presented for the encoding of bibliographic citations or references, which may be treated as simple phrases within a running text, or as highly-structured components suitable for inclusion in a bibliographic database.

Additional elements for the encoding of passages of verse or drama (whether prose or verse) are discussed in section 3.12 Passages of Verse or Drama.

The chapter concludes with a technical overview of the structure and organization of the module described here. This should be read in conjunction with chapter 1 The TEI Infrastructure, describing the structure of the TEI document type definition.

3.1 Paragraphs

The paragraph is the fundamental organizational unit for all prose texts, being the smallest regular unit into which prose can be divided. Prose can appear in all TEI texts, even those that are primarily of another genre (e.g., verse); thus the paragraph is described here, as an element which can appear in any kind of text.

Paragraphs can contain any of the other elements described within this chapter, as well as some other elements which are specific to individual text types. We distinguish phrase-level elements, which must be entirely contained within a paragraph and cannot appear except within one, from chunks, which can appear between, but not within, paragraphs, and from inter-level elements, which can appear either within a single paragraph or between paragraphs. The class of phrases includes emphasized or quoted phrases, names, dates, etc. The class of inter-level elements includes bibliographic citations, notes, lists, etc. The class of chunks includes the paragraph itself, and other elements which have similar structural properties, notably the ab (anonymous block) element (described in 16.3 Blocks, Segments, and Anchors), which may be used as an alternative to the paragraph in some kinds of texts.
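The distinction between phrase-level, inter-level, and chunk-level elements can be sketched as follows. This is an illustrative fragment only: the enclosing div and all of the text content are invented for the example, and emph and list stand in for the phrase-level and inter-level classes respectively.

```xml
<div>
  <!-- p is a chunk: it appears between, not within, other paragraphs -->
  <p>A phrase-level element such as <emph>emphasis</emph> must sit
    inside a paragraph. An inter-level element may also appear here:
    <list>
      <item>first item, inside the paragraph</item>
      <item>second item, inside the paragraph</item>
    </list>
  </p>
  <!-- the same inter-level element, now between two paragraphs -->
  <list>
    <item>an item in a list that stands on its own</item>
  </list>
  <p>A closing paragraph.</p>
</div>
```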

Because paragraphs may appear in different base or additional tag sets, their possible contents may differ in different kinds of documents. In particular, additional elements not listed in this chapter may appear in paragraphs in certain kinds of text. However, the elements described in this chapter are always by default available in all kinds of text.

The paragraph is marked using the p element:

p (paragraph) marks paragraphs in prose.

If a consistent internal subdivision of paragraphs is desired, the s or seg (‘segment’) elements may be used, as discussed in chapters 17 Simple Analytic Mechanisms and 16 Linking, Segmentation, and Alignment respectively. More usually, however, paragraphs have no firm internal structure, but contain prose encoded as a mix of characters, entity references, phrases marked as described in the rest of this chapter, and embedded elements like lists, figures, or tables.
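Where such a subdivision is wanted, it might look like the sketch below, which assumes that the module defining s is included in the schema. The n values and the sentence text are invented for illustration; seg could be used in the same way for units other than sentences.

```xml
<p>
  <!-- each s marks one sentence-like unit; s elements do not nest -->
  <s n="1">Paragraphs are usually marked by indentation.</s>
  <s n="2">Their internal divisions are rarely marked at all.</s>
</p>
```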

Since paragraphs are usually explicitly marked in Western texts, typically by indentation, the application of the p tag usually presents few problems.

In some cases, the body of a text may comprise but a single paragraph:

<body>
<p>I fully appreciate Gen. Pope's splendid achievements with their
invaluable results; but you must know that Major Generalships in the
Regular Army, are not as plenty as blackberries.</p>
</body>

direct	may be used to indicate whether the quoted matter is regarded as direct or indirect speech.
aloud	may be used to indicate whether the quoted matter is regarded as having been vocalized or signed.
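A hedged illustration of these two attributes follows. It assumes they are carried by the said element, which the list above does not name, and that both take the values true and false. The sentences are invented.

```xml
<p>
  <!-- direct speech, spoken aloud -->
  <said direct="true" aloud="true">Come in,</said> she called.
  <!-- indirect speech, reported by the narrator -->
  She added <said direct="false" aloud="true">that the door was open</said>.
  <!-- direct form, but thought rather than vocalized -->
  <said direct="true" aloud="false">He is late again,</said> she thought.
</p>
```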

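uri	(uniform resource identifier) references the underlying concept of which the parent is a representation by means of some external identifier.
filter	references an external script which contains a method to transform instances of this element to canonical TEI.
name	names the underlying concept of which the parent is a representation.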

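cert	(certainty) signifies the degree of certainty associated with the intervention or interpretation.
resp	(responsible party) indicates the agency responsible for the intervention or interpretation, for example an editor or transcriber.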
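As a minimal sketch, cert and resp might be used on a simple editorial correction as shown below. The choice, sic, and corr elements are assumed from the discussion of simple editorial changes; the identifier #ed1 is hypothetical and would need to point at a description of the responsible editor elsewhere in the document.

```xml
<p>It was a lovely
  <choice>
    <sic>daye</sic>
    <!-- #ed1 is a placeholder for the editor's identifier -->
    <corr resp="#ed1" cert="high">day</corr>
  </choice>.
</p>
```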

unit	names the unit used for the measurement.
quantity	specifies the length in the units specified
extent	indicates the size of the object concerned using a project-specific vocabulary combining quantity and units in a single string of words.
precision	characterizes the precision of the values specified by the other attributes.
scope	where the measurement covers more than one object, specifies the applicability of this measurement.
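The following sketch shows how unit, quantity, extent, and precision might combine. It assumes that gap is among the elements carrying these attributes, which the list above does not state; the reason values and the surrounding text are invented.

```xml
<p>The letter continues
  <!-- an exact count of illegible characters -->
  <gap reason="illegible" quantity="3" unit="chars"/>
  and then breaks off:
  <!-- quantity and unit given together as a single descriptive string -->
  <gap reason="damage" extent="several lines"/>
  <!-- an estimate whose precision is explicitly marked as low -->
  <gap reason="illegible" quantity="2" unit="lines" precision="low"/>
</p>
```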

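type	characterizes the element in some sense, using any convenient classification scheme or typology.
subtype	provides a sub-categorization of the element, if needed.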

key	provides an externally-defined means of identifying the entity (or entities) being named, using a coded value of some kind.
ref	(reference) provides an explicit means of locating a full definition for the entity being named by means of one or more URIs.
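A brief sketch of the two mechanisms, assuming the generic name element as the carrier. The key value, the URI, and the names are all hypothetical.

```xml
<p>
  <!-- key: a project-internal code, resolved by external means -->
  <name type="person" key="PERS0042">Jane Doe</name> wrote from
  <!-- ref: a URI pointing at a fuller definition of the entity -->
  <name type="place" ref="http://www.example.org/places#springfield">Springfield</name>.
</p>
```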

model.nameLike.agent	groups elements which contain names of individuals or corporate bodies.
model.offsetLike	groups elements which can appear only as part of a place name.
model.persNamePart	groups elements which form part of a personal name.
model.placeStateLike	groups elements which describe changing states of a place.

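idno	(identifier) supplies any standard or non-standard number used to identify a bibliographic item.
lang	(language name) contains the name of a language mentioned in etymological or other linguistic discussion.
rs	(referencing string) contains a general purpose name or referring string.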

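addName	(additional name) contains an additional name component, such as a nickname, epithet, or alias, or any other descriptive phrase used within a personal name.
forename	contains a forename, given or baptismal name.
genName	contains a name component used to distinguish otherwise similar names on the basis of the relative ages or generations of the persons named.
nameLink	(name link) contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of.
roleName	contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank.
surname	contains a family (inherited) name, as opposed to a given, baptismal, or nick name.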
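These components might be combined as in the sketch below. It assumes a persName wrapper and the availability of the module that defines these elements, neither of which is stated in the list above. The type values are illustrative.

```xml
<p>
  <persName>
    <roleName type="office">Governor</roleName>
    <forename>Edmund</forename>
    <addName type="nickname">Jerry</addName>
    <surname>Brown</surname>
    <genName>Jr.</genName>
  </persName>
  <!-- nameLink marks the connecting phrase, which is not itself a name -->
  <persName>
    <forename>Walter</forename>
    <nameLink>de la</nameLink>
    <surname>Mare</surname>
  </persName>
</p>
```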

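bloc	(bloc) contains the name of a geo-political unit consisting of one or more countries.
country	(country) contains the name of a geo-political unit, such as a nation, country, colony, or commonwealth, larger than or administratively superior to a region and smaller than a bloc.
district	contains the name of any kind of administrative subdivision, such as a parish, ward, or other administrative or geographic unit.
geogName	(geographical name) contains a name associated with some geographical feature, such as Windrush Valley or Mount Sinai.
placeName	contains an absolute or relative place name.
region	contains the name of an administrative unit such as a state, province, or county, larger than a settlement, but smaller than a country.
settlement	contains the name of a settlement such as a city, town, or village identified as a single geo-political or administrative unit.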
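A short sketch of how these place-name components might nest, assuming the module that defines them is in use. The type values are illustrative rather than prescribed.

```xml
<p>
  <!-- components ordered from the smallest unit to the largest -->
  <placeName>
    <settlement type="city">Rochester</settlement>,
    <region type="state">New York</region>,
    <country>United States</country>
  </placeName>
  <!-- a name attached to a geographical feature rather than a political unit -->
  <geogName>Mount Sinai</geogName>
</p>
```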

type	indicates the type of numeric value.
value	supplies the value of the number in standard form.
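For instance, assuming these are the attributes of the num element, a number written out in words can be given a normalized value as sketched here. The type values shown are illustrative.

```xml
<p>
  <num type="cardinal" value="21">twenty-one</num>,
  <num type="ordinal" value="21">twenty-first</num>, and
  <num type="fraction" value="0.5">one half</num>
</p>
```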

atLeast	gives a minimum estimated value for the approximate measurement.
atMost	gives a maximum estimated value for the approximate measurement.

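quantity	specifies the number of the specified units that comprise the measurement.
unit	indicates the units used for the measurement, usually using the standard symbol for the desired units.
commodity	indicates the substance that is being measured.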
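As a sketch, assuming these attributes belong to the measure element, a quantity expressed in an abbreviated or archaic form can be normalized like this. The unit and commodity values are free illustrative choices.

```xml
<p>
  <!-- "2 score" is normalized to the quantity 40 -->
  <measure quantity="40" unit="hogshead" commodity="rum">2 score hh rum</measure>
  <measure quantity="12" unit="count" commodity="roses">1 doz. roses</measure>
</p>
```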

target	specifies the destination of the reference by supplying one or more URI References.
evaluate	specifies the intended meaning when the target of a pointer is itself a pointer.
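A minimal illustration of both attributes, assuming the ref and ptr elements as carriers. Every identifier used as a target here is hypothetical, and the value one for evaluate is assumed to mean that a target which is itself a pointer is followed one step further.

```xml
<p>See <ref target="#sec-punctuation">the discussion of punctuation</ref>
  <!-- several targets may be given, separated by white space -->
  and the notes at <ptr target="#note12 #note13"/>.
  <!-- the target is itself a pointer, to be followed one level -->
  <ptr target="#xref1" evaluate="one"/>
</p>
```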

width	Where the media are displayed, indicates the display width
height	Where the media are displayed, indicates the display height
scale	Where the media are displayed, indicates a scale factor to be applied when generating the desired display size
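These display attributes might be used as follows, assuming the graphic element as the carrier. The url attribute and the file names are assumptions that do not appear in the list above.

```xml
<p>
  <!-- explicit display dimensions -->
  <graphic url="fig1.png" width="400px" height="300px"/>
  <!-- alternatively, display at half the image's original size -->
  <graphic url="fig2.png" scale="0.5"/>
</p>
```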

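bibl	(bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.
biblFull	contains a fully-structured bibliographic citation, in which all components of the TEI file description are present.
biblStruct	(structured bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order.
listRelation	describes the relationships or social links existing between participants in a linguistic interaction.
msDesc	contains a description of a single identifiable manuscript.
relationGrp	(participant relationships) describes the relationships or social links existing between participants in a linguistic interaction.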

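biblScope	(scope of bibliographic reference) defines the scope of a bibliographic reference, for example as a list of page numbers, or a named subdivision of a larger work.
distributor	supplies the name of a person or other agency responsible for the distribution of a text.
publisher	provides the name of the organization responsible for the publication or distribution of a bibliographic item.
pubPlace	contains the name of the place where a bibliographic item was published.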
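The imprint elements above might appear within a loosely structured citation as in this sketch. The author and title elements are assumed from the wider discussion of bibliographic citations, and the punctuation between components is left as plain text.

```xml
<bibl>
  <author>Jane Austen</author>,
  <title>Pride and Prejudice</title>.
  <pubPlace>London</pubPlace>:
  <publisher>T. Egerton</publisher>,
  <date>1813</date>.
  <biblScope>vol. 1</biblScope>
</bibl>
```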

date	contains a date in any format.
time	contains a phrase defining a time of day in any format.
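A hedged example of both elements follows. The when attribute, which supplies a normalized value, is an assumption not listed above, and the sentence is invented.

```xml
<p>The committee met on
  <date when="2024-03-15">15 March 2024</date> at
  <time when="14:30:00">half past two</time>.
</p>
```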

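mainLang	supplies a code which identifies the chief language used in the manuscript.
otherLangs	one or more codes identifying any other languages used in the manuscript.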

P5: TEI Guidelines
