9 Dictionaries
This chapter defines a module for encoding lexical resources of all kinds, in particular human-oriented monolingual and multilingual dictionaries, glossaries, and similar documents. The elements described here may also be useful in the encoding of computational lexica and similar resources intended for use by language-processing software; they may also be used to provide a rich encoding for wordlists, lexica, glossaries, etc. included within other documents. Dictionaries are most familiar in their printed form; however, increasing numbers of dictionaries exist also in electronic forms which are independent of any particular printed form, but from which various displays can be produced.
Both typographically and structurally, print dictionaries are extremely complex. Such lexical resources are moreover of interest to many communities with different and sometimes conflicting goals. As a result, many general problems of text encoding are particularly pronounced here, and more compromises and alternatives within the encoding scheme may be required in the future.[28] Two problems are particularly prominent.
First, because the structure of dictionary entries varies widely both among and within dictionaries, the simplest way for an encoding scheme to accommodate the entire range of structures actually encountered is to allow virtually any element to appear virtually anywhere in a dictionary entry. It is clear, however, that strong and consistent structural principles do govern the vast majority of conventional dictionaries, as well as many or most entries even in more ‘exotic’ dictionaries; encoding guidelines should include these structural principles. We therefore define two distinct elements for dictionary entries, one (entry) which captures the regularities of many conventional dictionary entries, and a second (entryFree) which uses the same elements, but allows them to combine much more freely. It is however recommended that entry be used in preference to entryFree wherever possible. These elements and their contents are described in sections 9.2 The Structure of Dictionary Entries, 9.6 Unstructured Entries, and 9.4 Headword and Pronunciation References.
Second, since so much of the information in printed dictionaries is implicit or highly compressed, their encoding requires clear thought about whether it is to capture the precise typographic form of the source text or the underlying structure of the information it presents. Since both of these views of the dictionary may be of interest, it proves necessary to develop methods of recording both, and of recording the interrelationship between them as well. Users interested mainly in the printed format of the dictionary will require an encoding to be faithful to an original printed version. However, other users will be interested primarily in capturing the lexical information in a dictionary in a form suitable for further processing, which may demand the expansion or rearrangement of the information contained in the printed form. Further, some users wish to encode both of these views of the data, and retain the links between related elements of the two encodings. Problems of recording these two different views of dictionary data are discussed in section 9.5 Typographic and Lexical Information in Dictionary Data, together with mechanisms for retaining both views when this is desired.
To deal with this complexity, and in particular to account for the wide variety of linguistic contexts within which a dictionary may be designed, it can be necessary to customize or change the schema by providing more restriction or possibly alternate content models for the elements defined in this chapter. Section 9.3.2 Grammatical Information illustrates this with the provision of a closed set of values for grammatical descriptors.
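As a rough illustration of such a customization, the following ODD fragment restricts the type attribute of gram to a closed list. It is a sketch only: the schema identifier `tei_dict_sketch` is hypothetical, and the three permitted values are simply those used with gram elsewhere in this chapter, not a recommended inventory.

```xml
<schemaSpec ident="tei_dict_sketch" start="TEI"
            xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Modules assumed for a minimal dictionary schema -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="dictionaries"/>
  <!-- Assumption: the project wants gram/@type limited to three values -->
  <elementSpec ident="gram" module="dictionaries" mode="change">
    <attList>
      <attDef ident="type" mode="change">
        <valList type="closed" mode="replace">
          <valItem ident="pos">
            <desc>part of speech</desc>
          </valItem>
          <valItem ident="subc">
            <desc>subcategorization</desc>
          </valItem>
          <valItem ident="collocPrep">
            <desc>collocating preposition</desc>
          </valItem>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```

A schema generated from such a specification would reject any other value of type on gram, which is one way of enforcing a project-specific set of grammatical descriptors.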
This chapter contains a large number of examples taken from existing print dictionaries; in each case, the original source is identified. In presenting such examples, we have tried to retain the original typographic appearance of the example as well as presenting a suggested encoding for it. Where this has not been possible (for example in the display of pronunciation) we have adopted the transliteration found in the electronic edition of the Oxford Advanced Learner's Dictionary. Also, the middle dot in quoted entries is rendered with a full stop, while within the sample transcriptions hyphenation and syllabification points are indicated by a vertical bar |, regardless of their appearance in the source text.
9.1 Dictionary Body and Overall Structure
Overall, dictionaries have the same structure of front matter, body, and back matter familiar from other texts. In addition, this module defines entry, entryFree, and superEntry as component-level elements which can occur directly within a text division or the text body.
- text contains a single text of any kind, whether unitary or composite, e.g. a poem or play, a collection of essays, a novel, a dictionary, or a corpus sample.
- front (front matter) contains any prefatory material (headers, frontispiece, preface, dedication, etc.) that appears before the start of the text.
- body (text body) contains the whole body of a single unitary text, excluding any front or back matter (prologues, dedications, appendices, etc.).
- back (back matter) contains any appendices, etc. that follow the text.
- div (text division) contains a subdivision of the front matter, body, or back matter of a text.
- entry contains a reasonably well-structured dictionary entry.
- entryFree (unstructured entry) contains a dictionary entry that does not necessarily conform to the constraints imposed by the entry element.
- superEntry groups successive entries for a set of homographs.
- att.entryLike groups the different styles of dictionary entries.
  - type indicates the type of entry, in dictionaries that contain several types.
  - sortKey contains the sequence of characters that gives the entry's alphabetical position in the printed dictionary.
The front and back matter of a dictionary may well contain specialized material such as lists of common and proper nouns, grammatical tables, gazetteers, a ‘guide to the use of the dictionary’, etc. These should be tagged using elements defined elsewhere in these Guidelines, chiefly in the core module (chapter 3 Elements Available in All TEI Documents) together with the specialized dictionary elements defined in this chapter.
<body>
<div>
<head>English-French</head>
<entry>
<form>
<orth>cat</orth>
</form>
<!-- ... -->
</entry>
<entry>
<form>
<orth>dog</orth>
</form>
<!-- ... -->
</entry>
<entry>
<form>
<orth>horse</orth>
</form>
<!-- ... -->
</entry>
</div>
<div>
<head>French-English</head>
<entry>
<form>
<orth>chat</orth>
</form>
<!-- ... -->
</entry>
<entry>
<form>
<orth>chien</orth>
</form>
<!-- ... -->
</entry>
<entry>
<form>
<orth>cheval</orth>
</form>
<!-- ... -->
</entry>
</div>
</body>
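The specialized front and back matter mentioned above might be arranged around the body roughly as follows. This is a minimal sketch: the type values `guide` and `gazetteer` and the headings are illustrative assumptions, not values prescribed by these Guidelines.

```xml
<text>
  <front>
    <!-- Assumed division type for the user's guide -->
    <div type="guide">
      <head>Guide to the use of the dictionary</head>
      <p><!-- explanatory prose --></p>
    </div>
  </front>
  <body>
    <entry>
      <form>
        <orth>cat</orth>
      </form>
      <!-- ... -->
    </entry>
  </body>
  <back>
    <!-- Assumed division type for a gazetteer, tagged with core elements -->
    <div type="gazetteer">
      <head>Gazetteer</head>
      <list>
        <item><!-- place name and description --></item>
      </list>
    </div>
  </back>
</text>
```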
In a print dictionary, the entries are typically typographically distinct entities, each headed by some morphological form of the lexical item described (the headword), and sorted in alphabetical order or (especially for non-alphabetic scripts) in some other conventional sequence. Dictionary entries should be encoded as distinct successive items, each marked as an entry or entryFree element. The type attribute may be used to distinguish different types of entries, for example main entries, related entries, run-on entries, or entries for cross-references, etc.
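A sketch of how the type attribute might separate a main entry from a cross-reference entry is given below. The two entries and the values `main` and `xref` are chosen for illustration only and are not taken from any of the dictionaries cited in this chapter.

```xml
<div>
  <!-- A full entry for the headword -->
  <entry type="main" xml:id="colour">
    <form>
      <orth>colour</orth>
    </form>
    <!-- ... -->
  </entry>
  <!-- An entry that only points the reader to the main entry -->
  <entry type="xref">
    <form>
      <orth>color</orth>
    </form>
    <xr type="see">see <ref target="#colour">colour</ref>
    </xr>
  </entry>
</div>
```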
Some dictionaries provide distinct entries for homographs, on the basis of etymology, part-of-speech, or both, and typically provide a numeric superscript on the headword identifying the homograph number. In these cases each homograph should be encoded as a separate entry; the superEntry element may optionally be used to group such successive homograph entries. In addition to a series of entry elements, the superEntry may contain a preliminary form group (see section 9.3.1 Information on Written and Spoken Forms) when information about hyphenation, pronunciation, etc., is given only once for two or more homograph entries. If the homograph number is to be recorded, the global attribute n may be used for this purpose. In some dictionaries, homographs are treated in distinct parts of the same entry; in these cases, they may be separated by use of the hom element, for which see section 9.2.1 Hierarchical Levels.
A sort key, given in the sortKey attribute, is often required for superentries and entries, especially in cases where the order of entries does not follow the local character-set collating sequence (as, for example, when an entry for ‘3D’ appears at the place where ‘three-D’ would appear).
<body>
<entry>
<form>
<orth>manifestation</orth>
<!-- demonstration -->
</form>
</entry>
<entry>
<form>
<orth>émeute</orth>
<!-- riot -->
</form>
</entry>
<superEntry>
<entry type="hom" n="1">
<form>
<orth>grève</orth>
<!-- strike -->
</form>
</entry>
<entry type="hom" n="2">
<form>
<orth>grève</orth>
<!-- shore -->
</form>
</entry>
</superEntry>
</body>
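The sort key case described above (an entry for ‘3D’ filed where ‘three-D’ would appear) could be recorded along these lines; this is a minimal sketch, and the rest of the entry is elided.

```xml
<!-- The headword is "3D", but the entry is sorted as if spelled "three-D" -->
<entry sortKey="three-D">
  <form>
    <orth>3D</orth>
  </form>
  <!-- ... -->
</entry>
```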
9.2 The Structure of Dictionary Entries
A simple dictionary entry may contain information about the form of the word treated, its grammatical characterization, its definition, synonyms, or translation equivalents, its etymology, cross-references to other entries, usage information, and examples. These we refer to as the constituent parts or constituents of the entry; some dictionary constituents possess no internal structure, while others are most naturally viewed as groups of smaller elements, which may be marked in their own right. In some styles of markup, tags will be applied only to the low-level items, leaving the constituent groups which contain them untagged. We distinguish the class of top-level constituents of dictionary entries, which can occur directly within entries, from the class of phrase-level constituents, which can normally occur only within top-level constituents. The top-level constituents of dictionary entries are described in section 9.2.2 Groups and Constituents, and documented more fully, together with their phrase-level sub-constituents, in section 9.3 Top-level Constituents of Entries.
In addition, however, dictionary entries often have a complex hierarchical structure. For example, an entry may consist of two or more sub-parts, each corresponding to information for a different part-of-speech homograph of the headword. The entry (or part-of-speech homographs, if the entry is split this way) may also consist of senses, each of which may in turn be composed of two or more sub-senses, etc. Each sub-part, homograph entry, sense, or sub-sense we call a level; at any level in an entry, any or all of the constituent parts of dictionary entries may appear. The hierarchical levels of dictionary entries are documented in section 9.2.1 Hierarchical Levels.
9.2.1 Hierarchical Levels
- entry contains a reasonably well-structured dictionary entry.
- entryFree (unstructured entry) contains a dictionary entry that does not necessarily conform to the constraints imposed by the entry element.
- hom (homograph) groups information relating to one homograph within an entry.
- sense groups together all the information relating to one sense of a word in a dictionary entry (definitions, examples, synonyms, etc.).
  - level gives the nesting depth of this sense.
- dictScrap (dictionary scrap) encloses a part of a dictionary entry in which other phrase-level dictionary elements are freely combined.
<entry>
<sense n="1"/>
<sense n="2"/>
</entry>
<entry>
<hom n="1">
<sense n="1">
<!-- ... -->
</sense>
<sense n="2">
<!-- ... -->
</sense>
</hom>
<hom n="2">
<sense n="1">
<sense n="a">
<!-- ... -->
</sense>
<sense n="b">
<!-- ... -->
</sense>
</sense>
<sense n="2">
<!-- ... -->
</sense>
<sense n="3">
<!-- ... -->
</sense>
</hom>
</entry>
<superEntry>
<entry n="1" type="hom">
<sense n="1">
<!-- ... -->
</sense>
<sense n="2">
<!-- ... -->
</sense>
</entry>
<entry n="2" type="hom">
<sense n="1">
<sense n="a">
<!-- ... -->
</sense>
<sense n="b">
<!-- ... -->
</sense>
</sense>
<sense n="2">
<!-- ... -->
</sense>
<sense n="3">
<!-- ... -->
</sense>
</entry>
</superEntry>
9.2.2 Groups and Constituents
- information about the form of the word treated (orthography, pronunciation, hyphenation, etc.)
- grammatical information (part of speech, grammatical sub-categorization, etc.)
- definitions or translations into another language
- etymology
- examples
- usage information
- cross-references to other entries
- notes
- entries (often of reduced form) for related words, typically called related entries
- form (form information group) groups all the information on the written and spoken forms of a word.
- gramGrp (grammatical information group) groups morpho-syntactic information about a lexical item, e.g. pos, gen, number, case, or iType (inflectional class).
- def (definition) contains definition text in a dictionary entry.
- cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source.
- usg (usage) contains usage information in a dictionary entry.
- xr (cross-reference phrase) contains a phrase, sentence, or icon referring the reader to some other location in this or another text.
- etym (etymology) encloses the etymological information in a dictionary entry.
- re (related entry) contains a dictionary entry for a lexical item related to the headword, such as a compound phrase or derived form, embedded inside a larger entry.
- note contains a note or annotation.
com.peti.tor /k@m"petit@(r)/ n person who competes. OALD
<entry>
<form>
<orth>competitor</orth>
<hyph>com|peti|tor</hyph>
<pron>k@m"petit@(r)</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<def>person who competes.</def>
</entry>
disproof (dIs"pru:f) n. 1. facts that disprove something. 2. the act of disproving. CED
<entry>
<form>
<orth>disproof</orth>
<pron>dIs"pru:f</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<sense n="1">
<def>facts that disprove something.</def>
</sense>
<sense n="2">
<def>the act of disproving.</def>
</sense>
</entry>
bray /breI/ n cry of an ass; sound of a trumpet. ∙ vt [VP2A] make a cry or sound of this kind. OALD
<entry>
<form>
<orth>bray</orth>
<pron>breI</pron>
</form>
<hom>
<gramGrp>
<pos>n</pos>
</gramGrp>
<def>cry of an ass; sound of a trumpet.</def>
</hom>
<hom>
<gramGrp>
<pos>vt</pos>
<subc>VP2A</subc>
</gramGrp>
<def>make a cry or sound of this kind.</def>
</hom>
</entry>
ca.reen /k@"ri:n/ vt,vi 1 [VP6A] turn (a ship) on one side for cleaning, repairing, etc. 2 [VP6A, 2A] (cause to) tilt, lean over to one side. OALD
<entry>
<form>
<orth>careen</orth>
<hyph>ca|reen</hyph>
<pron>k@"ri:n</pron>
</form>
<gramGrp>
<pos>vt</pos>
<pos>vi</pos>
</gramGrp>
<sense n="1">
<gramGrp>
<subc>VP6A</subc>
</gramGrp>
<def>turn (a ship) on one side for cleaning, repairing, etc.</def>
</sense>
<sense n="2">
<gramGrp>
<subc>VP6A</subc>
<subc>VP2A</subc>
</gramGrp>
<def>(cause to) tilt, lean over to one side.</def>
</sense>
</entry>
a.ban.don 1 /@"band@n/ v [T1] 1 to leave completely and for ever; desert: The sailors abandoned the burning ship. 2 …
abandon 2 n [U] the state when one's feelings and actions are uncontrolled; freedom from control... LDOCE
<superEntry>
<form>
<orth>abandon</orth>
<hyph>a|ban|don</hyph>
<pron>@"band@n</pron>
</form>
<entry n="1">
<gramGrp>
<pos>v</pos>
<subc>T1</subc>
</gramGrp>
<sense n="1">
<def>to leave completely and for ever … </def>
</sense>
<sense n="2"/>
</entry>
<entry n="2">
<gramGrp>
<pos>n</pos>
<subc>U</subc>
</gramGrp>
<def>the state when one's feelings and actions are uncontrolled; freedom
from control…</def>
</entry>
</superEntry>
9.3 Top-level Constituents of Entries
- the form element, which groups orthographic information and pronunciations, is described in section 9.3.1 Information on Written and Spoken Forms
- the gramGrp element, which groups elements for the grammatical characterization of the headword, is described in section 9.3.2 Grammatical Information
- the def element, which describes the meaning of the headword, is described in section 9.3.3 Sense Information
- the etym element and its special phrase-level elements are documented in section 9.3.4 Etymological Information
- the cit element and its specific applications are described in section 9.3.3 Sense Information and section 9.3.5 Other Information
- the usg, lbl, xr, and note elements are described in section 9.3.5 Other Information
- the re element, which marks nested entries for related words, is described in section 9.3.6 Related Entries
9.3.1 Information on Written and Spoken Forms
Dictionary entries most often begin with information about the form of the word to which the entry applies. Typically, the orthographic form of the word, sometimes marked for syllabification or hyphenation, is the first item in an entry. Other information about the word, including variant or alternate forms, inflected forms, pronunciation, etc., is also often given.
- form (form information group) groups all the information on the written and spoken forms of a word.
  - type classifies the form as simple, compound, etc.
- orth (orthographic form) gives the orthographic form of a dictionary headword.
  - type indicates the type of spelling.
  - extent indicates the extent of the orthographic information provided.
- pron (pronunciation) gives the pronunciation of the word.
  - extent indicates whether the pronunciation is for the whole word or for a part of it.
- hyph (hyphenation) contains a hyphenated form of a dictionary headword, or hyphenation information in some other form.
- syll (syllabification) contains the syllabification of the headword.
- stress contains the stress pattern for a dictionary headword, if given separately.
- lbl (label) in dictionaries, contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonym, etc.
- gram (grammatical information) within an entry in a dictionary or a terminological data file, contains grammatical information relating to a term, word, or form.
  - type classifies the grammatical information given according to a functional typology; in the case of terminological information, preferably the dictionary of data element types specified in ISO WD 12 620.
- gen (gender) identifies the morphological gender of a lexical item, as given in the dictionary.
- number indicates the grammatical number associated with a word, as given in the dictionary.
- case contains grammatical case information.
- per (person) indicates the grammatical person (1st, 2nd, 3rd, etc.) associated with an inflected form in a dictionary.
- tns (tense) indicates the grammatical tense associated with a given inflected form in a dictionary.
- mood contains information about the grammatical mood of verbs (e.g. indicative, subjunctive, imperative).
- iType (inflectional class) indicates the inflectional class of a lexical item.
  - type indicates the type of indicator used to specify the inflectional class, when it is necessary to distinguish between the usual abbreviated indications (e.g. inv) and other kinds of indicators, such as special codes referring to conjugation patterns, etc.
Different dictionaries use different means to mark hyphenation, syllabification, and stress, and they often use some unusual glyphs (e.g., the ‘middle dot’ for hyphenation). All of these glyphs are in the Unicode character set, as discussed in Character References. When transcribing representations of pronunciation the International Phonetic Alphabet should be used. It may be convenient (as has been done in the text of this chapter) to use a simple transliteration scheme for this; such a scheme should however be properly documented in the header.
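One way of documenting a transliteration scheme in the header is a prose statement inside editorialDecl, as sketched below. The two correspondences shown reflect a reading of the transcriptions in this chapter's examples and are an assumption; a real project would list its complete scheme.

```xml
<encodingDesc>
  <editorialDecl>
    <!-- Assumed reading of the symbols used in the sample transcriptions -->
    <p>Pronunciations are recorded in a plain-character transliteration
      rather than in IPA characters. In this scheme <q>@</q> stands for
      the mid-central vowel (schwa), and <q>"</q> precedes a syllable
      carrying primary stress.</p>
  </editorialDecl>
</encodingDesc>
```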
<form>
<orth>doom-laden</orth>
</form>
soucoupe [sukup] … DNT
<form>
<orth>soucoupe</orth>
<pron>sukup</pron>
</form>
For a variety of reasons including ease of processing, it may be desired to split into separate elements information which is collapsed into a single element in the source text; orthography and hyphenation may for example be transcribed as separate elements, although given together in the source text. For a discussion of the issues involved, and of methods for retaining both the presentation form and the interpreted form, see section 9.5 Typographic and Lexical Information in Dictionary Data.
ar.ea … W7
<form>
<orth>area</orth>
<hyph>ar|ea</hyph>
<syll>ar|e|a</syll>
</form>
brag … vb. brags, bragging, bragged … CED
<form>
<orth>brag</orth>
</form>
<gramGrp>
<pos>vb</pos>
</gramGrp>
<form type="inflected">
<orth>brags</orth>
<orth>bragging</orth>
<orth>bragged</orth>
</form>
horrifier [ORifje] (7) vt … [C/R]
<form>
<orth>horrifier</orth>
<pron>ORifje</pron>
<iType type="vbtable">7</iType>
</form>
MTBF abbrev. for mean time between failures. CED
<entry>
<form type="abbrev">
<orth>MTBF</orth>
</form>
<form type="full">
<lbl>abbrev. for</lbl>
<orth>mean time between failures</orth>
</form>
</entry>
biryani or biriani (%bIrI"A:nI) … CED
<form>
<orth>biryani</orth>
<orth>biriani</orth>
<pron>%bIrI"A:nI</pron>
</form>
mackle ("mak^@l) or macule ("makju:l) … CED
<form>
<orth>mackle</orth>
<pron>"makəl</pron>
</form>
<form>
<orth>macule</orth>
<pron>"makju:l</pron>
</form>
hospitaller or U.S. hospitaler ("hQspIt@l@) … CED
<form>
<orth>hospitaller</orth>
<form>
<usg type="geo">U.S.</usg>
<orth>hospitaler</orth>
</form>
<pron>"hQspIt@l@</pron>
</form>
9.3.2 Grammatical Information
- pos (part of speech) indicates the part of speech assigned to a dictionary headword (noun, verb, adjective, etc.).
- subc (subcategorization) contains subcategorization information (transitive/intransitive, countable/non-countable, etc.).
- colloc (collocate) indicates a collocate of the headword.
In addition, gramGrp can contain any of the morphological elements defined in section 9.3.1 Information on Written and Spoken Forms for form. Elements conveying morphological information bear different interpretations within gramGrp and form groups, the difference being that in the form group, the morphological information specified pertains to the specific alternate form in question, while within gramGrp it applies to the headword form. For example, in the entry ‘ pinna ('pIn@) n., pl. -nae (-ni:) or -nas’ CED, the word defined can be either singular or plural; the ‘pl.’ specification applies only to the inflected forms provided. Compare this with ‘pants (paents) pl. n.’, where ‘pl.’ applies to the headword itself.
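The contrast between the two placements can be sketched as follows for the ‘pinna’ and ‘pants’ entries quoted above. The values `inflected` and `part` for the type and extent attributes are illustrative choices made for this sketch.

```xml
<div>
  <entry>
    <form>
      <orth>pinna</orth>
      <pron>'pIn@</pron>
    </form>
    <gramGrp>
      <pos>n</pos>
    </gramGrp>
    <!-- "pl." sits inside form: it describes only these inflected forms -->
    <form type="inflected">
      <number>pl</number>
      <orth extent="part">-nae</orth>
      <pron extent="part">-ni:</pron>
    </form>
    <form type="inflected">
      <number>pl</number>
      <orth extent="part">-nas</orth>
    </form>
  </entry>
  <entry>
    <form>
      <orth>pants</orth>
      <pron>paents</pron>
    </form>
    <!-- "pl." sits inside gramGrp: it describes the headword itself -->
    <gramGrp>
      <number>pl</number>
      <pos>n</pos>
    </gramGrp>
  </entry>
</div>
```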
médire v.t. ind. (de) … PLC
This entry can be tagged using specialized grammatical elements:
<form>
<orth>médire</orth>
</form>
<gramGrp>
<pos>v</pos>
<subc>t ind</subc>
<colloc type="prep">de</colloc>
</gramGrp>
The same entry can also be tagged using the general-purpose gram element:
<form>
<orth>médire</orth>
</form>
<gramGrp>
<gram type="pos">v</gram>
<gram type="subc">t ind</gram>
<gram type="collocPrep">de</gram>
</gramGrp>
isotope adj. et n. m. … DNT
<form>
<orth>isotope</orth>
</form>
<gramGrp>
<pos>adj</pos>
</gramGrp>
<gramGrp>
<pos>n</pos>
<gen>m</gen>
</gramGrp>
wits (wIts) pl. n. 1. (sometimes sing.) the ability to reason and act, esp. quickly … CED
<entry>
<form>
<orth>wits</orth>
<pron>wIts</pron>
</form>
<gramGrp>
<number>pl</number>
<pos>n</pos>
</gramGrp>
<sense n="1">
<gramGrp>
<number>sometimes sing.</number>
</gramGrp>
<def>the ability to reason and act, esp. quickly …</def>
</sense>
</entry>
9.3.3 Sense Information
Dictionaries may describe the meanings of words in a wide variety of different ways — by means of synonyms, paraphrases, translations into other languages, formal definitions in various highly stylized forms, etc. No attempt is made here to distinguish all the different forms which sense information may take; all of them may be tagged using the def element described in section 9.3.3.1 Definitions.
As a special case it is frequently desirable to distinguish the provision of translation equivalents in other languages from other forms of sense information; the use of <cit type="translation"> (which groups a translation equivalent with related information such as its grammatical description) for this purpose is described in section 9.3.3.2 Translation Equivalents.
9.3.3.1 Definitions
Dictionary definitions are those pieces of prose in a dictionary entry that describe the meaning of some lexical item. Most often, definitions describe the headword of the entry; in some cases, they describe translated texts, examples, etc.; see <cit type="translation">, section 9.3.3.2 Translation Equivalents, and <cit type="example">, section 9.3.5.1 Examples. The def element directly contains the text of the definition; unlike form and gramGrp, it does not serve solely to group a set of smaller elements. The close analysis of definition text, such as the tagging of hypernyms, typical objects, etc., is not covered by these Guidelines.
demigod (…) n. 1.a. a being who is part mortal, part god. b. a lesser deity. 2. a godlike person. CP
<entry>
<form>
<orth>demigod</orth>
<pron> … </pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<sense n="1">
<sense n="a">
<def>a being who is part mortal, part god.</def>
</sense>
<sense n="b">
<def>a lesser deity.</def>
</sense>
</sense>
<sense n="2">
<def>a godlike person.</def>
</sense>
</entry>
rémoulade [Remulad] nf remoulade, rémoulade (dressing containing mustard and herbs). CR
<entry>
<form>
<orth>rémoulade</orth>
<pron>Remulad</pron>
</form>
<gramGrp>
<pos>n</pos>
<gen>f</gen>
</gramGrp>
<cit type="translation" xml:lang="en">
<quote>remoulade</quote>
<quote>rémoulade</quote>
<def>dressing containing mustard and herbs</def>
</cit>
</entry>
9.3.3.2 Translation Equivalents
Multilingual dictionaries contain information about translations of a given word in some source language for one or more target languages. Minimally, the dictionary provides the corresponding translation in the target language; other material, such as morphological information (gender, case), various kinds of usage restrictions, etc., may also be given. If translation equivalents are to be distinguished from other kinds of sense information, they may be encoded using <cit type="translation">. The global xml:lang attribute should be used to specify the target language.
dresser … (a) (Theat) habilleur m, -euse f; (Comm: window ~) étalagiste mf. she's a stylish ~ elle s'habille avec chic; V hair. (b) (tool) (for wood) raboteuse f; (for stone) rabotin m. CR
<entry>
<form>
<orth>dresser</orth>
</form>
<sense n="a">
<sense>
<usg type="dom">Theat</usg>
<cit type="translation" xml:lang="fr">
<quote>habilleur</quote>
<gen>m</gen>
</cit>
<cit type="translation" xml:lang="fr">
<quote>-euse</quote>
<gen>f</gen>
</cit>
</sense>
<sense>
<usg type="dom">Comm</usg>
<form type="compound">
<orth>window <oRef/>
</orth>
</form>
<cit type="translation" xml:lang="fr">
<quote>étalagiste</quote>
<gen>mf</gen>
</cit>
</sense>
<cit type="example">
<quote>she's a stylish <oRef/>
</quote>
<cit type="translation" xml:lang="fr">
<quote>elle s'habille avec chic</quote>
</cit>
</cit>
<xr type="see">V. <ref target="#hair">hair</ref>
</xr>
</sense>
<sense n="b">
<usg type="category">tool</usg>
<sense>
<usg type="hint">for wood</usg>
<cit type="translation" xml:lang="fr">
<quote>raboteuse</quote>
<gen>f</gen>
</cit>
</sense>
<sense>
<usg type="hint">for stone</usg>
<cit type="translation" xml:lang="fr">
<quote>rabotin</quote>
<gen>m</gen>
</cit>
</sense>
</sense>
</entry>
<!-- ... -->
<entry xml:id="hair">
<sense>
<!-- ... -->
</sense>
</entry>
O.A.S. ... nf (abrév de Organisation de l'Armée secrète) OAS (illegal military organization supporting French rule of Algeria). CR
<entry>
<cit type="translation" xml:lang="en">
<quote>OAS</quote>
<def>illegal military organization supporting French rule of
Algeria</def>
</cit>
</entry>
havdalah or havdoloh Hebrew. (Hebrew hAvdA"lA; Yiddish hAv"dOl@) n. Judaism. the ceremony marking the end of the sabbath or of a festival, including the blessings over wine, candles and spices. [literally: separation] CED
<entry>
<form>
<orth>havdalah</orth>
<orth>havdoloh</orth>
</form>
<usg type="dom">Judaism</usg>
<def>the ceremony marking the end of the sabbath or of a festival,
including the blessings over wine, candles and spices.</def>
<cit type="translation" xml:lang="en">
<note>literally</note>
<quote>separation</quote>
</cit>
</entry>
9.3.4 Etymological Information
- etym (etymology) encloses the etymological information in a dictionary entry.
- lang (language name) contains the name of a language mentioned in etymological or other linguistic information of any kind.
- date contains a date in any format.
- mentioned marks words or phrases mentioned, not used.
- gloss identifies a phrase or word used to provide a gloss or definition for some other word or phrase.
- pron (pronunciation) gives the pronunciation of the word.
- usg (usage) contains usage information in a dictionary entry.
- lbl (label) in dictionaries, contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonym, etc.
As in other prose, individual word forms mentioned in an etymological description are tagged with mentioned elements. Pronunciations, usage labels, and glosses can be tagged using the pron, usg, and gloss elements defined elsewhere in these Guidelines. In addition, the lang element may be used to identify a particular language name where it appears, in addition to using the xml:lang attribute of the mentioned element.
abismo m. (del gr. a priv. y byssos, fondo). Sima, gran profundidad. …
<entry>
<form>
<orth>abismo</orth>
</form>
<etym>del <lang>gr.</lang>
<mentioned>a</mentioned> priv. y <mentioned>byssos</mentioned>,
<gloss>fondo</gloss>
</etym>
</entry>
neume \'n(y)üm\ n [F, fr. ML pneuma, neuma, fr. Gk pneuma breath — more at pneumatic]: any of various symbols used in the notation of Gregorian chant … [WNC]
<entry>
<etym>
<lang>F</lang> fr. <lang>ML</lang>
<mentioned>pneuma</mentioned>
<mentioned>neuma</mentioned> fr. <lang>Gk</lang>
<mentioned>pneuma</mentioned>
<gloss>breath</gloss>
<xr type="etym">more at <ptr target="#pneumatic"/>
</xr>
</etym>
<def>any of various symbols … </def>
</entry>
<!-- ... -->
<entry xml:id="pneumatic">
<etym>
<!-- ... -->
</etym>
</entry>
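The combination of lang with xml:lang on mentioned, described above, might look like this for the etymology of ‘neume’. The language tags are assumptions made for this sketch: `la` for the Medieval Latin forms and `grc-Latn` for the transliterated Greek form.

```xml
<etym>
  <lang>F</lang>, fr. <lang>ML</lang>
  <!-- Assumed tag: "la" for the Medieval Latin forms -->
  <mentioned xml:lang="la">pneuma</mentioned>,
  <mentioned xml:lang="la">neuma</mentioned>, fr. <lang>Gk</lang>
  <!-- Assumed tag: Ancient Greek written in Latin script -->
  <mentioned xml:lang="grc-Latn">pneuma</mentioned>
  <gloss>breath</gloss>
</etym>
```

Here lang records the language name as printed, while xml:lang gives software a normalized identifier for the language of each cited form.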
9.3.5 Other Information
9.3.5.1 Examples
Dictionaries typically include examples of word use, usually accompanying definitions or translations. In some cases, the examples are quotations from another source, and are occasionally followed by a citation to the author.
- q (quoted speech, thought, or writing) contains material marked as (ostensibly) quoted from elsewhere: in narrative, this element is used to mark direct or indirect speech; in dictionaries, it may be used to mark real or contrived examples of usage; in descriptions of manuscripts or other metadata, to mark extracts quoted from the source being documented.
- quote (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.
- cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source.
Examples frequently abbreviate the headword, and so their transcription will frequently make use of the oRef or oVar elements described below in section 9.4 Headword and Pronunciation References.
multiplex /…/ adj tech having many parts: the multiplex eye of the fly. LDOCE
<cit type="example">
<quote>the multiplex eye of the fly.</quote>
</cit>
some … 4. (S~ and any are used with more): Give me ~ more
/s@'mO:(r)/
OALD
<usg type="colloc">
<oRef type="cap"/> and <mentioned>any</mentioned> are used with
<mentioned>more</mentioned>
</usg>
<cit type="example">
<quote>Give me <oRef/> more</quote>
<pron extent="part">s@'mO:(r)</pron>
</cit>
</sense>
horrifier … vt to horrify. elle était horrifiée par la dépense she was horrified at the expense. CR
<cit type="translation" xml:lang="en">
<quote>to horrify</quote>
</cit>
<cit type="example">
<quote>elle était horrifiée par la dépense</quote>
<cit type="translation" xml:lang="en">
<quote>she was horrified at the expense.</quote>
</cit>
</cit>
</entry>
valeur … n. f. … 2. Vx. Vaillance, bravoure (spécial., au combat). ‘La valeur n'attend pas le nombre des années’ (Corneille). … DNT
<usg type="time">Vx.</usg>
<def>Vaillance, bravoure (spécial., au combat)</def>
<cit type="example">
<quote>La valeur n'attend pas le nombre des années</quote>
<bibl>
<author>Corneille</author>
</bibl>
</cit>
</sense>
9.3.5.2 Usage Information and Other Labels
- usg (usage) contains usage information in a dictionary entry.
- lbl (label) in dictionaries, contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonym, etc.
- temporal use (archaic, obsolete, etc.)
- register (slang, formal, taboo, ironic, facetious, etc.)
- style (literal, figurative, etc.)
- connotative effect (e.g. derogatory, offensive)
- subject field (Astronomy, Philosophy, etc.)
- national or regional use (Australian, U.S., Midland dialect, etc.)
- geo: geographic area
- time: temporal, historical era (‘archaic’, ‘old’, etc.)
- dom: domain
- reg: register
- style: style (figurative, literal, etc.)
- plev: preference level (‘chiefly’, ‘usually’, etc.)
- acc: acceptability
- lang: language for foreign words, spellings, pronunciations, etc.
- gram: grammatical usage
- syn: synonym given to show use
- hyper: hypernym given to show usage
- colloc: collocation given to show usage
- comp: typical complement
- obj: typical object
- subj: typical subject
- verb: typical verb
- hint: unclassifiable piece of information to guide sense choice
colour or U.S. color … CED
<orth>colour</orth>
<form>
<usg type="geo">U.S.</usg>
<orth>color</orth>
</form>
</form>
palette
[palEt]
nf (a) (Peinture: lit, fig) palette. (b) (Boucherie) shoulder. (c) (aube de roue) paddle; (battoir à linge) beetle; (Manutention, Constr) pallet. CR
<usg type="dom">Peinture</usg>
<usg type="style">lit</usg>
<usg type="style">fig</usg>
<cit type="translation" xml:lang="en">
<quote>palette</quote>
</cit>
</sense>
<sense n="b">
<usg type="dom">Boucherie</usg>
<cit type="translation" xml:lang="en">
<quote>shoulder</quote>
</cit>
</sense>
<sense n="c">
<sense>
<usg type="syn">aube de roue</usg>
<cit type="translation" xml:lang="en">
<quote>paddle</quote>
</cit>
</sense>
<sense>
<usg type="syn">battoir à linge</usg>
<cit type="translation" xml:lang="en">
<quote>beetle</quote>
</cit>
</sense>
<sense>
<usg type="dom">Manutention</usg>
<usg type="dom">Constr</usg>
<cit type="translation" xml:lang="en">
<quote>pallet</quote>
</cit>
</sense>
</sense>
rempaillage […] nm reseating, rebottoming (with straw). CR
<cit type="translation" xml:lang="en">
<quote>reseating</quote>
<quote>rebottoming</quote>
<usg type="hint">with straw</usg>
</cit>
</entry>
9.3.5.3 Cross-References to Other Entries
Dictionary entries frequently refer to information in other entries, often using extremely dense notations to convey the headword of the entry to be sought, the particular part of the entry being referred to, and the nature of the information to be sought there (synonyms, antonyms, usage notes, etymology, an illustration, etc.).
- xr (cross-reference phrase) contains a phrase, sentence, or icon referring the reader to some other location in this or another text.
- ref (reference) defines a reference to another location, possibly modified by additional text or comment.
- ptr/ (pointer) defines a pointer to another location.
- lbl (label) in dictionaries, contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonym, etc.
glee … Compare madrigal (sense 1) CED
<form>
<orth>glee</orth>
</form>
<xr>Compare <ptr target="#madrigal.1"/>
</xr>
</entry>
<entry xml:id="madrigal.1">
<form>
<!-- ... -->
</form>
</entry>
hostellerie Syn. de hôtellerie (sens 1). DNT
<lbl>Syn. de</lbl>
<ref>hôtellerie (sens 1)</ref>.
</xr>
rose2 … vb. the past tense of rise. CED
<form>
<orth>rose</orth>
</form>
<xr type="inflectedForm">
<lbl>the past tense of</lbl>
<ref target="#rise">rise</ref>
</xr>
</entry>
<!-- ... -->
<entry xml:id="rise">
<form>
<orth>rise</orth>
</form>
<!-- main entry for "rise" as verb -->
</entry>
antagonist … syn see adverse W7
<lbl>syn see</lbl>
<ref target="#adverse">adverse</ref>
</xr>
<!-- ... -->
<entry xml:id="adverse">
<form>
<orth>adverse</orth>
</form>
<!-- list of synonyms for "adverse" -->
</entry>
globe …V. armillaire (sphère) PR
<lbl type="sense-restriction">sphère</lbl>
</xr>
entacher … Acte entaché de nullité, contenant un vice de forme ou passé par un incapable*. DNT
The asterisk signals a reference to the entry for incapable.
justifier …4. IMPRIM Donner a (une ligne) une longeur convenable au moyen de blancs (2, sens 1, 3). DNT
<usg type="dom">imprim</usg>
<def>Donner a (une ligne) une longeur convenable au moyen de
<ref target="#blanc-2.1.3">blancs (2, sens 1, 3)</ref>
</def>
</sense>
<entry xml:id="blanc" n="2">
<!-- ... -->
<sense n="1">
<!-- ... -->
<def xml:id="blanc-2.1.3">...</def>
<!-- ... -->
</sense>
<!-- ... -->
</entry>
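The cross-references above rely on each target value matching an xml:id somewhere in the dictionary. As a minimal sketch of how that could be checked, the following Python uses only the standard library. The function name unresolved_targets and the sample markup are illustrative assumptions, and the TEI namespace that a real document would declare is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared in XML; ElementTree exposes xml:id under this name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: "rise" exists as an entry, "madrigal.1" does not.
SAMPLE = """
<div>
  <entry>
    <form><orth>rose</orth></form>
    <xr type="inflectedForm">
      <lbl>the past tense of</lbl>
      <ref target="#rise">rise</ref>
    </xr>
  </entry>
  <entry xml:id="rise">
    <form><orth>rise</orth></form>
  </entry>
  <entry>
    <form><orth>glee</orth></form>
    <xr>Compare <ptr target="#madrigal.1"/></xr>
  </entry>
</div>
"""

def unresolved_targets(root):
    """Return local targets (#id) on ref and ptr elements that match no xml:id."""
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for el in root.iter():
        if el.tag in ("ref", "ptr"):
            # target may hold several whitespace-separated URI references
            for uri in (el.get("target") or "").split():
                if uri.startswith("#") and uri[1:] not in ids:
                    missing.append(uri)
    return missing

root = ET.fromstring(SAMPLE)
print(unresolved_targets(root))  # ['#madrigal.1']
```

Only same-document references beginning with # are checked here; pointers into other documents would need separate handling.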
9.3.5.4 Notes within Entries
- note contains a note or annotation.
ain't
(eInt)
Not standard. contraction of am not, is not, are not, have not or has not: I ain't seen it. ….Usage. Although the interrogative form ain't I? would be a natural contraction of am I not?, it is generally avoided in spoken English and never used in formal English. CED
<form type="contraction">
<orth>ain't</orth>
<pron>eInt</pron>
</form>
<usg type="reg">Not standard</usg>
<form type="full">
<lbl>contraction of</lbl>
<orth>am not</orth>
<orth>is not</orth>
<orth>are not</orth>
<orth>have not</orth>
<orth>has not</orth>
</form>
<cit type="example">
<quote>I ain't seen it.</quote>
</cit>
<note type="usage">Although the interrogative form <mentioned>ain't
I?</mentioned> would be a natural contraction of <mentioned>am I
not?</mentioned>, it is generally avoided in spoken English and
never used in formal English.</note>
</entry>
The formal declaration for note is given in section 3.8 Notes, Annotation, and Indexing.
9.3.6 Related Entries
The re element encloses a degenerate entry which appears in the body of another entry for some purpose. Many dictionaries include related entries for direct derivatives or inflected forms of the entry word, or for compound words, phrases, collocations, and idioms containing the entry word.
Related entries can be complex, and may in fact include any of the information to be found in a regular entry. Therefore, the re element is defined to contain the same elements as an entry element, with the exception that it may not contain any nested re elements.
bevvy
("bEvI)
Dialect. ~ n., pl. -vies. 1. a drink, esp. an alcoholic one: we had a few bevvies last night. 2. a night of drinking. ~ vb. - vies, -vying, -vied (intr.) 3. to drink alcohol [probably from Old French bevee, buvee, drinking] —'bevvied adj. CED
<form>
<orth>bevvy</orth>
<pron>"bEvI</pron>
</form>
<usg type="reg">Dialect</usg>
<hom>
<gramGrp>
<pos>n</pos>
</gramGrp>
<sense n="1">
<def>a drink, esp. an alcoholic one: we had a few bevvies last night.</def>
</sense>
</hom>
<!-- ... sense 2 ... -->
<hom>
<gramGrp>
<pos>vb</pos>
</gramGrp>
<sense n="3">
<def>to drink alcohol</def>
</sense>
</hom>
<etym>probably from <lang>Old French</lang>
<mentioned>bevee</mentioned>, <mentioned>buvee</mentioned>
<gloss>drinking</gloss>
</etym>
<re type="derived">
<form>
<orth>bevvied</orth>
</form>
<gramGrp>
<pos>adj</pos>
</gramGrp>
</re>
</entry>
9.4 Headword and Pronunciation References
- oRef/ (orthographic-form reference) in a dictionary, indicates the orthographic form of a headword.
type indicates the kind of typographic modification made to the headword in the reference.
- pRef/ (pronunciation reference) in a dictionary, indicates the pronunciation of a headword.
- oVar (orthographic-variant reference) in a dictionary, indicates a reference to variant orthographic form(s) of the headword.
type indicates the kind of variant involved.
- pVar (pronunciation-variant reference) in a dictionary, indicates a reference to a variant pronunciation of the headword.
- att.pointing defines a set of attributes used by all elements which point to other elements by means of one or more URI references.
target specifies the destination of a reference by supplying one or more URI references.
- ~: indicates a reference to the full form of the headword
- pref~: gives a prefix to be affixed to the headword
- ~suf: gives a suffix to be affixed to the headword
- A~: gives the first letter in uppercase, indicating that the headword is capitalized
- pref~suf: gives a prefix and a suffix to be affixed to the headword
- a.: gives the initial of the word followed by a full stop, to indicate reference to the full form of the headword
- A.: refers to a capitalized form of the headword
The oRef element should be used for iconic or shortened references to the orthographic form(s) of the headword itself. It is an empty element and replaces, rather than encloses, the reference. Note that the reference to a headword is not necessarily a simple string replacement. In the example ‘colour1, (US = color) …~ films; ~ TV; Red, blue and yellow are ~s.’ OALD, the tilde stands for either headword form (colour, color).
colonel … army officer above a lieutenant-~. OALD
</def>
academy … The Royal A~ of Arts OALD
vag- or vago- comb form … : vagus nerve < vagal > < vagotomy > W7
<form>
<orth xml:id="di-o1">vag-</orth>
<orth xml:id="di-o2">vago-</orth>
</form>
<def>vagus nerve</def>
<cit type="example">
<quote>
<oRef target="#di-o1" type="nohyph"/>al</quote>
<quote>
<oRef target="#di-o2" type="nohyph"/>tomy</quote>
</cit>
</entry>
take … < Mr Burton took us for French > NPEG
<quote>Mr Burton <oVar type="pt">took</oVar> us for French</quote>
</cit>
take … < was quite ~n with him > NPEG
<quote>was quite <oVar type="pp">
<oRef/>n</oVar> with him</quote>
</cit>
mix up… < it's easy to mix her up with her sister > NPEG
<quote>it's easy to <oVar next="#ov2" xml:id="ov1">mix</oVar>
her <oVar prev="#ov1" xml:id="ov2">up</oVar> with her sister</quote>
</cit>
hors d'oeuvre
/,aw'duhv
(Fr O:r dœvr)/ n, pl hors d'oeuvres also hors d'oeuvre/'duhv(z)
(Fr ~)/ NPEG
<orth>hors d'oeuvre</orth>
<pron>%aU"dUv</pron>
<form>
<usg type="lang">Fr</usg>
<pron xml:id="di-p2">OR d0vR</pron>
</form>
</form>
<form type="inflected">
<number>pl</number>
<orth>hors d'oeuvres</orth>
<orth>hors d'oeuvre</orth>
<pron extent="part">"dUv(z)</pron>
<form>
<usg type="lang">Fr</usg>
<pron>
<pRef target="#di-p2"/>
</pron>
</form>
</form>
Because headword and pronunciation references can occur virtually anywhere in an entry, the oRef, oVar, pRef, and pVar elements can appear within any other element defined for dictionary entries.
Since existing printed dictionaries use different conventions for headword references (swung dash, first letter abbreviated form, capitalization, or italicization of the word, etc.) the exact method used should be documented in the header.
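As a rough illustration of how an application might expand headword references when displaying an example, the following Python sketch replaces each empty oRef with a supplied headword, capitalizing it when type is cap. The function name expand_orefs is hypothetical, the TEI namespace is omitted, and the sketch assumes a single headword form; as noted above, a reference is not always a simple string replacement (for instance when the tilde stands for either colour or color).

```python
import xml.etree.ElementTree as ET

def expand_orefs(elem, headword):
    """Return the text of elem with every oRef replaced by the headword.

    Assumption: one headword form per entry; type="cap" means the first
    letter of the headword is capitalized in the reference.
    """
    if elem.tag == "oRef":
        if elem.get("type") == "cap":
            return headword[:1].upper() + headword[1:]
        return headword
    parts = [elem.text or ""]
    for child in elem:
        parts.append(expand_orefs(child, headword))  # recurse into oVar etc.
        parts.append(child.tail or "")
    return "".join(parts)

quote = ET.fromstring('<quote>The Royal <oRef type="cap"/> of Arts</quote>')
print(expand_orefs(quote, "academy"))  # The Royal Academy of Arts

quote = ET.fromstring(
    '<quote>was quite <oVar type="pp"><oRef/>n</oVar> with him</quote>'
)
print(expand_orefs(quote, "take"))  # was quite taken with him
```

Other type values, such as nohyph, and references that carry a target attribute are deliberately left out of this sketch.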
9.5 Typographic and Lexical Information in Dictionary Data
- (a) the typographic view — the two-dimensional printed page, including information about line and page breaks and other features of layout
- (b) the editorial view — the one-dimensional sequence of tokens which can be seen as the input to the typesetting process; the wording and punctuation of the text and the sequencing of items are visible in this view, but specifics of the typographic realization are not
- (c) the lexical view — this view includes the underlying information represented in a dictionary, without concern for its exact textual form
For example, a domain indication in a dictionary entry might be broken over a line and therefore hyphenated (‘naut-’ ‘ical’); the typographic view of the dictionary preserves this information. In a purely editorial view, the particular form in which the domain name is given in the particular dictionary (as ‘nautical’, rather than ‘naut.’, ‘Naut.’, etc.) would be preserved, but the fact of the line break would not. Font shifts might plausibly be included in either a strictly typographic or an editorial view. In the lexical view, the only information preserved concerning domain would be some standard symbol or string representing the nautical domain (e.g. ‘naut.’) regardless of the form in which it appears in the printed dictionary.
In practice, publishers begin with the lexical view — i.e., lexical data as it might appear in a database — and generate first the editorial view, which reflects editorial choices for a particular dictionary (such as the use of the abbreviation ‘Naut.’ for ‘nautical’, the fonts in which different types of information are to be rendered, etc.), and then the typographic view, which is tied to a specific printed rendering. Computational linguists and philologists often begin with the typographic view and analyse it to obtain the editorial and/or lexical views. Some users may ultimately be concerned with retaining only the lexical view, or they may wish to preserve the typographic or editorial views as a reference text, perhaps as a guard against the loss or misinterpretation of information in the translation process. Some researchers may wish to retain all three views, and study their interrelations, since research questions may well span all three views.
In general, an electronic encoding of a text will allow the recovery of at least one view of that text (the one which guided the encoding); if editorial and typographic practices are consistently applied in the production of a printed dictionary, or if exceptions to the rules are consistently recorded in the electronic encoding, then it is in principle possible to recover the editorial view from an encoding of the lexical view, and the typographic view from an encoding of the editorial view. In practice, of course, the severe compression of information in dictionaries, the variety of methods by which this compression is achieved, the complexity of formulating completely explicit rules for editorial and typographic practice, and the relative rarity of complete consistency in the application of such rules, all make the mechanical transformation of information from one view into another something of a vexed question.
This section describes some principles which may be useful in capturing one or the other of these views as consistently and completely as possible, and describes some methods of attempting to capture more than one view in a single encoding. Only the editorial and lexical views are explicitly treated here; for methods of recording the physical or typographic details of a text, see chapter 11 Representation of Primary Sources. Other approaches to these problems, such as the use of repetitive encoding and links to show their correspondences, or the use of feature structures to capture the information structure, and of the ana and inst attributes to link feature structures to a transcription of the editorial view of a dictionary, are not discussed here (for feature structures, see chapter 18 Feature Structures; for linkage of textual form and underlying information, see chapter 17 Simple Analytic Mechanisms).
9.5.1 Editorial View
- All characters of the source text should be retained, with the possible exception of rendition text (for which see further below).
- Characters appearing in the source text should typically be given as character data content in the document, rather than as the value of an attribute; again, rendition text may optionally be excepted from this rule.
- Apart from the characters or graphics in the source text, nothing else should appear as content in the document, although it may be given in attribute values.
- The material in the source text should appear in the encoding in the same order. Complications of the character sequence by footnotes, marginal notes, etc., text wrapping around illustrations, etc., may be dealt with by the usual means (for notes, see section 3.8 Notes, Annotation, and Indexing).29
pinna ('pIn@) n., pl. -nae (-ni:) or -nas. 1. any leaflet of a pinnate compound leaf. 2. Zoology. a feather, wing, fin, or similarly shaped part. 3. another name for auricle (sense 2). [C18: via New Latin from Latin: wing, feather, fin] CED
A conservative encoding of the editorial view of this entry, which retains all rendition text, might resemble the following:
<form>
<orth>pinna</orth>
<pron>("pIn@)</pron>
</form>
<gramGrp>
<pos>n.</pos>, </gramGrp>
<form type="inflected">
<number>pl.</number>
<form>
<orth type="lat" extent="part">-nae</orth>
<pron extent="part">(-ni:)</pron>
</form> or <orth type="std" extent="part">-nas</orth>
</form>
<sense n="1">1. <def>any leaflet of a pinnate compound leaf.</def>
</sense>
<sense n="2">2. <usg type="dom">Zoology</usg>
<def>a feather, wing, fin, or similarly shaped part.</def>
</sense>
<sense n="3">3. <xr type="syn">
<lbl>another name for</lbl>
<ref target="#auricle.2">auricle (sense 2).</ref>
</xr>
</sense>
<etym>[<date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>,
<gloss>fin</gloss>]</etym>
</entry>
<entry xml:id="auricle.2">
<form>
<!-- .... -->
</form>
</entry>
A somewhat simplified encoding of the editorial view of this entry might exploit the fact that rendition text is often systematically recoverable. For example, parentheses consistently appear around pronunciation in this dictionary, and thus are effectively implied by the start- and end-tags for pron.31 In such an encoding, removing the tags should exactly reproduce the sequence of characters in the source, minus rendition text. The original character sequence can be recovered fully by replacing tags with any rendition text they imply.
- parentheses appear around pron elements
- commas appear before inflected forms
- the word ‘or’ appears before alternate forms
- brackets appear around the etymology
- full stops appear after pos, inflection information, and sense numbers
- senses are numbered in sequence unless otherwise specified using the global n attribute
<form>
<orth>pinna</orth>
<pron>"pIn@</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<form type="inflected">
<number>pl</number>
<form>
<orth type="lat" extent="part">-nae</orth>
<pron extent="part">-ni:</pron>
</form>
<orth type="std" extent="part">-nas</orth>
</form>
<sense n="1">
<def>any leaflet of a pinnate compound leaf.</def>
</sense>
<sense n="2">
<usg type="dom">Zoology</usg>
<def>a feather, wing, fin, or similarly shaped part.</def>
</sense>
<sense n="3">
<xr type="syn">
<lbl>another name for</lbl>
<ref>auricle (sense 2).</ref>
</xr>
</sense>
<etym>
<date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>, <gloss>fin</gloss>
</etym>
</entry>
When rendition text is omitted, it is recommended that the means to regenerate it be fully documented, using the tagUsage element of the TEI header.
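To make the idea of regenerating rendition text concrete, here is a small Python sketch that re-inserts implied punctuation while flattening an encoded fragment. The rule table RENDITION and the function render are hypothetical; they cover only a few of the conventions listed above (parentheses around pron, brackets around etym, full stops after pos and number), and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name -> (text emitted before, text emitted after).
RENDITION = {
    "pron": ("(", ")"),
    "etym": ("[", "]"),
    "pos": ("", "."),
    "number": ("", "."),
}

def render(elem):
    """Flatten elem to text, adding the rendition text implied by its tags."""
    before, after = RENDITION.get(elem.tag, ("", ""))
    parts = [before, elem.text or ""]
    for child in elem:
        parts.append(render(child))
        parts.append(child.tail or "")
    parts.append(after)
    return "".join(parts)

form = ET.fromstring('<form><orth>pinna</orth> <pron>"pIn@</pron></form>')
print(render(form))  # pinna ("pIn@)
```

A fuller version would also need the remaining rules (commas before inflected forms, ‘or’ before alternates, sense numbering) and a way to honour exceptions recorded on individual elements.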
If rendition text is used systematically in a dictionary, with only a few mistakes or exceptions, the global attribute rend may be used on any tag to flag exceptions to the normal treatment. The values of the rend attribute are not prescribed, but it can be used with values such as no-comma, no-left-paren, etc. Specific values can be documented using the rendition element in the TEI header.
biryani or biriani %bIrI"A:nI) any of a variety of Indian dishes … [from Urdu]
This irregularity can be recorded thus:
<form>
<orth>biryani</orth>
<orth>biriani</orth>
<pron rend="noleftparen">%bIrI"A:nI</pron>
</form>
<def>any of a variety of Indian dishes … </def>
<etym>from <lang>Urdu</lang>
</etym>
</entry>
9.5.2 Lexical View
If the text to be interchanged retains only the lexical view of the text, there may be no concern for the recoverability of the editorial (not to speak of the typographic) view of the text. However, it is strongly recommended that the TEI header be used to document fully the nature of all alterations to the original data, such as normalization of domain names, expansion of inflected forms, etc.
- reorganizing the order of elements in an entry to show their relationship, as in
clem (klEm) or clam vb. clems, clemming, clemmed or clams, clamming, clammed CED
where in a strictly lexical view one might wish to group ‘clem’ and ‘clam’ with their respective inflected forms.
- splitting an entry into two separate entries, as in
celi.bacy /"selIb@sI/ n [U] state of living unmarried, esp as a religious obligation. celi.bate /"selIb@t/ n [C] unmarried person (esp a priest who has taken a vow not to marry). OALD
For some purposes, this entry might usefully be split into an entry for ‘celibacy’ and a separate entry for ‘celibate’.
- abbreviated forms have been silently expanded
- some forms have been moved to allow related forms to be grouped together
- the part of speech information has been moved to allow all forms to be given together
- the cross-reference to ‘auricle’ has been simplified
<form>
<orth>pinna</orth>
<pron>"pIn@</pron>
<form type="inflected">
<number>pl</number>
<form>
<orth type="lat">pinnae</orth>
<pron>'pIni:</pron>
</form>
<orth type="std">pinnas</orth>
</form>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<sense n="1">
<def>any leaflet of a pinnate compound leaf.</def>
</sense>
<sense n="2">
<usg type="dom">Zoology</usg>
<def>a feather, wing, fin, or similarly shaped part.</def>
</sense>
<sense n="3">
<xr type="syn">
<ptr target="#auricle.2"/>
</xr>
</sense>
<etym>
<date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>, <gloss>fin</gloss>
</etym>
</entry>
9.5.3 Retaining Both Views
It is sometimes desirable to retain both the lexical and the editorial view, in which case a potential conflict exists between the two. When there is a conflict between the encodings for the lexical and editorial views, the principles described in the following sections may be applied.
9.5.3.1 Using Attribute Values to Capture Alternate Views
If the order of the data is the same in both views, then both views may be captured by encoding one ‘dominant’ view in the character data content of the document, and encoding the other using attribute values on the appropriate elements. If all tags were to be removed, the remaining characters would be those of the dominant view of the text.
The attribute class att.lexicographic is used to provide attributes for use in encoding multiple views of the same dictionary entry. These attributes are available for use on all elements defined in this chapter when the base module for dictionaries is selected.
- att.lexicographic defines the set of global attributes available on elements in the base tag set for dictionaries.
norm (normalized) gives a normalized form of information given in the source text in a non-normalized form.
split gives the list of split values for a merged form.
orig (original) gives the original string, or the empty string when the element does not appear in the source text.
mergedIn gives a reference to another element, where the original appears as a merged form.
opt (optional) indicates whether the element is optional or not.
<orth>delay</orth>
<form type="inflected">
<orth norm="delayed" extent="part">-ed</orth>
<tns norm="pst,pstp"/>
</form>
<form type="inflected">
<orth norm="delaying" extent="part">-ing</orth>
<tns norm="prsp"/>
</form>
</form>
<orth>delay</orth>
<form type="inflected">
<orth orig="-ed">delayed</orth>
<tns orig="">pst</tns>
<tns orig="">pstp</tns>
</form>
<form type="inflected">
<orth orig="-ing">delaying</orth>
<tns orig="">prsp</tns>
</form>
</form>
thyr(é)ostimuline [tiR(e)ostimylin] …
With the editorial view dominant, this entry might begin thus:
<orth split="thyrostimuline, thyréostimuline">thyr(é)ostimuline</orth>
<pron split="tiRostimylin, tiReostimylin">tiR(e)ostimylin</pron>
</form>
<orth xml:id="dic-o1" orig="thyr(é)ostimuline">thyrostimuline</orth>
<pron xml:id="dic-p1" orig="tiR(e)ostimylin">tiRostimylin</pron>
</form>
<form>
<orth mergedIn="#dic-o1">thyréostimuline</orth>
<pron mergedIn="#dic-p1">tiReostimylin</pron>
</form>
<orth next="#dict-o2" xml:id="dict-o1">thyr</orth>
<orth
next="#dict-o3"
prev="#dict-o1"
xml:id="dict-o2"
opt="true">é</orth>
<orth prev="#dict-o2" xml:id="dict-o3">ostimuline</orth>
<pron next="#dict-p2" xml:id="dict-p1">tiR</pron>
<pron
next="#dict-p3"
prev="#dict-p1"
xml:id="dict-p2"
opt="true">e</pron>
<pron prev="#dict-p2" xml:id="dict-p3">ostimylin</pron>
</form>
Note that this transcription preserves both the lexical and editorial views in a single encoding. However, it has the disadvantage that the strings corresponding to entire words do not appear in the encoding uninterrupted, and therefore complex processing is required to retrieve them from the encoded text. The use of the opt attribute is recommended, however, when long spans of text are involved, or when the optional part contains embedded tags.
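The processing needed to reassemble whole words from such fragments can be sketched as follows. This Python example follows the next chain from the fragment that has no prev attribute and returns the form without and with the optional parts. The function name variants is hypothetical, the TEI namespace is omitted, and the sketch assumes a single chain per element type in which all optional fragments are included or excluded together.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

FORM = """
<form>
  <orth next="#dict-o2" xml:id="dict-o1">thyr</orth>
  <orth next="#dict-o3" prev="#dict-o1" xml:id="dict-o2" opt="true">é</orth>
  <orth prev="#dict-o2" xml:id="dict-o3">ostimuline</orth>
</form>
"""

def variants(form, tag):
    """Return (form without optional fragments, form with them) for one chain."""
    by_id = {el.get(XML_ID): el for el in form.iter(tag)}
    # The chain starts at the fragment that has no prev pointer.
    current = next(el for el in by_id.values() if el.get("prev") is None)
    short, full = [], []
    while current is not None:
        text = current.text or ""
        full.append(text)
        if current.get("opt") != "true":
            short.append(text)
        nxt = current.get("next")
        current = by_id[nxt[1:]] if nxt else None  # strip the leading '#'
    return "".join(short), "".join(full)

print(variants(ET.fromstring(FORM), "orth"))
# ('thyrostimuline', 'thyréostimuline')
```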
pas.tel /"pastl US: pa"stel/ n 1 (picture drawn with) coloured chalk made into crayons. 2… OALD
<def>coloured
chalk made into crayons</def>
<def>picture drawn with coloured chalk
made into crayons</def>
</sense>
<def next="#d2" xml:id="d1" opt="true">picture drawn
with</def>
<def prev="#d1" xml:id="d2">coloured chalk made into
crayons</def>
</sense>
9.5.3.2 Recording Original Locations of Transposed Elements
The attributes described in the previous section are useful only when the order of material is the same in both the editorial and the lexical view. When the two views impose different orders on the data, the standard linking mechanisms may be used to show the original location of material transposed in an encoding of the lexical view.
- att.lexicographic defines the set of global attributes available on elements in the base tag set for dictionaries.
opt (optional) indicates whether the element is optional or not.
pinna
("pIn@)
n., pl. -nae (-ni:) or -nas. CED
<orth>pinna</orth>
<pron>'pIn@</pron>
<anchor xml:id="p01"/>
<form type="inflected">
<number>pl</number>
<form>
<orth extent="part">-nae</orth>
<pron extent="part">-ni:</pron>
</form>
<orth extent="part">-nas</orth>
</form>
</form>
<gramGrp>
<pos location="#p01">n</pos>
</gramGrp>
9.6 Unstructured Entries
The content model for the entry element provides an entry structure suitable for many average dictionaries, as well as many regular entries in more exotic dictionaries. However, the structure of some dictionaries does not allow the restrictions imposed by the content model for entry. To handle these cases, the entryFree and dictScrap elements are provided to support much wider variation in entry structure. The dictScrap element offers less freedom, in that it can only contain phrase level elements, but it can itself appear at any point within a dictionary entry where any of the structural components of a dictionary entry are permitted. As such, it acts as a container for otherwise anomalous parts of an entry.
The entryFree element places no constraints at all upon the entry: any element defined in this chapter, as well as all the normal phrase-level and inter-level elements, can appear anywhere within it. With the entryFree element, the encoder is free to use any element anywhere, as well as to use or omit grouping elements such as form, gramGrp, etc.
<ent h="demigod"> <hwd>demi|god</hwd> <pr> <ph>"demIgQd</ph> </pr> <hps
ps="n"> <hsn> <def>one who is partly divine and partly human</def>
<def>(in Gk myth, etc) the son of a god and a mortal woman,
eg<cf>Hercules</cf> <pr> <ph>"h3:kjUli:z</ph> </pr> </def> </hsn>
</hps> </ent>
<form>
<orth>demigod</orth>
<hyph>demi|god</hyph>
<pron>"demIgQd</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<def>one who is partly divine and partly human</def>
<def>(in Gk myth, etc) the son of a god and a mortal woman, eg
<mentioned>Hercules</mentioned>
</def>
<pron>"h3:kjUli:z</pron>
</entryFree>
biryani or biriani
(%bIrI"A:nI)
any of a variety of Indian dishes…[from Urdu] CED
<orth>biryani</orth> or <orth>biriani</orth>
<pron>(%bIrI"A:nI)</pron>
<def>any of a variety of Indian dishes …</def>
<etym>[from <lang>Urdu</lang>]</etym>
</entryFree>
<dictScrap>
<orth>biryani</orth> or <orth>biriani</orth>
<pron>(%bIrI"A:nI)</pron>
<def>any of a variety of Indian dishes …</def>
<etym>[from <lang>Urdu</lang>]</etym>
</dictScrap>
</entry>
9.7 The Dictionary Module
- Module dictionaries: Dictionaries
- Elements defined: case colloc def dictScrap entry entryFree etym form gen gram gramGrp hom hyph iType lang lbl mood number oRef oVar orth pRef pVar per pos pron re sense stress subc superEntry syll tns usg xr
- Classes defined: att.entryLike att.lexicographic model.entryLike model.formPart model.gramPart model.morphLike model.ptrLike.form