att.linguistic

att.linguistic provides a set of attributes concerning linguistic features of tokens, for use within token-level elements, specifically w and pc in the analysis module. [17.4.2 Lightweight Linguistic Annotation]
Module analysis — Simple Analytic Mechanisms
Members pc w
Attributes
lemma indicates the headword form of the word as it appears in a dictionary.
Status Optional
Datatype teidata.text
<w type="verb" lemma="hit">hitt<m type="suffix">ing</m>
</w>
lemmaRef provides a pointer to a definition of the lemma for the word, for example in an online lexicon.
Status Optional
Datatype teidata.pointer
<w type="verb" lemma="hit"
 lemmaRef="http://www.example.com/lexicon/hitvb.xml">
hitt<m type="suffix">ing</m>
</w>
pos (part of speech) indicates the part of speech assigned to a token (i.e. information on whether it is a noun, adjective, or verb), usually according to some official reference vocabulary (e.g. for German: STTS, for English: CLAWS, for Polish: NKJP, etc.).
Status Optional
Datatype teidata.text

The German sentence ‘Wir fahren in den Urlaub.’ tagged with the Stuttgart-Tuebingen-Tagset (STTS).

<s>
 <w pos="PPER">Wir</w>
 <w pos="VVFIN">fahren</w>
 <w pos="APPR">in</w>
 <w pos="ART">den</w>
 <w pos="NN">Urlaub</w>
 <w pos="$.">.</w>
</s>

The English sentence ‘We're going to Brazil.’ tagged with the CLAWS-5 tagset, arranged inline (with significant whitespace).

<p><w pos="PNP">We</w><w pos="VBB">'re</w> <w pos="VVG">going</w> <w pos="PRP">to</w> <w pos="NP0">Brazil</w><pc pos="PUN">.</pc></p>         

The English sentence ‘We're going on vacation to Brazil for a month!’ tagged with the CLAWS-7 tagset and arranged sequentially.

<p>
 <w pos="PPIS2">We</w>
 <w pos="VBR">'re</w>
 <w pos="VVG">going</w>
 <w pos="II">on</w>
 <w pos="NN1">vacation</w>
 <w pos="II">to</w>
 <w pos="NP1">Brazil</w>
 <w pos="IF">for</w>
 <w pos="AT1">a</w>
 <w pos="NNT1">month</w>
 <pc pos="!">!</pc>
</p>
msd (morphosyntactic description) supplies morphosyntactic information for a token, usually according to some official reference vocabulary (e.g. for German: STTS-large tagset; for a feature description system designed as (pragmatically) universal, see Universal Features).
Status Optional
Datatype teidata.text
<ab>
 <w pos="PPER" msd="1.Pl.*.Nom">Wir</w>
 <w pos="VVFIN" msd="1.Pl.Pres.Ind">fahren</w>
 <w pos="APPR" msd="--">in</w>
 <w pos="ART" msd="Def.Masc.Akk.Sg">den</w>
 <w pos="NN" msd="Masc.Akk.Sg">Urlaub</w>
 <pc pos="$." msd="--">.</pc>
</ab>
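
The following is a minimal sketch, not part of the TEI specification, of how a consumer might read lemma, pos, and msd back from w and pc elements. It uses Python's standard xml.etree.ElementTree on a well-formed copy of the German sentence. The function name token_annotations is hypothetical, and the sketch assumes flat (non-nested) tokens; element names are compared by local name so that input in the TEI namespace would also match.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the German example sentence (no namespace declared here).
SAMPLE = """<ab>
 <w pos="PPER" msd="1.Pl.*.Nom">Wir</w>
 <w pos="VVFIN" msd="1.Pl.Pres.Ind">fahren</w>
 <w pos="APPR" msd="--">in</w>
 <w pos="ART" msd="Def.Masc.Akk.Sg">den</w>
 <w pos="NN" msd="Masc.Akk.Sg">Urlaub</w>
 <pc pos="$." msd="--">.</pc>
</ab>"""


def token_annotations(xml_text):
    """Return one dict per w/pc element with its form and att.linguistic values.

    Hypothetical helper: attributes that are absent come back as None.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for el in root.iter():
        # Strip a "{namespace}" prefix, if any, to get the local element name.
        name = el.tag.rsplit("}", 1)[-1]
        if name in ("w", "pc"):
            rows.append({
                # itertext() also collects text inside child elements such as <m>.
                "form": "".join(el.itertext()),
                "lemma": el.get("lemma"),
                "pos": el.get("pos"),
                "msd": el.get("msd"),
            })
    return rows


for row in token_annotations(SAMPLE):
    print(row["form"], row["pos"], row["msd"])
```

Because every attribute in this class is optional, the sketch reports missing values as None rather than raising an error.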
join when present, provides information on whether the token in question is adjacent to another, and if so, on which side.
Status Optional
Datatype teidata.text
Legal values are:
no
the token is not adjacent to another
left
there is no whitespace on the left side of the token
right
there is no whitespace on the right side of the token
both
there is no whitespace on either side of the token
overlap
the token overlaps with another; other devices (specifying the extent and the area of overlap) are needed to more precisely locate this token in the character stream

The example below assumes that the lack of whitespace is marked redundantly, by using the appropriate values of join.

<s>
 <pc join="right">"</pc>
 <w join="left">Friends</w>
 <w>will</w>
 <w>be</w>
 <w join="right">friends</w>
 <pc join="both">.</pc>
 <pc join="left">"</pc>
</s>

Note that a project may make a decision to only indicate lack of whitespace in one direction, or do that non-redundantly. The existing proposal is the broadest possible, on the assumption that we adopt the "streamable view", where all the information on the current element needs to be represented locally.
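
As an illustration only, the sketch below shows one way a consumer might rebuild the character stream from join values. It rests on assumptions that the text above does not state: a token without join (or with join="no" or join="overlap") is treated as separated by a single space, and two neighbouring tokens are glued when either side declares the lack of whitespace. This makes the same rule work for redundant and one-directional marking. The function name detokenize is hypothetical, and tokens are assumed to be flat (no w nested inside w).

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the redundantly marked example sentence.
SAMPLE = """<s>
 <pc join="right">"</pc>
 <w join="left">Friends</w>
 <w>will</w>
 <w>be</w>
 <w join="right">friends</w>
 <pc join="both">.</pc>
 <pc join="left">"</pc>
</s>"""


def detokenize(xml_text):
    """Rebuild running text from w/pc tokens using their join attributes."""
    root = ET.fromstring(xml_text)
    # Compare local names so input in the TEI namespace would also match.
    tokens = [el for el in root.iter()
              if el.tag.rsplit("}", 1)[-1] in ("w", "pc")]
    pieces = []
    prev_join = None
    for i, tok in enumerate(tokens):
        join = tok.get("join")
        # Glue when the previous token has no whitespace on its right,
        # or the current token has no whitespace on its left.
        glued = prev_join in ("right", "both") or join in ("left", "both")
        if i > 0 and not glued:
            pieces.append(" ")  # assumption: unmarked boundaries are one space
        pieces.append("".join(tok.itertext()))
        prev_join = join
    return "".join(pieces)


print(detokenize(SAMPLE))
```

A project that marks only one direction could drop the other half of the glued test; keeping both halves simply tolerates either convention.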

The English sentence ‘We're going on vacation.’ tagged with the CLAWS-5 tagset and arranged sequentially, on the assumption that only the lack of preceding whitespace is indicated.

<p>
 <w pos="PNP">We</w>
 <w pos="VBB" join="left">'re</w>
 <w pos="VVG">going</w>
 <w pos="PRP">on</w>
 <w pos="NN1">vacation</w>
 <pc pos="PUN" join="left">.</pc>
</p>