Notes on the Encoding of Linguistic Analysis
D. Terence Langendoen
Department of Linguistics
University of Arizona
Tucson, AZ 85721 USA
E-mail: langendt@arizvm1 (bitnet)
Phone: (602) 621-6898
18 January 1990
Very preliminary version
1
SAMPLE MARKUP POSSIBILITIES FOR THE ENGLISH WORD
'UNPACKED'
Assuming a fully specified lexicon and (word-formation) grammar, here
are schematic markups for the three interpretations of the English word
'unpacked'. I assume that none of these interpretations is entered in
the lexicon, but that there are entries for the following: 'pack',
'unpack', 'un' (two different ones), and 'ed' (two different ones).
1.
unpacked
The category tag (here ) also contains attributes identifying
its argument structure and selectional restrictions.
Rule un1r is the rule for forming 'negative adjectives' from adjectives.
The rule has two parts: the prefix identified in the lexicon as un1l, and
something which is itself the result of an analysis.
The lexical item un1l is prefixed to an adjective, which in this case is
composed by the rule ed2r, which suffixes the form lexically identified
as ed2l to the lexical entry identified as pack3l. That entry may in fact
be a subentry under the lemma 'pack', to be identified by a mechanism
such as the one Gary Simons suggests in AIW12.
No attributes are provided here, though they could be.
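As a rough illustration only, the nesting described for analysis 1 can be
sketched as a small Python data structure. The class names LexicalItem and
RuleApplication and the function bracketing are hypothetical conveniences,
not part of any proposed markup; only the identifiers un1r, un1l, ed2r,
ed2l and pack3l come from the discussion above.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class LexicalItem:
    """A reference to a lexicon entry, e.g. un1l or pack3l (hypothetical class)."""
    entry_id: str


@dataclass(frozen=True)
class RuleApplication:
    """A word-formation rule applied to an ordered sequence of parts (hypothetical class)."""
    rule_id: str
    parts: Tuple[Union["LexicalItem", "RuleApplication"], ...]


# Analysis 1: un1r combines the prefix un1l with the adjective that
# ed2r forms from the entry pack3l and the suffix ed2l.
analysis_1 = RuleApplication(
    "un1r",
    (
        LexicalItem("un1l"),
        RuleApplication("ed2r", (LexicalItem("pack3l"), LexicalItem("ed2l"))),
    ),
)


def bracketing(node) -> str:
    """Render an analysis as a labelled bracketing string."""
    if isinstance(node, LexicalItem):
        return node.entry_id
    inner = " ".join(bracketing(part) for part in node.parts)
    return f"[{node.rule_id} {inner}]"


print(bracketing(analysis_1))  # [un1r un1l [ed2r pack3l ed2l]]
```

The bracketing makes the order of combination explicit: ed2r applies first,
and its result is one of the two parts of un1r.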
2.
Rule ed3r forms passive past participles from verb stems. I assume for
purposes of illustration that the rule is distinct both from the rule
that forms active past participles and from the rule that forms
adjectives from verbs with the same morphology.
I assume for this illustration that the verb 'unpack' is listed directly
in the lexicon and does not have to be formed by a morphological rule
from the prefix un2l and the verb stem pack3l.
3.
unpacked
I use parentheses to enclose list items, pending clarification as to the
correct SGML syntax to use in this situation.
The three preceding analyses could be provided together, with or without
a ranking. Presumably, ids should be provided for each of the
constituent analysis tags. If the ranking is omitted, then the
alternatives are assumed to be equally ranked.
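The default just stated can be sketched as follows. This is a minimal
sketch under stated assumptions: the function name effective_ranks and the
ids a1, a2 and a3 are hypothetical, and a ranking is assumed to be given
as an ordered list of analysis ids, best first.

```python
from typing import Dict, List, Optional


def effective_ranks(analysis_ids: List[str],
                    ranking: Optional[List[str]] = None) -> Dict[str, int]:
    """Map each analysis id to a rank; with no ranking, all are ranked equally."""
    if ranking is None:
        # Ranking omitted: every alternative gets the same rank.
        return {aid: 1 for aid in analysis_ids}
    if sorted(ranking) != sorted(analysis_ids):
        raise ValueError("ranking must mention each analysis id exactly once")
    # Ranking supplied: position in the list determines the rank.
    return {aid: position for position, aid in enumerate(ranking, start=1)}


ids = ["a1", "a2", "a3"]  # hypothetical ids for the three analyses
print(effective_ranks(ids))
print(effective_ranks(ids, ranking=["a2", "a1", "a3"]))
```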
4.
unpacked
is an embedded analysis-tag.
We now give a variant of "1." in which we flatten the analysis; that
is, we provide an analysis in which we identify the rules and
morphological elements that combine, but do not indicate the order of
combination.
5.
unpacked
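One way to picture the flattening is the following Python sketch. The
nested-tuple encoding and the function name flatten are assumptions made
for illustration; only the rule and lexical identifiers are taken from
analysis 1.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a lexical item is a string; a rule application is
# a tuple whose first member is the rule id and whose rest are its parts.
Node = Union[str, tuple]


def flatten(node: Node) -> Tuple[List[str], List[str]]:
    """Return (rule ids, lexical ids) found in a nested analysis."""
    if isinstance(node, str):
        return [], [node]
    rule_id, *parts = node
    rules, items = [rule_id], []
    for part in parts:
        sub_rules, sub_items = flatten(part)
        rules.extend(sub_rules)
        items.extend(sub_items)
    return rules, items


nested = ("un1r", "un1l", ("ed2r", "pack3l", "ed2l"))
rules, items = flatten(nested)
print(rules)  # ['un1r', 'ed2r']
print(items)  # ['un1l', 'pack3l', 'ed2l']
```

The flat result still lists every rule and every morphological element,
but it no longer records which rule applied to the output of which other
rule.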
If we now eliminate reference to rules, we have a representation which
just indicates the morphological parts. These parts could in fact be
identified directly as character strings, as in:
6.
unpacked
un
pack
ed
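As a small check on this last representation, the following Python sketch
replaces each lexical identifier with its character string and confirms
that the strings concatenate to the word being analysed. The dictionary
name surface_forms is hypothetical; the pairing of identifiers with
strings follows the discussion of analysis 1.

```python
# Hypothetical lookup from lexical identifiers to character strings.
surface_forms = {"un1l": "un", "pack3l": "pack", "ed2l": "ed"}

# The flattened analysis, with the morphological elements in surface order.
lexical_ids = ["un1l", "pack3l", "ed2l"]

parts = [surface_forms[lexical_id] for lexical_id in lexical_ids]
print(parts)           # ['un', 'pack', 'ed']
print("".join(parts))  # unpacked
```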