David J. Birnbaum (djbpitt+@pitt.edu)
2004-01-24
This document describes limitations in or problems with the TEI manuscript-description model and proposes solutions. "TEI-MS" here refers to the synthesis of MASTER and the proposal of the original TEI manuscript-description Work Group, as developed by the Manuscript Task Force in Reykjavik in September 2003. See also the original TEI-MMSS Work Group report and the MASTER documentation (both available online at the TEI web site, although the MASTER documentation is out of sync with the MASTER DTD), as well as two documents prepared by Andrej Bojadžiev, one outlining the modified TEI system used in the Sofia Repertorium project and one contrasting the Repertorium DTD with MASTER.
Problem: Paragraph data is difficult to process automatically. Paragraph typing (using the new "topic" attribute) is helpful, but structured alternatives are still required by some users.
Proposed Solution: Retain (p+) models for subelements, but provide structured alternatives that do not require <p> wrappers.
<msDescription> can appear
Problem: TEI-MS, following MASTER, treats <msDescription> as a type of <div>. Andrej notes that this has the undesirable side-effect of permitting <msDescription> inside elements where it probably does not belong: <argument>, <q>, <sic>, <corr>, <add>, <item>, <note>, and <epigraph>. The Repertorium defines <msDescription> (which it calls <msDesc>) as a type of %component;, which places it in the <body> (only), but does not permit it to appear inside these other elements. The Repertorium DTD modifications do not follow the TEI parameter-entity guidelines, but they do appear to restrict the location of the <msDescription> element in a useful way.
Proposed Solution: TEI-MS should permit <msDescription> inside <sourceDesc> or <body>, and should do so using the TEI guidelines for extension, but should not permit it inside the additional elements listed above.
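A DTD would enforce this restriction through its content models. Purely to illustrate the intended constraint (not as a replacement for validation), the following Python sketch uses the standard-library xml.etree.ElementTree module to report every <msDescription> whose parent is neither <sourceDesc> nor <body>. The sample is a bare, hypothetical skeleton rather than a conformant TEI document, and the function name is invented for this example.

```python
import xml.etree.ElementTree as ET

# Parents permitted for <msDescription> under the proposal above.
ALLOWED_PARENTS = {"sourceDesc", "body"}


def misplaced_ms_descriptions(xml_text):
    """Return the parent tag of every <msDescription> that sits in a disallowed place."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [
        parent_of[ms].tag
        for ms in root.iter("msDescription")
        if ms in parent_of and parent_of[ms].tag not in ALLOWED_PARENTS
    ]


# Hypothetical skeleton: two permitted occurrences and one inside <q>.
SAMPLE = """<TEI.2>
  <teiHeader><fileDesc><sourceDesc><msDescription/></sourceDesc></fileDesc></teiHeader>
  <text><body>
    <msDescription/>
    <p><q><msDescription/></q></p>
  </body></text>
</TEI.2>"""

print(misplaced_ms_descriptions(SAMPLE))  # ['q']
```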
"form"
attribute vs element
Problem: Repertorium defines a "form" attribute on <msDesc> (= TEI-MS <msDescription>) with possible values (codex | roll | leaf | fragleaf | fragCodex | cutting). This list is probably a reasonably complete inventory of the types of paper and parchment documents one encounters, although it is not applicable to birchbark, wax, wood, stone, etc. The Repertorium attribute is misplaced, however, because the shape of the book is logically part of the physical description and should be located there.
MASTER proposes a <form> element inside <physDesc> with paragraph content. The Repertorium approach does not permit such MASTER examples as "Three vellum pieces sewn together to make a roll" or "Seven volumes," but one cannot search meaningfully for values like that anyway, and if that exact wording is desirable for rendering purposes in a prose catalog, it might be included in the <head> or a leading <msLooseDesc>.
Proposed Solution: Change the MASTER <form> element to a CDATA attribute on <support> (or some other part of <physDesc>), using values from the Repertorium proposal (except that lower-case "fragleaf" should be replaced by camel-case "fragLeaf"). This attribute value is not meaningful for materials other than parchment and paper, but using fixed values where it is applicable will make it possible to find all types of fragments with simple searches, which is not possible with the MASTER <form> element. The Repertorium "unity" attribute on <msDesc> is not needed because the unity of the manuscript can be determined automatically from the subelements (that is, from whether it has one or more <msPart> elements).
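To show why fixed values matter, here is a minimal Python sketch (standard library only) that runs over a hypothetical catalog in which the proposed "form" attribute sits on <support>. The <catalog> wrapper, the "id" values, and the function names are assumptions made for the example. It finds every fragment with a single prefix test and derives unity from the presence of <msPart> children, as suggested above.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog using the proposed "form" attribute on <support>.
CATALOG = ET.fromstring("""<catalog>
  <msDescription id="A"><physDesc><support form="codex"/></physDesc></msDescription>
  <msDescription id="B"><physDesc><support form="fragLeaf"/></physDesc></msDescription>
  <msDescription id="C"><physDesc><support form="fragCodex"/></physDesc>
    <msPart/><msPart/>
  </msDescription>
</catalog>""")


def fragment_ids(root):
    """Ids of descriptions whose support has any fragment form (fragLeaf, fragCodex)."""
    ids = []
    for ms in root.iter("msDescription"):
        support = ms.find("physDesc/support")
        # Fixed values with a shared prefix make "all fragments" a one-line test.
        if support is not None and support.get("form", "").startswith("frag"):
            ids.append(ms.get("id"))
    return ids


def is_composite(ms):
    """Derive unity from structure: any <msPart> child marks a composite manuscript."""
    # Compare with None: an empty Element is falsy, so truthiness would mislead.
    return ms.find("msPart") is not None


print(fragment_ids(CATALOG))  # ['B', 'C']
print([ms.get("id") for ms in CATALOG.iter("msDescription") if is_composite(ms)])  # ['C']
```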
Problem: TEI-MS provides no structured mechanism for describing parchment (e.g., Gregory Rule), ruling and pricking (e.g., Gilissen, Leroy), format (e.g., a list of fixed attribute values for quarto, etc.), and watermarks. All of these should have structured alternatives, ideally with fixed attribute values (rather than PCDATA element content or CDATA attribute values), not so much for searching or rendering (although they might be useful there as well) as to enable users to search for correspondences (e.g., are manuscripts with certain ruling patterns correlated with particular scribes or scriptoria?).
Proposed Solution: Introduce attributes with a set of possible values to describe the features mentioned above. Specifically:
Use TEI-MS <collation><formula> ... </formula></collation> to describe the arrangement of folios into quires according to standard practice. Give <formula> an attribute "type" (or something similar) that allows one to specify values like "standard" (for the traditional "71 I-LXX8 + LXXI5 (-6-7-8)") or "gregoryRule" (see Andrej's Repertorium documentation for examples).
TEI-MS <layout> has attributes "columns", "ruledLines", and "writtenLines". Retain these, but add optional <ruling> and <pricking> subelements. The <ruling> element would have <formula> content (or <formula> within <p>), with a CDATA attribute "type" on the <formula> (suggested values: Gilissen, Leroy, other). The <pricking> element would have paragraph content, since no formal descriptions of pricking are in common use.
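The kind of correlation the problem statement has in mind (ruling patterns against scribes) becomes a simple tally once ruling is encoded structurally. This Python sketch assumes the <ruling><formula type="..."> markup proposed above; the "scribe" attribute on <handDesc>, the <catalog> wrapper, and the formula strings "X1" and "Y7" are hypothetical placeholders, not real Leroy or Gilissen notation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: <ruling><formula type="..."> follows the proposal above;
# the "scribe" attribute on <handDesc> and the formula strings are placeholders.
CATALOG = ET.fromstring("""<catalog>
  <msDescription id="A">
    <physDesc><layout columns="1"><ruling><formula type="Leroy">X1</formula></ruling></layout></physDesc>
    <msWriting><handDesc scribe="s1"/></msWriting>
  </msDescription>
  <msDescription id="B">
    <physDesc><layout columns="2"><ruling><formula type="Leroy">X1</formula></ruling></layout></physDesc>
    <msWriting><handDesc scribe="s1"/></msWriting>
  </msDescription>
  <msDescription id="C">
    <physDesc><layout columns="1"><ruling><formula type="Gilissen">Y7</formula></ruling></layout></physDesc>
    <msWriting><handDesc scribe="s2"/></msWriting>
  </msDescription>
</catalog>""")


def ruling_by_scribe(root):
    """Count how often each (scribe, ruling system, ruling formula) combination occurs."""
    counts = Counter()
    for ms in root.iter("msDescription"):
        formula = ms.find(".//ruling/formula")
        hand = ms.find(".//handDesc")
        if formula is None or hand is None:
            continue  # nothing to correlate for this manuscript
        counts[(hand.get("scribe"), formula.get("type"), (formula.text or "").strip())] += 1
    return counts


for combo, n in ruling_by_scribe(CATALOG).most_common():
    print(combo, n)
```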
Watermarks might take the form of <watermark> elements with %phrase.seq; contents and with CDATA attributes for standardized pointers to watermark albums (e.g., Briquet, Лихачев) and to item numbers within the albums (note, though, that a single pointer to, for example, Briquet might point to a set of specific items, and this needs to be encoded so that it can be parsed by XSLT or similar processes). The element content is needed for descriptions like "This watermark is similar to items 1, 2, and 3 in Briquet, although it does not match any of them exactly in size," or to associate specific watermarks with specific page ranges within the manuscript. Note also that users might wish to associate a specific watermark in the manuscript with references in several standard albums, and the model needs to support this. (For example, wrapping a single <watermark> element that points to one album and one item around a description of a watermark in the manuscript doesn't allow that watermark to point to alternative items in alternative albums.)
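One way to meet both requirements (several albums per watermark, several items per pointer) is sketched below in Python. The <albumRef> element and its "album" and "items" attributes are hypothetical names invented for this example, not part of TEI-MS or Repertorium, and the item number cited for Лихачев is a placeholder. Encoding an item set as a whitespace-separated list is only one of several possible encodings; the point is that it can be split mechanically.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one <watermark> may carry several <albumRef> pointers,
# and each pointer may cite a set of items as a whitespace-separated list.
WATERMARK = ET.fromstring("""<watermark>This watermark is similar to items 1, 2, and 3 in
  Briquet, although it does not match any of them exactly in size.
  <albumRef album="Briquet" items="1 2 3"/>
  <albumRef album="Лихачев" items="101"/>
</watermark>""")


def album_references(watermark):
    """Map each album cited for a watermark to the list of item numbers cited in it."""
    refs = {}
    for ref in watermark.findall("albumRef"):
        # Splitting on whitespace turns the cited item set into a machine-readable list.
        refs.setdefault(ref.get("album"), []).extend(ref.get("items", "").split())
    return refs


print(album_references(WATERMARK))
```

The prose description stays in the element content, so the wording from the example above is preserved for display while the pointers remain searchable.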
The Repertorium approach, which is synchronized with the work of the Watermark Initiative, includes within <watermark> several subelements (<motif>, <countermark>, <disposition>), each with attributes. These should be incorporated into TEI-MS.
The "use" attribute on Repertorium <parchmentDesc> and <paperDesc> is probably superfluous; the information it records can be described in prose (as long as an appropriate element with data content is available), and it is unlikely to be used for searching, special rendering, or quantitative analysis.
Problem: The Repertorium provides a "source" attribute on <catalogueStmt> (the Repertorium counterpart to TEI-MS <msIdentifier>) with values (devisu | microform | description | edition | other). This attribute is misplaced (it is part of the source information about the description, rather than a feature of the manuscript), but it serves to document in a standard way the types of resources used in constructing descriptions. Simplifying retrieval according to this information is helpful for scholars who may need to distinguish in a general way the reliability of the source, perhaps to determine which manuscripts merit reevaluation.
Proposed Solution: Add a comparable attribute to TEI-MS, but locate it on the <source> element inside <recordHist>. Retain the <source> element with prose content to specify the source more precisely, e.g., to provide publication information about an edition. Divide the "edition" attribute value into "typesetEdition" and "facsimileEdition", since this is a crucial distinction.
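With typed sources, finding descriptions that were not based on direct examination becomes a one-line filter. In this Python sketch the attribute name "type" on <source> is an assumption (the proposal does not name it), <recordHist> is placed directly under <msDescription> for brevity, and the catalog is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the attribute name "type" is an assumption; the values are the
# Repertorium list with "edition" split into typesetEdition and facsimileEdition.
CATALOG = ET.fromstring("""<catalog>
  <msDescription id="A"><recordHist><source type="devisu">Examined in situ.</source></recordHist></msDescription>
  <msDescription id="B"><recordHist><source type="facsimileEdition">Facsimile.</source></recordHist></msDescription>
  <msDescription id="C"><recordHist>
    <source type="typesetEdition">Printed edition.</source>
    <source type="devisu">Checked against the original.</source>
  </recordHist></msDescription>
  <msDescription id="D"><recordHist><source type="microform">Microfilm.</source></recordHist></msDescription>
</catalog>""")


def not_seen_de_visu(root):
    """Ids of descriptions with no source of type "devisu": candidates for reevaluation."""
    return [
        ms.get("id")
        for ms in root.iter("msDescription")
        if not any(src.get("type") == "devisu" for src in ms.iter("source"))
    ]


print(not_seen_de_visu(CATALOG))  # ['B', 'D']
```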
Problem: Repertorium contains as part of the <catalogueStmt> (the counterpart to the TEI-MS <msIdentifier>) a <manuscriptName> element with an attribute "codexType" that has values (general | individual). This is intended to make it possible to identify, for example, a miscellany (general) and a specific miscellany type (individual). The <manuscriptName> element is misplaced because it is part of the intellectual content of the manuscript, rather than a geographic identifier comparable to repository or signature (shelfmark). The TEI-MS <title> element inside <msItem> fulfills this function, but needs an attribute to distinguish different levels of titles. The current attribute "type", with values "uniform" and "supplied", is not adequate for representing uniform or supplied titles of varying specificity.
Proposed Solution: Enhance the attribute values available for <title> inside <msItem> to make it possible to specify both general titles (e.g., "Patericon") and narrower ones (e.g., "Sinai Patericon"). The values discussed on the TEI-MS mailing list, (generic | distinctive), should be suitable for this purpose.
Possible Complication: In some cases more than two levels of titles are required, e.g., Mineja (a genre term for a book involving readings arranged by month), Služebnaja Mineja (a liturgical type of Mineja), and Služebnaja Mineja na sentjabr′ (a Služebnaja Mineja for the month of September). For some purposes, such as searching, the last two of these could both be tagged as "distinctive", although for formatting a print description it might be necessary to distinguish three levels in markup.
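Searching by title level then reduces to matching on the attribute. The Python sketch below assumes that "generic" and "distinctive" are added as values of the existing "type" attribute on <title>; the sample catalog, its ids, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "generic" and "distinctive" used as values of the "type"
# attribute on <title> inside <msItem>, as discussed above.
CATALOG = ET.fromstring("""<catalog>
  <msDescription id="M1"><msItem>
    <title type="generic">Patericon</title>
    <title type="distinctive">Sinai Patericon</title>
  </msItem></msDescription>
  <msDescription id="M2"><msItem>
    <title type="generic">Mineja</title>
    <title type="distinctive">Služebnaja Mineja</title>
    <title type="distinctive">Služebnaja Mineja na sentjabr′</title>
  </msItem></msDescription>
</catalog>""")


def find_by_title(root, text, kind):
    """Ids of descriptions with an <msItem>/<title type=kind> whose text is exactly `text`."""
    return [
        ms.get("id")
        for ms in root.iter("msDescription")
        if any(
            t.get("type") == kind and (t.text or "").strip() == text
            for t in ms.findall(".//msItem/title")
        )
    ]


print(find_by_title(CATALOG, "Mineja", "generic"))               # ['M2']
print(find_by_title(CATALOG, "Sinai Patericon", "distinctive"))  # ['M1']
```

Both narrower Mineja titles carry "distinctive", so this query cannot tell them apart; that is the complication described above.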
<msItem>
Problem: The Repertorium divides the description of the intellectual content of the manuscript into <manuscriptContentDesc>, which pertains to the manuscript as a whole, and <articleContentDesc>, which pertains to constituent articles. Both Repertorium and TEI-MS permit <msItem> (or its equivalent) to nest. This makes <manuscriptContentDesc> unnecessary, since the intellectual content of a manuscript can be described as a parent <msItem>, whose <msItem> children then correspond to the constituent texts.
The content model for <manuscriptContentDesc> in Repertorium is (overview*, numberTexts*, source*, translation*, protograph*, antigraph*, apograph*, litRedaction*, churchCal?, sampleText*, listBibl?, bibl?). TEI-MS now includes a <filiation> element (not originally present in MASTER) with an attribute whose values can distinguish protographs, antigraphs, and apographs, but it does not cover the other subelements. The Repertorium model includes two attributes: "type" (original | compilation | translation) and "style" (narrative | non-narrative). "Style" is probably of limited utility (one could, for example, have chosen hymnographic and non-hymnographic, liturgical and non-liturgical, religious and non-religious, etc.), but "type" may be useful for locating all of the translations quickly, although it may not be needed alongside the <translation> subelement.
Proposed Solution: The "type" attribute should be incorporated (on <msItem>) with the Repertorium values. The Repertorium subelements <translation>, <litRedaction>, and <churchCal> are needed, but they do not pertain to <filiation>. To accommodate them, add the following:
A <transmission> element with a CDATA attribute "type" with suggested values (translation | litRedaction).
A "churchCal" attribute (or something similar) on <msItem>. A formalized system for representing the calendar should be proposed as a framework for suggested values.
Problem: Repertorium uses <noteDesc> where TEI-MS uses <additions>. Repertorium <noteDesc> contains <noteItem> elements, which have structured content comparable to the description of other manuscript material (in TEI-MS terms, this structured content pertains to the intellectual content of the manuscript [<msItem>] or to handwriting [<msWriting>]). TEI-MS <additions> contains only paragraphs, which makes it impossible to search for marginalia with the same content-related or writing-related features as regular text. Because one might wish to find correspondences in marginalia (whether based on content or form), a structured alternative to paragraphs would be useful.
Proposed Solution: Uncertain. Allow <msItem> and <msWriting> inside individual items within <additions>? Document a mechanism for linking between a general <msWriting> element and those items? The TEI-MS strategy of separating notes in the manuscript (<additions>) from editorial notes (<note>) should be retained; what is lacking is a structured way to describe an "addition".
Problem: The order of constituent articles in a miscellany or other complex manuscript would appear to be inferable from the order of the <msItem> elements. Such an assumption yields incorrect results in the case of manuscripts that have been rearranged during rebinding. The actual (current) order needs to be included because it is part of the description of the manuscript as it is, but a structured way of representing the original order is also required for automated analysis (for example, the order of the contents when the manuscript was originally created is the only order that matters for identifying a possible protograph or antigraph).
Proposed Solution: Because MASTER (whether wisely or not) chose to privilege current physical codicology over textual description, the order of content items in the manuscript at the time of description, however accidental that order may be, is primary. Accordingly, the order of <msItem> elements should continue to record the order of the constituent texts as they appear in the manuscript currently. <msItem> should have an optional attribute with a name like "originalLocation" and a value that indicates the original ordinal place of the article where that differs from the current place.
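Restoring the original order from such an attribute is straightforward. This Python sketch assumes that "originalLocation" holds a 1-based ordinal and is omitted for articles that have not moved; both conventions are assumptions, and the sample manuscript is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "originalLocation" is the attribute proposed above; its value is
# assumed to be a 1-based ordinal, omitted when an article has not moved.
MS = ET.fromstring("""<msDescription>
  <msItem><title>Article A</title></msItem>
  <msItem originalLocation="3"><title>Article C</title></msItem>
  <msItem originalLocation="2"><title>Article B</title></msItem>
</msDescription>""")


def original_order(ms):
    """Titles of the top-level <msItem> children, re-sorted into their original order."""
    def original_place(numbered):
        current, item = numbered
        # No attribute means the article still occupies its original (current) place.
        return int(item.get("originalLocation", current))

    ordered = sorted(enumerate(ms.findall("msItem"), start=1), key=original_place)
    return [item.findtext("title") for _, item in ordered]


print([item.findtext("title") for item in MS.findall("msItem")])  # ['Article A', 'Article C', 'Article B']
print(original_order(MS))  # ['Article A', 'Article B', 'Article C']
```

The document order of <msItem> still records the manuscript as it is now; the original order is computed only when an analysis needs it.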
Problem: TEI-MS provides no structured way of describing markings such as catchwords and quire signatures. Repertorium provides a <markings> element with a "form" attribute with values (catchword | signature | numbers | figure | mixed | other) and %phrase.seq; contents. There is also a "presence" attribute with values (all | some) and an "orientation" attribute with values (horizontal | up | down). In general the marks in question were added after the manuscript was originally composed, which means that similarities in marking patterns between manuscripts do not suggest common origin, but they might suggest, for example, that the manuscripts were rebound in the same location.
Proposed Solution: Introduce an optional and repeatable <markings> element as part of <collation>. The Repertorium attribute values are probably superfluous insofar as they are not needed for searching, rendering, or quantitative analysis. With appropriate typing, this element might be able to subsume the functions of the separate Repertorium <foliation> and <pagination> elements.
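As a rough sketch of how typed <markings> could be queried, the Python example below groups manuscripts by marking type, the sort of overlap that might point to rebinding in a common location. The "type" attribute and its values (foliation, pagination, catchword) are hypothetical, chosen to show how one element could cover the Repertorium <foliation> and <pagination> elements; the catalog is invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markup: repeatable <markings> inside <collation>, typed so that one
# element can also cover what Repertorium encodes as <foliation> and <pagination>.
CATALOG = ET.fromstring("""<catalog>
  <msDescription id="A"><physDesc><collation>
    <markings type="foliation">Arabic numerals, upper right.</markings>
    <markings type="catchword">Catchwords at the end of each quire.</markings>
  </collation></physDesc></msDescription>
  <msDescription id="B"><physDesc><collation>
    <markings type="pagination">Pencil pagination.</markings>
  </collation></physDesc></msDescription>
  <msDescription id="C"><physDesc><collation>
    <markings type="catchword">Catchwords, partly trimmed.</markings>
  </collation></physDesc></msDescription>
</catalog>""")


def manuscripts_by_marking_type(root):
    """Map each markings type to the ids of manuscripts with at least one such marking."""
    groups = defaultdict(list)
    for ms in root.iter("msDescription"):
        # A set avoids listing a manuscript twice when it has several markings of one type.
        for kind in sorted({m.get("type", "untyped") for m in ms.iter("markings")}):
            groups[kind].append(ms.get("id"))
    return dict(groups)


print(manuscripts_by_marking_type(CATALOG))
# {'catchword': ['A', 'C'], 'foliation': ['A'], 'pagination': ['B']}
```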
Problem: Repertorium has richly structured models for the description of decoration and binding. TEI-MS has only loosely structured or unstructured models, and these are not capable of identifying manuscripts in a corpus that share decorative or binding features.
Proposed Solution: Use the Repertorium models as a starting point for creating formalized alternatives to paragraph descriptions of decoration and binding. Elisaveta Musakova might be invited to help develop the decoration model.
Problem: Paleographic and orthographic descriptions are a standard feature of manuscript description, and structured representations are needed for searching, rendering, and quantitative analysis. On the other hand, these features are specific to different languages and writing systems, and cannot be standardized across cultural traditions in any general way. TEI-MS contains optional <palaeography>, <orthography>, and <morphology> subelements of <handDesc>; this is the right location (they vary with scribe), but 1) they do not have structured content, and 2) if <morphology> is to be included, <phonology> should probably also be available.
Proposed Solution: Retain the TEI-MS elements (adding <phonology>) with paragraph content. Invite expert constituencies to propose structured alternatives for specific traditions. The Repertorium can supply a structured model for Cyrillic paleography and orthography. The structured alternatives would be part of the TEI DTDs as pizza toppings (or, I suppose, toppings for toppings, since they would be meaningful only if one had selected the manuscript description topping).
The contents of the paleographic description in Repertorium include <letterForm>, <cadels>, <crypto>, and <musicNotation>. The last of these might go elsewhere in TEI-MS, but the first three provide a generic superstructure that might be useful independently of tradition.