TEI-Analytics and the MONK Project

(Martin Mueller)

MONK is a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study. It supports both micro analyses of the verbal texture of an individual text and macro analyses that let you locate texts in the context of a large document space consisting of hundreds or thousands of other texts. Shuttling between the “micro” and the “macro” is a distinctive feature of the MONK environment, where you may read as closely as you wish but can also practice many forms of what Franco Moretti has provocatively called “distant reading.”

Metadata are at the heart of the radical divide-and-conquer strategy of MONK, an acronym for "Metadata Offer New Knowledge." For every document in a MONK environment there are explicitly recorded metadata at the top level (bibliographical data), at the bottom level of individual word occurrence (lexical, morphological, and syntactic data), and at the mid-level of discursive organization (chapters, scenes, stanzas, etc.). The query potential of the resultant document space depends on the end user's ability to perform operations based on arbitrary combinations of metadata from these different levels.
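
As a rough illustration of what combining metadata from the three levels might look like, the following Python sketch models a tiny document space in which every word occurrence carries its top-level, mid-level, and bottom-level metadata. The field names (`author`, `genre`, `division`, `lemma`, `pos`), the sample records, and the `query` helper are all invented for this example and are not taken from the MONK software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    # Top level: bibliographical data (hypothetical fields)
    author: str
    genre: str
    # Mid level: discursive organization (hypothetical field)
    division: str
    # Bottom level: the individual word occurrence (hypothetical fields)
    spelling: str
    lemma: str
    pos: str


# Invented sample data, for illustration only.
TOKENS = [
    Token("Author A", "play", "scene", "loved", "love", "verb"),
    Token("Author A", "play", "scene", "love", "love", "noun"),
    Token("Author B", "novel", "chapter", "loves", "love", "verb"),
    Token("Author B", "novel", "chapter", "pride", "pride", "noun"),
]


def query(tokens, **criteria):
    """Return the tokens whose fields equal every given criterion."""
    return [
        t for t in tokens
        if all(getattr(t, name) == value for name, value in criteria.items())
    ]


# Combine one criterion from each level: genre (top),
# division (mid), and lemma plus part of speech (bottom).
hits = query(TOKENS, genre="novel", division="chapter", lemma="love", pos="verb")
print([t.spelling for t in hits])
```

The point of the sketch is only that a single filter can mix criteria from any of the three levels; a real document space of thousands of texts would presumably need an indexed store rather than a linear scan.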

TEI-Analytics has been our tool for expressing the triple-decker structure of metadata in a large, heterogeneous, and diachronic document space that includes texts from different collections. From one perspective, TEI-Analytics is a minor modification of the P5 TEI-Lite schema, with additional elements from the Linguistic Segment Categories to support morphosyntactic annotation and lemmatization. But while there is nothing very special about the schema as such, tricky choices and trade-offs emerge when you transform different flavors of older TEI P3 or P4 texts into a data format designed to maximize interoperability of texts within a large document space.
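
To make the idea of word-level annotation concrete, here is a minimal Python sketch that reads a small TEI-like fragment in which each word is wrapped in a `w` element carrying a lemma and a part of speech. The fragment, the attribute names `lemma` and `pos`, and the tag values are assumptions made for illustration; they are not quoted from the TEI-Analytics schema, and namespaces are omitted for simplicity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: attribute names and values are illustrative only.
SAMPLE = """
<div type="chapter">
  <p>
    <w lemma="she" pos="pronoun">She</w>
    <w lemma="love" pos="verb">loved</w>
    <w lemma="he" pos="pronoun">him</w>
  </p>
  <p>
    <w lemma="he" pos="pronoun">He</w>
    <w lemma="love" pos="verb">loves</w>
    <w lemma="she" pos="pronoun">her</w>
  </p>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# Count lemmata rather than surface spellings, so that
# "loved" and "loves" are tallied together under "love".
lemma_counts = Counter(w.get("lemma") for w in root.iter("w"))

# Collect the surface forms recorded for one lemma.
love_forms = [w.text for w in root.iter("w") if w.get("lemma") == "love"]

print(lemma_counts)
print(love_forms)
```

Counting by lemma rather than by spelling is one small example of the comparability that consistent annotation across texts is meant to provide.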

The Italian adage "traduttore traditore" ("translator, traitor") comes to mind, as does Desdemona's "divided duty." Within the framework for which TEI-Analytics has been developed, conflicts will be resolved in favor of making it easier for end users and developers to achieve comparability of texts. To compare is to look for significant difference, but without comparability difference cannot be identified in the first place.

This bent of the project rests on the assumption that, in the current intellectual and technical environment of text analysis, coarse but consistent encodings across many texts in a heterogeneous document space have a greater scholarly payoff than fine-grained encodings in closely defined but not necessarily compatible environments. Compared with human reading, digitally assisted text analysis is more of a machete than a scalpel and is likely to remain so for some time. But a lot of useful machete work remains to be done. Crude but fast, the digital machete can open many paths for slow but subtle readers to explore.

TEI Members Meeting 2008
