TEI Analytics: 3-paper panel
(Martin Mueller, Brian Pytlik-Zillig, Stephen Ramsay)
We present TEI Analytics (TEI-A), a TEI subset designed to facilitate interoperability among text archives so that large-scale text analysis can be conducted across full-text collections. Our panel covers three aspects of the project. Martin Mueller discusses the nature and development of the TEI Analytics schema, with its specific focus on data points for natural language processing. Brian Pytlik-Zillig presents his fully automatic method for converting arbitrary TEI (and even non-TEI) documents to TEI-A; the method relies on schema harvesting, and he describes his particular technique for using schemas to write stylesheets (that in turn write stylesheets). Stephen Ramsay talks about Abbot, a tool designed to enclose Pytlik-Zillig’s schema harvester within a highly configurable framework, which also takes advantage of some novel (XML-based) software engineering techniques.
TEI Analytics grew out of the specific needs of the MONK Project, which endeavors to bring Web-based text analysis and visualization techniques to existing full-text archives. However, we believe that our basic approach can be used in other contexts where rendering is not the principal concern. We also suggest that the problem of interoperability might best be approached not through generalized down-tagging or some other one-size-fits-all solution, but through domain-specific subsets like TEI-A. Elements of our approach may also facilitate P4-to-P5 migration in some contexts.
Throughout the discussion, we emphasize the general nature of our approach (as opposed to focusing on the narrow concerns of MONK), though we will also report on some of the ways in which we’ve used TEI-A to perform useful text-analytical work.