The Abbot Framework
(Stephen Ramsay)
Conversion from one encoding scheme to another—even within the confines of a reasonably predictable encoding scheme like TEI—is never an easy job. The nature of the target text may require deviations from the TEI standards, and local circumstances related to storage, delivery, or rendering may require further extensions. It is therefore seldom possible to convert disparate text collections into one interoperable representation simply by executing a stylesheet.
Our schema harvesting method is therefore embedded in a larger system, called Abbot, that allows the user to make programmatic adjustments to both the conversion process and the converted files. Ideally, Abbot acts as a “one button” tool that encapsulates the various steps from target collection to TEI-A; in the case of valid, unextended TEI, we are able to give the user (or “curator”) a simple interface for converting the files. In more complex circumstances, however, Abbot allows for fine-grained control over the conversion process.
This flexibility is provided by jBPM, a workflow management framework developed under the aegis of the JBoss project. With jBPM, we can model the steps of the conversion process within an XML file using what is sometimes called “graph oriented programming.” By expressing the workflow as an abstraction above the actual Java objects responsible for the conversion, we are able to support state saves, threading, rollback, and many of the other features commonly found in relational database management systems. In addition to making the management of the workflow more transparent, this also allows easy integration with other systems (like Fedora, relational databases, and Web services frameworks). The core languages for representing workflows, WS-BPEL (Web Services Business Process Execution Language) and PDL (Process Description Language), are themselves standardized, which allows further possibilities for interoperability with other tools. Since the workflow language is in XML, we imagine that graphical, and even Web-based, user interfaces to Abbot would not be difficult to create.
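The idea of a workflow expressed as a graph above the conversion objects can be illustrated with a minimal plain-Java sketch. It does not use the jBPM API and is not Abbot's implementation; the class name WorkflowSketch, the node names (validate, convert, store), and the use of a string as the "document" are all hypothetical, chosen only to show how a token moves from node to node and why its state is easy to save.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration of graph-oriented execution; not the jBPM API.
final class WorkflowSketch {

    /** One node of the graph: an action plus the name of the next node (null ends the run). */
    static final class Node {
        final String name;
        final UnaryOperator<String> action;
        final String next;

        Node(String name, UnaryOperator<String> action, String next) {
            this.name = name;
            this.action = action;
            this.next = next;
        }
    }

    /** The execution state: current node, document so far, and the nodes already visited. */
    static final class Token {
        String current;
        String document;
        final List<String> history = new ArrayList<>();

        Token(String start, String document) {
            this.current = start;
            this.document = document;
        }
    }

    private final Map<String, Node> nodes = new LinkedHashMap<>();

    WorkflowSketch add(String name, UnaryOperator<String> action, String next) {
        nodes.put(name, new Node(name, action, next));
        return this;
    }

    /** Executes the token's current node and moves it on; returns false once the graph is finished. */
    boolean step(Token t) {
        if (t.current == null) {
            return false;
        }
        Node n = nodes.get(t.current);
        if (n == null) {
            throw new IllegalStateException("Unknown node: " + t.current);
        }
        t.document = n.action.apply(t.document);
        t.history.add(n.name);
        t.current = n.next;
        return t.current != null;
    }

    /** Runs the whole graph from the given start node. */
    Token run(String start, String document) {
        Token t = new Token(start, document);
        while (step(t)) {
            // Between steps the token could be persisted and later resumed.
        }
        return t;
    }

    /** Builds a three-node conversion graph with placeholder actions and runs it. */
    static Token demo(String input) {
        WorkflowSketch wf = new WorkflowSketch()
            .add("validate", String::trim, "convert")                 // stand-in for validation
            .add("convert", d -> d.replace("<doc", "<TEI"), "store")  // stand-in for conversion
            .add("store", d -> d, null);                              // stand-in for storage
        return wf.run("validate", input);
    }

    public static void main(String[] args) {
        Token t = demo("  <doc/>  ");
        System.out.println(t.history);
        System.out.println(t.document);
    }
}
```

Because the token holds nothing more than the current node name, the document, and the history, it can be written out after any step and picked up again later, which is the property a workflow engine relies on for state saves and rollback.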
In developing Abbot, we’ve been conscious of the fact that our most likely user is somewhere between a text archive end user and a Java developer (both in terms of needs and skill sets). We imagine this user as a kind of “curator” who likely possesses deep familiarity with XML technologies and simple scripting frameworks, but who is uncomfortable with advanced programming. In a sense, Abbot tries to abstract this user away from the code while still allowing him or her to manipulate the texts and the processes with the same flexibility afforded by ordinary scripting languages.
At the time of this writing, Abbot is embedded within a larger workflow intended to facilitate TEI-A conversions for the MONK Project. On the one hand, this has given us an opportunity to anticipate the role of Abbot within a complex environment. On the other hand, we also imagine Abbot as a standalone tool that can be used by curators who would like to create interoperable versions of their collections for search and analysis.