Core Elements in Text Representation: A Point of View

David Chesnutt

Scholars in the humanities are most often engaged in preparing monographs, journal articles, editions, and textual materials to be used in their classrooms. While most scholars today create "electronic" versions of these texts, they are accustomed to thinking only of what they must do in their particular word processing programs to achieve the desired format for their text. The levels of ambiguity and inconsistency in these author/editor texts do not seem to be of great concern to most scholars as long as the texts convey the information they wish to present.

The Association of American Publishers recognized the value of electronic manuscripts as a means of lowering typesetting costs early in this decade. Major textbook publishers were the first to develop in-house systems designed to produce typesetting files. At about the same time, a number of scholarly editions in the States followed suit. Larger university presses did likewise, and many other presses (both private and public) experimented with electronic files produced by their authors. The results were mixed, but out of that experience came the AAP effort to standardize the coding of electronic files via an SGML markup scheme.

By and large, the AAP standard has had little impact to date. Among all the scholarly editions in history and literature that I am familiar with (probably most of the major projects in the States), none uses the AAP tagging scheme. Even among the publishers themselves, AAP has not "caught on." Part of this lack of enthusiasm probably stems from the AAP documentation itself, which is sometimes difficult to follow. But the major flaw of the AAP scheme from an end-user point of view is that it asks too much. It is too particularized--geared too much to the publisher who is the beneficiary of the potential savings; geared too little to the inexperienced academic.
Yet at the same time, the AAP coding does not provide a tagging scheme for one of the most elementary textual elements which commonly occurs in our texts--poetry. (Nor does it provide a tagging scheme for some of the more specialized problems in the publication of scholarly editions--letters and documents of various types, cancellations, interlineations, textual variants. But that's another issue because these features are not common to most texts in the humanities.)

The goals of the TEI are to provide a tagging scheme which will facilitate the interchange of text and which will allow tagging at various levels of sophistication. To accomplish those goals we must provide guidelines which will not intimidate the novice, yet which are comprehensive enough to accommodate the needs of scholars who wish to tag their data for sophisticated presentation or analysis. If the TEI is to gain the kind of widespread acceptance we hope it will, it seems to me that we must provide a simplified "cookbook" for the novice and a comprehensive reference manual for the more sophisticated applications. My concern here is with the cookbook and the ingredients necessary for a palatable stew.

Core Elements

Within any reading text, we tend to discriminate between the elements visually as a means of conveying information to the reader. More often than not, we use the conventions that printers have used for centuries. We center a heading, put it in larger type, make it boldface, separate it with vertical space above and below. In short, we emulate the conventions of the printed page. With an electronic text, we usually have a mix of the old with the new. For example, we might display the words we have marked for emphasis with a different background color instead of an italic font. Regardless of our media or our conventions, the important aspect is simply to mark or tag those elements in a way which others can easily recognize and adapt to their own uses.
The following outline includes some of the most common elements which influence the way we "see" a text.

body text
    titles/headings
    paragraph
    poetry
    lists/tables
    block quotation
        titles/headings
        paragraph
        poetry
        lists/tables
        block quotation
annotation
    titles/headings
    paragraph
    poetry
    lists/tables
    block quotation

Steve DeRose uses the term "text container" to describe some of these elements, which is a useful way of separating them from those elements that occur within the text (e.g., sections of text marked for emphasis, hyphen problems, occurrences of foreign languages, etc.). The internal elements are probably no less critical, but I want to confine my remarks here to the "containers" which seem to be most central or common to texts in the humanities.

One of the problems I see in providing tag sets for these core elements is the recurrence of the same elements in different settings. These elements pop up all over the place and are commonly input by our colleagues without any discrimination as to where they occur in the text. On the other hand, our current textual conventions of presentation in either print or electronic format require discrimination. In a printed work, the elements within the body text are typically set in larger type than those in the annotation. To achieve that effect using the current generation of commercial typesetting software requires explicit tagging. With the exception of paragraphs, all of the other elements have to have unique tag sets. A block quotation in the annotation cannot be tagged with the same tags used for a block quotation in the body text.

While I have no doubt that our colleagues could easily understand the necessity of explicit tagging, I don't think we would win many converts to a tagging scheme of that type. Most would simply throw up their hands. And quite frankly, if my own project did not have the software to insert the explicit tags required for typesetting, we would not be engaged in preparing typesetting files today.
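
To make the burden of explicit tagging concrete, the following is a minimal sketch in Python of how a program might expand generic tags into the distinct tags a typesetting system needs, using only the container an element sits in. It is an illustration under stated assumptions, not the software described above and not part of the AAP or TEI tag sets: the element names, the container names, the dotted naming convention, and the function `explicit_tags` are all hypothetical.

```python
# Hypothetical container names; real tag sets would differ.
CONTAINERS = {"body", "annotation"}


def explicit_tags(events):
    """Expand generic tags into context-specific ones.

    `events` is a sequence of (kind, name) pairs, where kind is
    "start" or "end". An element inside a container is renamed
    "<container>.<element>". Paragraphs keep a single generic tag,
    since they are the one element that needs no unique tag set.
    """
    open_containers = []  # stack of containers currently open
    result = []
    for kind, name in events:
        if name in CONTAINERS:
            # Track the container itself and pass it through unchanged.
            if kind == "start":
                open_containers.append(name)
            else:
                open_containers.pop()
            result.append((kind, name))
        elif name == "paragraph" or not open_containers:
            # Paragraphs, and elements outside any container, stay generic.
            result.append((kind, name))
        else:
            # Everything else is qualified by the innermost open container.
            result.append((kind, f"{open_containers[-1]}.{name}"))
    return result


# The author types the same generic tag in both settings.
sample = [
    ("start", "body"),
    ("start", "blockquote"), ("end", "blockquote"),
    ("end", "body"),
    ("start", "annotation"),
    ("start", "blockquote"),
    ("start", "paragraph"), ("end", "paragraph"),
    ("end", "blockquote"),
    ("end", "annotation"),
]

for kind, name in explicit_tags(sample):
    print(kind, name)
```

In this sketch the block quotation typed inside the annotation comes out as `annotation.blockquote` and the one in the body text as `body.blockquote`, while `paragraph` keeps the same tag everywhere. The point is only that the discrimination can be derived from where an element occurs rather than typed by hand.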
Unfortunately, our software is designed only for the kinds of documents we commonly publish. But the principle inherent in our software is one worth thinking about, because we use what might be called "context tags" to discriminate between textual elements that commonly occur in different circumstances. The use of context tags could provide a way of simplifying the tagging of core elements. For example, you might have a tagging structure that looked something like the outline below.
text