TEI Special Interest Groups


In September, we posted a request for expressions of interest from those wishing to propose TEI Special Interest Groups in particular areas at the TEI Members Meeting. Somewhat to our surprise, by the time of the announced deadline for proposals we had received more than enough to fill the six slots allocated for presentation and discussion of these proposals: all are listed below.

The function of a SIG, as originally discussed at the last members meeting, is to provide a forum for people working in a particular area, or with a specific set of concerns, to exchange opinions and build consensus. That might lead to any number of outputs, including specific training courses or documentation, proposals for extension or modification to the Guidelines, etc. A SIG is not a TEI workgroup but could lead to one being set up, or greatly contribute to its work, by providing consultation, field trials, outreach, etc.

Each SIG will be expected to produce a brief statement of the outcomes of its preliminary discussions: we hope to find time during the business meeting for this feedback and it will also be posted on the website. That's particularly important, since not everyone will be able to attend all the SIGs they might like to.

The TEI is a community-driven initiative: it is really up to convenors and members of the SIG to decide how best to accomplish its goals, and indeed to determine what those goals may be. The TEI Consortium offers a communications channel (via our web site and discussion lists) and a route into the technical procedure by which the Guidelines are developed and promoted (via the TEI Council), but how SIGs choose to use those facilities is really up to them. The Consortium's health depends on a well-informed and enthusiastic membership, of which SIGs are a major new manifestation.

Manuscript transcription and description

Elena Pierazzo, University of Pisa (Italy); Susan Schreibman, Maryland Institute for Technology in the Humanities (US); Edward Vanhoutte, Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature (Belgium)

The problems of editing modern manuscripts, both literary and documentary, are substantially different from the problems and issues posed by editing medieval or early modern texts. As there is often a paucity of witnesses from the earlier period, much encoding is geared towards reconstructing what might have been a base or lemma text. In the modern period, by contrast, there is often an abundance and density of witnesses and intra-documentary layers of text, and the point of the encoding is not to create a base text but to recreate a history of the text. In addition, the editor of modern manuscript editions is often faced with a bewildering array of document types that do not fit comfortably into current TEI practice, such as letters, fragments, and ephemera. The beginning and ending of a text may be fluid, and a single text may represent many periods of composition over time.

While there has been substantial work in the TEI community in augmenting the Critical Apparatus and Transcript of Primary Source tag sets to accommodate encoding medieval and early modern manuscripts (see particularly the Taskforce on Manuscript Description), much of this work cannot be applied to editing modern manuscripts. This SIG will thus explore a range of issues common to editing modern manuscripts, including 1) accommodating a variety of document types not intended for publication; 2) finding a more robust way of tracing a textual history, quite possibly via a theory such as genetic criticism; 3) integrating non-verbal material (such as sketches, doodles, or drawings) with the textual; 4) encoding time-sensitive information which may be textually or document-based; and 5) investigating how, using open source standards such as XSLT, to display these richly-encoded texts.

Human Language Technologies

Tomaz Erjavec, Laurent Romary, Nancy Ide

While the TEI and its derivative CES have been extensively used for annotating corpora, the TEI has been almost completely absent from the markup of other resources connected to Human Language Technologies, e.g. ontologies, lexical databases, grammatical formalisms, termbanks, etc. There are various reasons for this: the Guidelines simply do not offer tagsets for or chapters on certain resources (computational lexica, ontologies); current encoding practice has overtaken the proposals made in the Guidelines (termbanks); or the HLT community is simply not aware of the existence of the TEI.

The TEI LT SIG would attempt to join together TEI subscribers involved in processing natural language to exchange information on their use of TEI for particular LT needs. This can mean either utilising existing TEI mechanisms (e.g. feature structures) to encode new types of resources, proposing TEI extensions, or establishing linkage mechanisms to other standards (e.g. OWL). This work could lead to best practice guides, tutorials at HLT oriented events, and also feed into the shaping of TEI P5.

Training TEI trainers

Edward Vanhoutte and Ron Van den Branden, Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature (KANTL) (Belgium); Susan Schreibman, Maryland Institute for Technology in the Humanities (US)

Where do you start learning how to use the TEI? The TEI website provides some useful hints and links to documentation which can help enterprising individuals overcome the steep learning curve associated with mastering XML, XSL, and associated technologies. Alternatively, the beginner can be mentored by an advanced user of TEI over email or in his or her own work environment. Increasingly, though, people choose to participate in a TEI training session, which may be part of an internal training initiative, offered by a private organisation, run as an open workshop, or included in a university curriculum. But where do you learn how to train people to use the TEI?

This Special Interest Group on training seeks to gather beginning, experienced, and prospective trainers of TEI who are willing to share their experiences and ideas and to learn from one another. The SIG will report on training sessions organized in the USA, the UK, Ireland, Belgium, the Netherlands, and South Africa. The following topics will be addressed:
  • different audiences, different curricula
  • building a TEI training programme
  • a minimum curriculum: TEI Lite
  • adding a touch of style: CSS and XSL for TEI
  • documentation, software and tools
  • the TEI Certification programme
  • the role of the TEI Consortium

Technical Topics

[Mainly for logistical reasons, we have to limit the number of SIGs meeting in Nancy to six. For that reason, I've combined together a number of originally distinct proposals. Of course, the first item on the agenda of this and the next SIG should be to decide whether or not this grouping makes sense. LB]

Graphics and Text

John Walsh (Indiana); Peter Boot (Utrecht)

I'd like to propose a SIG on the topic of Graphics and Text. In looking for interested parties, I've described the SIG as being "for those working with graphics-intensive texts, studying graphics-related encoding issues, and researching best practices and innovative methods for representing and manipulating graphical elements in TEI-encoded texts." Peter Boot of the Utrecht Emblem project supports the idea, and he and I will both be attending the Nancy meeting. The topic has obvious relevance to my TEI-based Comic Book Markup Language (CBML) project, but also to our recently-funded project to digitize the alchemical manuscripts of Isaac Newton. The Utrecht Emblem project and the Wittgenstein Archive are other projects with similar concerns. I plan to work with Peter and others to refine the language about the SIG's interests, but I wanted to get the proposal off to you. Let me know if this sounds appropriate, and I'll follow up with an update when we have one. Looking forward to seeing you both in Nancy.

Overlapping Markup

Patrick Durusau, Society for Biblical Literature. Patrick.Durusau@sbl-site.org; Peter Robinson, Centre for Technology and the Arts, De Montfort University. peter.robinson@dmu.ac.uk; Kevin S. Kiernan, Department of English, University of Kentucky. kiernan@uky.edu; Dorothy Carr Porter, Research in Computing for Humanities, University of Kentucky. dporter@rch.uky.edu.

Overlapping markup is a significant problem in text encoding: in Biblical texts, verses consistently run from one chapter into another; in printed books, paragraphs consistently begin on one page and end on the next page; in medieval manuscripts, words are frequently split between lines. A text that contains markup for several different types of information (organization, grammar, literary influences, manuscript condition) will also result in a file that is guaranteed to contain conflicts. TEI has addressed conflicts arising from overlapping markup through the use of milestone elements and segmented elements. Unfortunately, neither of these solutions is sufficient on its own, and use of them may not result in well-formed and valid XML. We propose to form a SIG to examine the issues contributing to overlapping markup and to present systematic strategies for dealing with it.
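
To make the page-and-paragraph case concrete, here is a minimal sketch in Python using only the standard library. The sample text and the container element name `page` are assumptions made for illustration; `pb` is the TEI page-break milestone. The sketch shows that two overlapping container hierarchies do not parse as well-formed XML, that the milestone version does, and that the page hierarchy then has to be rebuilt by the processing software.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a paragraph that begins on page 1 and ends on page 2.
# Encoding both pages and paragraphs as containers makes the tags overlap,
# so this string is not well-formed XML.
overlapping = (
    "<text>"
    "<page n='1'><p>A paragraph that begins on one page"
    "</page><page n='2'> and ends on the next.</p></page>"
    "</text>"
)

# Milestone alternative: each page boundary is an empty <pb/> marker,
# leaving the paragraph as the only container.
milestone = (
    "<text>"
    "<pb n='1'/><p>A paragraph that begins on one page"
    "<pb n='2'/> and ends on the next.</p>"
    "</text>"
)


def is_well_formed(xml_string):
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


def text_by_page(root):
    """Collect the text of each page by walking the tree in document order.

    The page hierarchy is only implicit in the <pb/> markers, so it is
    reconstructed here rather than read off the element structure.
    """
    pages = {}
    current = None

    def walk(elem):
        nonlocal current
        if elem.tag == "pb":
            # A milestone switches the page that following text belongs to.
            current = elem.get("n")
            pages.setdefault(current, "")
        elif elem.text and current is not None:
            pages[current] += elem.text
        for child in elem:
            walk(child)
            if child.tail and current is not None:
                pages[current] += child.tail

    walk(root)
    return pages


print(is_well_formed(overlapping))
print(is_well_formed(milestone))
print(text_by_page(ET.fromstring(milestone)))
```

The extra traversal in `text_by_page` illustrates the cost of the milestone approach: the XML is well-formed, but one of the two hierarchies is no longer directly visible to a parser or a schema.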

Multilingual Markup (TEI-MM-SIG)

Alejandro G. Bia-Platas (e-mail: alex.bia@ua.es): Head of Research and Development Miguel de Cervantes Digital Library, University of Alicante (Ed. Institutos); Christian Wittern (Univ Kyoto)

Multilingual Markup

Markup is based on mnemonics (i.e. element names, attribute names and attribute values). These mnemonics carry meaning, which is one of the most interesting features of markup. Markup allows us to define the structure of a text in a way that can be both processed by computer programs and understood by humans. Human understanding of this meaning is lost when the encoder doesn't have a good command of the language the mnemonics are based on. For example, a Spanish encoder who doesn't know English will find it difficult and error-prone to apply or understand TEI markup using the original TEI mnemonics based on the English language.

So, by multilingual markup we mean applying markup using mnemonics in one's own language. As we have demonstrated by experience at the Miguel de Cervantes Digital Library, a markup vocabulary exactly equivalent to TEI can be developed in Spanish, Catalan, French and almost any other language, and the tools for translation back and forth to the original TEI core can be built automatically and can be applied in a transparent and easy way. When we build markup vocabularies equivalent to TEI but in the local language, the structural facilities and constraints of the markup scheme remain the same; only the markup terms used by DTDs/Schemas/documents are different, and the document structure becomes remarkably clearer for the encoder.

For further information on these experiences using TEI markup in Spanish and Catalan you can read the following article: http://cervantesvirtual.com/research/articles/multilingual-markup.pdf

Are the advantages of using a general and widespread markup vocabulary like TEI lost?

Not at all. The two main advantages of using a general markup vocabulary like TEI are document interchangeability and community support (which includes training and tool sharing). Since markup terms can be very easily and automatically translated to the original TEI tagset, interchangeability is not lost and tools like XSLT scripts can still be used unchanged after translation. Training materials may need to be translated or adapted, but this is not a result of the use of multilingual markup but the usual handicap of non-English-speaking encoders.
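
The round trip described above can be sketched in a few lines of Python. This is an illustration only, not the tooling used at the Miguel de Cervantes Digital Library: the Spanish element names and the `ES_TO_TEI` table are hypothetical, unaccented names are used for simplicity, and only element names are translated (attribute names and values would need the same treatment).

```python
import xml.etree.ElementTree as ET

# Hypothetical Spanish-to-TEI element name table (illustrative only).
ES_TO_TEI = {
    "texto": "text",
    "cuerpo": "body",
    "cabecera": "head",
    "parrafo": "p",
}


def invert(mapping):
    """Return the reverse mapping; it must be one-to-one to be invertible."""
    inverse = {target: source for source, target in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not one-to-one")
    return inverse


def translate_tags(root, mapping):
    """Rename every element found in the mapping; leave other names as-is."""
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return root


spanish = (
    "<texto><cuerpo><cabecera>Capítulo I</cabecera>"
    "<parrafo>En un lugar de la Mancha</parrafo></cuerpo></texto>"
)

# Spanish mnemonics to TEI names.
tei_root = translate_tags(ET.fromstring(spanish), ES_TO_TEI)
tei = ET.tostring(tei_root, encoding="unicode")
print(tei)

# And back again: the structure and content are unchanged.
back_root = translate_tags(ET.fromstring(tei), invert(ES_TO_TEI))
back = ET.tostring(back_root, encoding="unicode")
print(back == spanish)
```

Because only names change, the translated document has the same structure as the original, which is why stylesheets written against the TEI names can be applied after translation.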

Purpose of the group

The purpose of this interest group is to exploit and expand the benefits of using markup tags in one's own language, such as the consequent reduction in learning times, increase in production, reduction in markup errors, and interesting possibilities for advanced XML-based search engines. Scholars and students will also welcome being able to handle documents whose markup is in the same language as the text.

During the first TEI-MM-SIG meeting in Nancy, the idea, tools and possibilities of multilingual markup will be introduced, accompanied by a practical demonstration. During the session the objectives of the group will be established. Amongst the possible objectives are:
  1. Translate all TEI mnemonics into different languages. This should be done by TEI users of different language zones, with interests in using markup in their own language. This is one of the main reasons to become a member of this SIG: to help translate TEI tags to your own language.
  2. Build a TEI Term-Bank for multiple languages: using the different sets of mnemonics mentioned above, we should build an official repository of TEI terms which will feed the automatic generators of translating scripts, which will in turn translate the markup terms used in documents and DTDs/Schemas from any one language to any other. The technical support for implementing this Term-Bank will be supplied by the TEI META-Group, as agreed at a past meeting of the TEI Council.
  3. Beta-test the new term-sets and tools. This is another reason to join this group: to be the first to use this technology and provide feedback to improve it.
  4. Study the technical possibilities, limitations and challenges of multilingual markup. There are many aspects to be discussed and decisions yet to be made. To give an example: the existing restrictions on tag names may affect the way in which we build mnemonics in accented languages.

Tools for Authoring, Presentation and Publication

Matt Zimmerman - NYU; Barbara Bordalejo - Canterbury Tales Project; Sebastian Rahtz (OUCS); J-L. Benoit (ATILF)

[Editorial note: I have combined into one SIG the following, although they were originally three distinct proposals. Of course, the first item for the SIG will be to decide whether this was a good idea...]

Presentation Issues

Though the TEI's focus is the encoding of text for interchange, and it is therefore specifically NOT concerned with what the encoded text will be used for, I think it is a reality that many TEI members intend to present their texts to a reader in some electronic format such as the web or other media. Since there are so many TEI members encoding texts for future presentation, it would be helpful to have a SIG that focuses on presentation tools. My experience has been that the TEI is an excellent resource for encoding questions, but when the encoding is done and a project is ready for some sort of electronic presentation, many TEI members do not know where to begin. Other TEI members would be a great resource for getting started, even though presentation tools are NOT part of the TEI mission.

Sample topics: XSLT, Cocoon, eXist, etc.

Authoring Issues

Is anyone interested in a SIG session at the TEI members meeting about using the TEI as an authoring scheme? The idea would be to discuss any problems in writing conventional office-type documents (reports, web pages, articles etc) using the TEI markup. This might be solely about which tags to use, and for what, but could also cover
  • what new elements or attributes are needed
  • good practice in incorporating other XML languages (MathML, SVG, Docbook etc)
  • suitable authoring tools.

The aim would be to generate a public set of guidelines.

User Interface issues

Jean-Luc Benoit

J'aimerais réfléchir avec d'autres personnes (je rédige un papier là-dessus) sur la manière de rendre accessible aux utilisateurs la richesse de l'encodage TEI.

Le principe même du codage TEI permet d'appréhender l'oeuvre dans son ensemble, permet d'en faciliter la lecture grâce à une bibliographie détaillée, grâce à la mise en lumière de sa structure, grâce à la mise en lumière de son apparat critique.... Comment rendre ces richesses accessibles ? Je sais que vous avez évoqué un moteur de recherche XML capable d'accéder à l'encodage TEI. Pourrons-nous l'utiliser rapidement ? et comment ? Stella est une base de données lexicales, pas une base de connaissances littéraires. Il y a donc des choses à faire !!! pour proposer un nouvelle base, un nouvel outil....

Une autre question me tient à coeur : c'est la notion d'échanges. J'aimerais savoir si des collaborations pourraient être mises sur pied entre des organismes travaillant sur le même domaine que le nôtre, utilisant le même système d'encodage. Je rappelle que depuis un an, à l'ATILF tous les textes nouveaux de la base sont en XML et utilisent une DTD TEI.

Pourquoi pas, en respectant les intérêts de chacun, construire des super-bases mettant en commun leurs ressources ?

[translation by LB]

I would like to discuss with others (since I am writing a paper on the subject) how to make the richness of a TEI encoding accessible to users. The very principle of a TEI encoding is to capture the whole of a work, thus facilitating readings based on a detailed bibliographic description, the presentation of its structure, or a critical apparatus. How can one make these riches accessible? I know you've mentioned an XML-aware search engine capable of using TEI encoding. Can we use it soon, and how? Stella is a lexical database, not a literary knowledge base. So there are things to do: to propose a new database, a new tool.

Another question that is dear to my heart: the notion of exchange. I would like to know whether collaboration could be undertaken between organizations working on the same domain as ourselves, using the same coding system. For a year now, all new ATILF texts have been captured in XML, using a TEI DTD. Why not make super-collections, bringing resources together, while respecting the rights of each individual?

Digital Libraries

Perry Willett (Indiana University) and Alejandro Bia (Alicante)

Digital libraries use the TEI to publish online collections containing large numbers of texts. Generally speaking, the texts in these collections are not encoded to a deep level. Instead, the markup in large part consists of divs representing the hierarchical structure, and indications of font shifts. The text might also be linked to page images. Some issues of interest to people in digital libraries working with TEI texts:
  • consistent encoding across collections, facilitating cross-collection searching and development of standard libraries of XSL stylesheets;
  • uniform practice in TEI headers;
  • exploration of other metadata schemes, including MARC, EAD, METS, MODS, and OAI/Dublin Core, in conjunction with TEI documents;
  • lightweight, open-source search-and-display systems for digital library collections of TEI documents.

Last recorded change to this page: 2007-09-12  •  For corrections or updates, contact web@tei-c.org