TEI for Linguists SIG

Context

The TEI recommendations relevant to linguists are scattered across several chapters of the Guidelines, especially Transcriptions of Speech, Dictionaries, Language Corpora, Simple Analytic Mechanisms, and Feature Structures. An earlier effort to draft a dedicated chapter on corpus encoding proved inconclusive. Meanwhile, TEI-based annotation schemes are not as widely known among linguists as most TEI insiders would expect. Consequently, TEI encoding is not used very often by linguists (unless those linguists happen to be digital humanists as well). This SIG addresses this situation by promoting the TEI Guidelines among linguists and researchers who work with language resources at large.

Scope

Within linguistics, the growing importance of empirical methods, which require ever larger amounts of language data as a testbed for new theories, has brought about a paradigm change, and the use of language resources has increased dramatically in the first decades of the 21st century. The aim of the SIG “TEI for Linguists” is to provide guidance and the means for encoding language resources in the TEI. This covers both “item-based” resources (lexica, ontologies) and “text-based” resources (corpora). Moreover, the SIG aims to be a forum for scholars who want to consider the use of TEI markup schemes for diverse linguistic tasks.

Some of the main tasks of this SIG are:

the identification of issues that “Ordinary Working Linguists” would like the TEI to be able to handle for them (e.g. the encoding of corpora and lexica, but also ‘everyday encoding of linguistic structures’ for the purpose of teaching or theorizing);
the promotion of modules that could be used in linguistic subdisciplines, e.g. computational linguists could make use of the feature structures module or phoneticians could use (and extend) the TEI module for the transcription of speech;
the creation of a module (or a set of them) for linguistic description, as a separate chapter of the Guidelines or a set of ODDs, to enable linguists to use TEI encoding more easily;
interfacing with other TEI SIGs whenever they brush against linguistic issues (Ontology SIG for lexical databases, Overlap SIG for corpus encoding, Tools SIG for language description and processing tools);
offering the possibility of collaboration with researchers working in the ISO committee TC37 SC4 or research infrastructures such as CLARIN or Text+.

Conveners

Piotr Bański, Institut für Deutsche Sprache, Mannheim
Susanne Haaf-Dumont, Universität Leipzig

Mailing list and wiki space

Visit the homepage of the SIG's mailing list or its space on the TEI Wiki.

Activities

Many of the SIG's activities have been listed in the TEI wiki.