Brief Newsletter article for Circulation to AB Members
The first meeting of the Text Encoding Initiative's Advisory Board
brought together seventeen representatives of key professional and
learned societies. These societies span academic disciplines from
hard-core computer science to lexicography, literary studies and
anthropology, and also cover the professional interests of librarians
and publishers. The purpose of the event, hosted by the University of
Illinois at Chicago, was to seek the views of the newly constituted
Advisory Board concerning the structure and proposed strategy of the
Text Encoding Initiative (TEI), to explain its relevance to the
interests of the societies and to encourage active participation in the
work of the Initiative by the societies' members.
History and Structure of TEI
The Text Encoding Initiative began in the fall of 1987, at the
instigation of the Association for Computers and the Humanities (ACH)
under the directorship of Nancy M. Ide. A planning conference in
November 1987 at Vassar College agreed that it was necessary and
feasible to define guidelines for both the interchange of existing
encoded texts and the creation of newly encoded texts. The guidelines
would specify which features should be encoded (at a minimum) and how
they should be encoded, as well as
suggesting ways to describe the resulting encoding scheme and its
relationship with pre-existing schemes. Compatibility with existing
schemes would be sought where possible, and in particular, ISO standard
8879, Standard Generalized Markup Language (SGML), would provide the
basic syntax for the guidelines if feasible.
After the Vassar meeting, ACH joined with the Association for Literary
and Linguistic Computing (ALLC) and the Association for Computational
Linguistics (ACL) as co-sponsors of the project and defined a four-year
work plan to achieve the project's goals. Funding for the work plan has
since been provided by substantial grants from the American National
Endowment for the Humanities and the European Economic Community.
Additional funding is being sought from industry and private
foundations.
Project Structure
The work plan is coordinated by a six-member steering committee,
comprising representatives from the sponsoring organizations. An
Advisory Board of representatives of almost twenty participating
scholarly organizations ensures that a broad range of interested
researchers are able to participate in the development of the
guidelines. Two Editors, one American and one European, coordinate the
work of the project's four Working Committees, each of which is
responsible for a distinct part of the work plan.
Committee 1, the Committee for Text Documentation, with a membership
drawn largely from the library and archive management communities, is
dealing with issues concerning the cataloguing and description of key
features of encoded texts. It is drawing on work already done in this
field for bibliographic and social science data, for example in the
Anglo-American Cataloguing Rules, the American National Standard for
Bibliographic Reference, and the Standard Study Description used by a
number of social-science data archives. All the committees are expected
to work within established frameworks where these are available, as they
are here.
Committee 2, for Text Representation, is concerned with the encoding of
such features as layout and character sets. It will provide precise
recommendations covering those features of continuous discourse for
which a convention already exists in printed or written sources. This
will involve a consideration of the character sets of all alphabetic
scripts currently used in computer-based research. Explicit
consideration of non-alphabetic scripts, though not excluded, has been
deferred; transcriptions of spoken language will however be included.
The committee will also recommend ways of representing the structural
divisions of a text (book, chapter, paragraph etc.) and all other
features conventionally signalled in printed or written texts, such as
emphasis, quotation, critical apparatus etc.
Committee 3, the Committee for Text Analysis and Interpretation, has the
largest and most open-ended set of responsibilities of the four. It will
provide discipline-specific sets of tags, each appropriate to the
analytic procedures favored by a given discipline, but in such a way as
to permit their extension and generalization to other disciplines using
analogous procedures. Because this is a very large task, Committee 3 is
focussing
initially on a single discipline (linguistics), chosen primarily because
of its clear relevance to all other text-based types of analysis. As
work proceeds, the focus of this committee will shift toward literary
analysis and other humanistic disciplines.
Committees 1, 2, and 3, with an average membership of ten, will set up
subcommittees to do the preliminary design work for tag sets within
specialized areas. Committee 3 already has one subcommittee, concerned
with tag sets for dictionary markup, which has already produced a set of
preliminary guidelines for monolingual dictionaries. A subcommittee of
Committee 2 is also being formed, concerned with the tagging of
historical sources, to take advantage of the substantial progress
already made in this area by a network of European scholars
collaborating on the Kleio project.
Committee 4, the Syntax and Metalanguage Committee, has determined that
the syntactic framework of SGML is adequate for all foreseeable
applications within the TEI's scope, and that SGML will therefore
provide the basic syntax. The guidelines will depart from SGML only if it proves
inadequate to the needs of research. The committee is currently
attempting to determine the extent to which all features of SGML can be
recommended. This committee is also surveying major existing schemes and
developing a formal metalanguage with which to describe these schemes
and the scheme developed for the Guidelines, and to provide a formally
specifiable mapping between them. Among the committee's other tasks are
validation and testing of the Guidelines as they emerge and arbitration
on matters of SGML-conformance.
The Chicago Meeting
In addition to the three sponsoring organizations, the following
associations are currently represented on the Advisory Board:
American Anthropological Association; American Historical Association;
American Philological Association; American Society for Information
Science; Association for Computing Machinery; Association for
Documentary Editing; Association for History and Computing; Association
Internationale Bible et Informatique; Canadian Linguistic Association;
Dictionary Society of North America; Electronic Publishing SIG;
International Federation of Library Associations and Institutions;
Linguistic Society of America; Modern Language Association.
After an initial presentation about the history, background, objectives
and structure of the TEI, delegates were invited to comment on their own
interests and those of the constituencies they served. A series of presentations
concerning the implications of the TEI for humanities research, for
computational linguistics, and for the language and information
industries followed. The goals and responsibilities of each of the
working committees were then described, as outlined above. The second
full day of the meeting began with a brief tutorial on SGML and a longer
description of the design principles, scope and end products of the
Guidelines. After a wide-ranging and useful discussion, in which some
constructively critical reactions were voiced, members of the
Advisory Board expressed approval of the objectives, organizational
structure and design goals of the Initiative, as they had been presented
at the meeting.
If you would like more information about the TEI, please contact
Sponsoring Organizations and Representatives
Association for Computers and the Humanities, Nancy M. Ide, Vassar
College, and C. M. Sperberg-McQueen, University of Illinois at Chicago
Association for Computational Linguistics, Donald E. Walker, Bellcore
(Bell Communications Research), and Robert Amsler, Bellcore
Association for Literary and Linguistic Computing, Susan Hockey, Oxford
University Computing Service, and Antonio Zampolli, University of Pisa
Advisory Board Organizations and Representatives
American Anthropological Association, Chad K. MacDaniel, University of
Maryland
American Historical Association, Elizabeth A. R. Brown, Brooklyn College
American Philological Association, Jocelyn Penny Small, Rutgers
University
American Society for Information Science, Clifford A. Lynch, University
of California
Association for Computing Machinery / Special Interest Group for
Information Retrieval, Scott Deerwester, University of Chicago
Association for Documentary Editing, David Chesnutt, University of South
Carolina
Association for History and Computing, Manfred Thaller,
Max-Planck-Institut für Geschichte
Association Internationale Bible et Informatique, Wilhelm Ott,
Universität Tübingen
Canadian Linguistic Association, Anne-Marie di Sciullo, Université du
Québec à Montréal
Dictionary Society of North America, Thomas Cresswell
Electronic Publishers Special Interest Group, Betsy Kiser, Online
Computer Library Center (OCLC)
International Federation of Library Associations and Institutions,
J. D. Byrum Jr., The Library of Congress
Linguistic Society of America, Stephen Anderson, The Johns Hopkins
University
Modern Language Association, Randy Jones, Brigham Young University