Multilingual Text Tools and Corpora (MULTEXT)

Host: Laboratoire Parole et Langage, Centre National de la Recherche Scientifique (CNRS) [French National Center for Scientific Research]
URL: http://www.lpl.univ-aix.fr/projects/multext

Description:

MULTEXT comprises a series of projects that aim to develop standards and specifications for encoding and processing linguistic corpora, and to build tools, corpora, and linguistic resources that embody these standards. These resources cover a wide variety of languages, including Bambara, Bulgarian, Catalan, Czech, Dutch, English, Estonian, French, German, Hungarian, Italian, Kikongo, Occitan, Romanian, Slovenian, Spanish, Swedish, and Swahili. All MULTEXT results are freely and publicly available for non-commercial, non-military purposes.

Corpus Encoding Standard:

MULTEXT, along with EAGLES and the Vassar/CNRS collaboration (supported by the U.S. National Science Foundation), has developed a Corpus Encoding Standard intended to "serve as a widely accepted set of encoding standards for corpus-based work".

Funding:

The MULTEXT effort has been supported by the European Commission, under the Linguistic Research and Engineering, Copernicus, and Langues regionales et minoritaires programmes; the U.S. National Science Foundation, under the Vassar/CNRS collaboration; the Fonds Francophone pour la Recherche (AUPELF-UREF); the Centre National de la Recherche Scientifique (CNRS); and the Universite de Provence.

Contact:

Dr. Jean Veronis (coordinator)
Laboratoire Parole et Langage
CNRS & Universite de Provence
29, Av. Robert Schuman
13621 Aix-en-Provence Cedex 1, France

Tel: (+33) 42 95 36 33
Fax: (+33) 42 59 50 96
E-mail: Jean.Veronis@lpl.univ-aix.fr