MULTEXT-East
- URL: http://nl.ijs.si/ME/
Description:
The MULTEXT-East resources are a multilingual dataset for language engineering
research and development. This dataset contains, for Bulgarian, Croatian, Czech,
English, Estonian, Hungarian, Lithuanian, Resian, Romanian, Russian, Slovene, and
Serbian, some or all of the following language resources:
- the morphosyntactic specifications, lexica, and annotated
"1984" corpus;
- parallel and comparable text and speech corpora;
- and associated documentation.
The complete corpora as well as the documentation are encoded in TEI P4.
The MULTEXT-East project was a spin-off of MULTEXT
and ran from 1995 to 1997. It developed language resources for six
languages: Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene, as well as
for English, as the hub language of the project. It also
adapted existing tools and standards to these languages. The main results of the
project were an annotated multilingual corpus and lexical resources for the seven languages.
The extended results of the project were made available in 1998, first on CD-ROM and
then via TRACTOR, the TELRI Research Archive of Computational Tools and Resources.
In the scope of the Concede project, a new release was made available in 2002; it
contained only the (updated and corrected) morphosyntactic resources from the first
release. This second release was made freely available for research use via the Web.
Finally, the third release was made in 2004. It updates and brings together the
first two, adds new languages, and makes the move from SGML to XML, in particular to
TEI P4; this work was supported by the TEI task force on SGML to XML migration.
Version 3 is also available via the Web, from the home page of the project.
For further information on the project, its results, and their
exploitation, consult the annotated bibliography of MULTEXT-East, available
in HTML and various other formats from the project Web page.
(from the WWW page)
Contacts:
Tomaž Erjavec
Jožef Stefan Institute
Jamova 39
SI-1000 Ljubljana
Slovenia
Tel: +386 1 477-3507
Fax: +386 1 425-1038
Email: tomaz.erjavec@ijs.si