JOS corpora of Slovene

General description: The JOS project developed annotated corpora of Slovene and associated resources to facilitate the development of human language technologies for Slovene. The main results are the JOS morphosyntactic specifications (tagset definition), two annotated corpora, and two Web services. The resources are available under Creative Commons licences.

Implementation description: The corpora and morphosyntactic specifications are encoded in TEI P5 using the additional modules for corpora, linking, analysis and iso-fs plus a few local extensions.
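As a rough illustration of what working with a TEI P5 encoded corpus can look like, the sketch below reads word tokens from a small TEI fragment using Python's standard library. The fragment is hypothetical and not taken from the JOS corpora: the use of the lemma and ana attributes on w elements, the tag values, and the function name read_tokens are all assumptions made for the example, and the actual JOS files may use different attributes or local extensions.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical fragment for illustration only; the attribute names and
# tag values are assumptions, not copied from the JOS corpora.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p><s>
    <w lemma="to" ana="#Pd-nsn">To</w>
    <w lemma="biti" ana="#Va-r3s-n">je</w>
    <w lemma="primer" ana="#Ncmsn">primer</w>
    <c>.</c>
  </s></p></body></text>
</TEI>"""


def read_tokens(xml_text):
    """Return (form, lemma, tag) triples for every TEI <w> element."""
    root = ET.fromstring(xml_text)
    tokens = []
    # iter() walks the whole tree; the namespace must be spelled out.
    for w in root.iter(f"{{{TEI_NS}}}w"):
        tokens.append((w.text, w.get("lemma"), w.get("ana")))
    return tokens


if __name__ == "__main__":
    for form, lemma, tag in read_tokens(SAMPLE):
        print(form, lemma, tag)
```

Punctuation (the c element here) is skipped because only w elements are collected; a real reader would likely need to handle it, along with whatever local extensions the corpus defines.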

Related resources: Links to papers describing the corpora are given at http://nl.ijs.si/jos/index-en.html#bib

Copyright information: The corpora are distributed under the Creative Commons Attribution-NonCommercial licence.

Contact:

Tomaž Erjavec

Department of Knowledge Technologies

Jožef Stefan Institute

Jamova cesta 39

1000 Ljubljana

Slovenia

Email: tomaz.erjavec@ijs.si