Polish language of the XX century sixties

Host: Institute of Informatics, University of Warsaw
Other institutions involved: The present form of the corpus is the result of a collaborative effort by several people with different affiliations, some working as volunteers and some funded by various grants. Details are given in the editorial declaration.
URL: http://www.mimuw.edu.pl/polszczyzna/pl196x/index_en.htm

Description: The corpus was originally created to compile a general frequency dictionary of contemporary Polish. Work started in 1967. Partial results were published between 1972 and 1977, and the complete dictionary appeared in 1990. The corpus was later extended in various respects, through both manual editing and automated procedures.

The corpus data contain 10,000 samples divided into five parts: essays, news, scientific texts, fiction and plays. Each sample is approximately 50 words long, comes from a text published between 1963 and 1967, and includes a bibliographic description of its source. Each word is tagged with its base form and some of its morphological properties, and sentence boundaries are also marked.

Implementation description: TEI P4

Other Related Resources: Corpus documentation: http://www.mimuw.edu.pl/polszczyzna/pl196x/doc/index_en.htm

Access: GNU General Public Licence for the corpus data; GNU Free Documentation Licence for the corpus documentation.

Contact:

attn.: Janusz S. Bień

Katedra Lingwistyki Formalnej UW (Department of Formal Linguistics, University of Warsaw)

Browarna 8/10

00-311 Warszawa

Tel: (48) 22 5520918

Fax: (48) 22 5520918

Email: jsbien@uw.edu.pl