The Oslo Multilingual Corpus

  • Host: University of Oslo
  • URL:

Description:

"We are currently developing the Oslo Multilingual Corpus (OMC), which is an extension of the English-Norwegian Parallel Corpus (ENPC).

The ENPC consists of text excerpts of approximately 10,000 to 15,000 words from fictional and non-fictional Norwegian and English original texts and their translations, amounting to a total of 200 texts, or 2.6 million words. German, Dutch and Portuguese translations were added for some of the texts. The texts are SGML-encoded and aligned at sentence level.

The corpus is now being extended on the German side in particular, to ensure equal representation of texts in English, German, and Norwegian, to the extent that this is possible. Recently, the project has been extended to French. Eventually, the corpus will contain original texts in four languages (English, German, French, Norwegian) and their translations into as many as possible of the other three languages. Currently (October 2001), the English-German-Norwegian part of the corpus consists of 32 English, 31 German, and 22 Norwegian original texts with translations into the other two languages, whereas the French-Norwegian part comprises excerpts from 10 Norwegian and 10 French non-fictional texts with their respective translations.

Due to copyright restrictions, the corpus is only available to researchers and graduate students at the universities in Oslo and Bergen."

– Oslo Multilingual Corpus WWW page

Contact:

Stig Johansson
Department of British and American Studies
University of Oslo
Email: Stig.Johansson@iba.uio.no

Bergljot Behrens
Department of Linguistics
University of Oslo
Email: bergljot.behrens@ilf.uio.no