Common Language Resources and Technology Infrastructure, Slovenia

Tools, Services, Resources

Text Processing

  • Spelling and grammar checker for Slovene Besana
  • Web interface for morphosyntactic tagging and lemmatisation ToTaLe
  • Web interface for morphosyntactic tagging and lemmatisation Obeliks
  • Term identification and linking: Terminator
  • Dependency syntax parser SSJ
  • Morphological tagging and lemmatisation ZRC

Language corpora

  • Reference corpus Gigafida, 1 billion words
  • Balanced corpus Kres, 100 million words
  • Corpus of written language Nova beseda, 318 million words
  • Corpus of historical Slovene IMP, 15 million words
  • Speech corpus Gos, 1 million words
  • Downloadable manually annotated corpus ssj500k, 500,000 words
  • Downloadable (semi)manually annotated corpus jos1M, 1 million words
  • Downloadable corpus ccGigafida, 100 million words
  • Downloadable corpus ccKres, 10 million words
  • Downloadable manually annotated corpus of historical Slovene goo300k, 300,000 words
  • Downloadable treebank SDT, 30,000 words
  • Šolar: corpus of written school articles, 1 million words
  • Korpus DSI: informatics and computer science text corpus, 14 million words
  • KORP: public relations text corpus, 1.8 million words
  • SPOOK: multilingual parallel corpus
  • Evrokorpus: multilingual parallel corpus of EU legislation, more than 240 million words in total
  • MULTEXT-East: morphosyntactic resources for many languages: corpus, lexicons, specifications
  • SSJ: development of reference corpus and lexical database of Slovene with syntax parser and materials for teaching Slovene 
  • JANES: language resources and linguistic analysis of non-standard Slovene
  • IMP: language resources for historical Slovene
  • JOS: linguistic annotation of Slovene
  • SIGNOR: Slovene sign language corpus and pilot grammar
  • META-SHARE: metadata collection of language resources, including Slovene
  • ELRA/ELDA: evaluation and language resource distribution agency, including Slovene
  • LDC: Linguistic Data Consortium

Dictionaries and Lexical Resources

  • Pregibnik: Slavic word inflection tool
  • Sloleks: Slovene word forms lexicon
  • SFT: Slovene for travelers
  • Presis: English-Slovenian and Slovenian-German machine translation
  • Google Translate: statistical machine translation for several languages
  • Bing Translator: statistical machine translation for several languages
  • iTranslate: statistical machine translation portal for several languages