Common Language Resources and Technology Infrastructure, Slovenia




KonText CLARIN.SI offers over 40 corpora of Slovene and other languages. Access to the corpora is open, although registration via AAI is needed to use the more advanced functions of KonText. Registered users can also set view options for individual corpora, save personal subcorpora, keep a history of their queries, etc.

KonText was developed for the Czech National Corpus; CLARIN.SI uses the fork developed within the Czech CLARIN infrastructure. A user manual is available here.

CLARIN.SI noSketch Engine

noSketch Engine CLARIN.SI offers the same corpora as KonText CLARIN.SI but via a different interface. Registration is neither required nor possible. This has some drawbacks: for example, view options are global and apply to all users.

noSketch Engine is the open-source version of the commercial Sketch Engine; instructions for its use are available here.

Gigafida, Kres and Gos

The concordancers for the Slovene reference corpora Gigafida (1 billion words), Kres (balanced, 100 million words) and Gos (spoken language, 1 million words) were purpose-built for these corpora, mainly in the project “Communication in Slovene”.

Nova beseda

Nova beseda (“New Word”) is a 300 million word corpus of the Fran Ramovš Institute of the Slovenian Language.


Evrokorpus is a collection of parallel bilingual Slovene corpora of EU legislation, connected with Evroterm.
