Slovenska raziskovalna infrastruktura za jezikovne vire in tehnologije
Common Language Resources and Technology Infrastructure, Slovenia

Online concordancers

Concordancers are computer programs that enable searching and statistical analysis of data in large text collections (corpora). They have a user interface, which makes them accessible to less tech-savvy users as well.

CLARIN.SI Concordancers

CLARIN.SI provides two concordancers that enable searching through numerous corpora. Both concordancers can be used to search through tagged corpora, as well as to display and sort the concordances, create frequency lexicons, calculate collocations, etc.
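To illustrate what these operations involve, the following is a minimal Python sketch of a keyword-in-context (KWIC) concordance and a frequency list. It is not how KonText or noSketch Engine are implemented; the function names (tokenize, kwic, frequency_list), the naive tokenizer and the sample sentence are all assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def frequency_list(tokens):
    """Return (token, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


# Hypothetical sample text standing in for a corpus.
sample = "The cat sat on the mat. The dog saw the cat on the mat."
tokens = tokenize(sample)

concordance = kwic(tokens, "cat", window=2)
freq = frequency_list(tokens)

# Print the concordance with the keyword aligned in the middle column.
for left, kw, right in concordance:
    print(f"{left:>10} | {kw} | {right}")
```

Real concordancers work on indexed, linguistically tagged corpora and support much richer queries, but the basic output (hits shown with their left and right context, plus counts) follows the same idea.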


KonText CLARIN.SI offers exploration and analysis of over 40 corpora of Slovene and other languages. Access to the corpora is open, although registration via AAI is needed to use the more advanced functions of KonText. Registration also lets users set view options for individual corpora, save personal subcorpora, keep a history of queries, etc.

KonText was developed for the Czech National Corpus, while CLARIN.SI uses the fork developed within the Czech CLARIN infrastructure. A user manual is available here.

noSketch Engine

noSketch Engine CLARIN.SI offers the same corpora as KonText CLARIN.SI but via a different interface. Registration is neither required nor possible. This has some drawbacks: for example, view options can be personalized but not saved.

noSketch Engine is the open source version of the commercial Sketch Engine; instructions for its use are available here.

Specialised concordancers for reference corpora

Some corpora in the repository can be searched either through the concordancers mentioned above or through their dedicated concordancers.


Gigafida is a reference corpus of written standard Slovene which includes texts of various genres. Its first version was developed during the Communication in Slovene project from 2007 to 2013, while its upgraded version (v2.0) was published in 2019.


Kres is a balanced subcorpus of the first version of the Gigafida corpus which was created during the Communication in Slovene project.


Gos is a corpus of spoken Slovene which was created during the Communication in Slovene project.

Other concordancers

Two further corpora for Slovene are not archived with CLARIN but can nonetheless be accessed via the links below:

Nova beseda

Nova beseda is a corpus that contains 380 million words and was created by the Institute of Slovenian Language ZRC SAZU.


Evrokorpus is a collection of parallel bilingual corpora of Slovene translations of EU legislation. The collection is linked to Evroterm – a multilingual terminology base.