Slovenska raziskovalna infrastruktura za jezikovne vire in tehnologije
Common Language Resources and Technology Infrastructure, Slovenia

Online concordancers

Concordancers are computer programs for searching large text collections (corpora) and computing statistics over them. Their user interfaces make them accessible to users with little technical background.

CLARIN.SI Concordancers

CLARIN.SI provides two concordancers that enable searching through numerous corpora. Both concordancers can be used to search through tagged corpora, as well as to display and sort the concordances, create frequency lexicons, calculate collocations, etc.
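To make these functions concrete, the following minimal Python sketch shows what a concordance (keyword in context), a frequency list, and a simple collocate count look like on a toy text. It only illustrates the concepts and is not how KonText or noSketch Engine are implemented. The function names (tokenize, kwic, collocates) and the sample text are assumptions made for this example, and real concordancers typically rank collocates with statistical association measures rather than raw counts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, keyword, window=2):
    """Count words occurring within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Toy "corpus" (hypothetical sample text, not taken from any CLARIN.SI corpus).
text = ("The corpus is large. A corpus of spoken language "
        "differs from a corpus of written language.")
tokens = tokenize(text)

concordance = kwic(tokens, "corpus")                      # all hits with context
by_right = sorted(concordance, key=lambda line: line[2])  # sort by right context
frequencies = Counter(tokens)                             # frequency list
neighbours = collocates(tokens, "corpus")                 # raw collocate counts

for left, key, right in by_right:
    print(f"{left:>20} | {key} | {right}")
print(frequencies.most_common(3))
print(neighbours.most_common(3))
```

A production concordancer additionally works on tagged corpora, so queries can refer to lemmas or part-of-speech tags rather than surface word forms only.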

CLARIN.SI KonText

KonText CLARIN.SI offers exploration and analysis of over 40 corpora of Slovene and other languages. Access to the corpora is open, although registration via AAI is needed to use the more advanced functions of KonText. Registration also makes it possible to set view options for individual corpora, save personal subcorpora, keep a history of queries, etc.

KonText was developed for the Czech National Corpus; CLARIN.SI uses the fork developed within the Czech CLARIN infrastructure. A user manual is available here.

noSketch Engine

noSketch Engine CLARIN.SI offers the same corpora as KonText CLARIN.SI but via a different interface. Registration is neither required nor possible. This has some drawbacks: for example, view options can be personalized but not saved.

noSketch Engine is the open source version of the commercial Sketch Engine; instructions for its use are available here.

Specialised concordancers for reference corpora

Some corpora in the repository can be searched either through the concordancers mentioned above or through their dedicated concordancers.

Gigafida

Gigafida is a reference corpus of written standard Slovene which includes texts of various genres. Its first version was developed during the Communication in Slovene project from 2007 to 2013, while its upgraded version (v2.0) was published in 2019.

Kres

Kres is a balanced subcorpus of the first version of the Gigafida corpus which was created during the Communication in Slovene project.

Gos

Gos is a corpus of spoken Slovene which was created during the Communication in Slovene project.

Other concordancers

Two further corpora of Slovene are not archived with CLARIN but can nonetheless be accessed via the links below:

Nova beseda

Nova beseda is a corpus that contains 380 million words and was created by the Institute of Slovenian Language ZRC SAZU.

Evrokorpus

Evrokorpus is a collection of parallel bilingual corpora of Slovene translations of EU legislation. The collection is linked to Evroterm – a multilingual terminology base.


CLARIN.SI IS SUPPORTED BY THE MINISTRY OF EDUCATION, SCIENCE AND SPORT UNDER THE PROGRAMME OF "EUROPEAN RESEARCH INFRASTRUCTURES".
Jožef Stefan Institute, 2014-2020. Your use of the CLARIN.SI website is subject to the CC BY License and our terms of use.