Concordancers are computer programs for searching and statistically analysing large text collections (corpora). Their user interfaces make them accessible even to less tech-savvy users.
CLARIN.SI maintains three concordancers for searching through numerous corpora. All three offer the same set of corpora and support searching tagged corpora, displaying and sorting concordances, creating frequency lexicons, calculating collocations, saving query results, etc. They use the same back-end program but differ in their user interfaces.
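To make the terms above concrete, here is a toy sketch of a concordance (keyword in context), a frequency list, and raw co-occurrence counts around a keyword. It is an illustration only and says nothing about how the shared back-end is implemented; the function names, the tokenisation rule and the sample sentence are all assumptions made for this example, and real concordancers use proper association measures rather than raw counts for collocations.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive rule)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for every hit of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def cooccurrences(tokens, keyword, width=2):
    """Count the words found within `width` tokens of each hit of keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - width):i])
            counts.update(tokens[i + 1:i + 1 + width])
    return counts


# Hypothetical sample text, standing in for a corpus.
tokens = tokenize("The cat sat on the mat. The dog sat on the log.")

# Concordance lines, sorted by left context.
for left, hit, right in sorted(kwic(tokens, "sat")):
    print(f"{left:>10} [{hit}] {right}")

# Frequency list of all tokens, and words co-occurring with the keyword.
print(Counter(tokens).most_common(3))
print(cooccurrences(tokens, "sat").most_common(3))
```

Sorting the concordance lines by the left or right context, as the loop does here, is the same operation the concordancers expose in their interfaces, only on a far larger scale and over tagged text.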
The CLARIN.SI KonText concordancer was developed for the Czech National Corpus and is openly available on GitHub. A user manual is available here. All corpora on KonText are openly available, although registration via AAI is needed to use its more advanced functions. Registration enables setting view options for individual corpora, saving personal subcorpora, keeping a history of queries, etc. Unlike noSketch Engine, KonText offers immediate access to the speech recordings accompanying spoken corpora; however, it does not support the computation of keywords.
CLARIN.SI would like to thank the personnel of the Czech National Corpus, in particular Tomáš Machálek, for their help with installing KonText at CLARIN.SI.
New noSketch Engine (Crystal)
The CLARIN.SI Crystal noSketch Engine concordancer is an open-source version of the commercial Sketch Engine, which was developed by Lexical Computing. Instructions for its use are available here. Registration is neither required nor possible. This has some drawbacks, e.g. view options stay the way the last person using the corpus configured them.
CLARIN.SI would like to thank the personnel of Lexical Computing, in particular Jan Bušta and Tomáš Svoboda, for their help with installing noSketch Engine at CLARIN.SI.
Old noSketch Engine (Bonito)
The CLARIN.SI Bonito noSketch Engine is the old version of noSketch Engine, with a radically different user interface from Crystal. It is no longer maintained by Lexical Computing and has no user documentation. CLARIN.SI will nevertheless continue to maintain Bonito, so that it offers the same corpora as the other two concordancers, for three reasons: many Slovenian users are used to working with it, various language resources refer to it, and it offers some functions that the new noSketch Engine does not. In particular, it can return the results of queries in XML: it is enough to add the parameter "format=XML" to the end of the query URL.
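As a minimal sketch of the XML trick, the helper below appends "format=XML" to a query URL using only the Python standard library. The helper name, the example host and the other query parameters are hypothetical placeholders, not actual CLARIN.SI addresses; copy the real query URL from the browser after running a search in Bonito.

```python
from urllib.parse import urlsplit, urlunsplit


def with_xml_format(query_url):
    """Return query_url with format=XML appended to its query string."""
    parts = urlsplit(query_url)
    # Append rather than re-encode, so the original query is passed through untouched.
    query = f"{parts.query}&format=XML" if parts.query else "format=XML"
    return urlunsplit(parts._replace(query=query))


# Hypothetical query URL; the host and parameters are placeholders.
url = "https://example.org/bonito/view?corpname=demo&q=word"
print(with_xml_format(url))
```

The parameter is appended as plain text instead of rebuilding the query string from parsed parameters, so that any characters already encoded in the query stay exactly as the concordancer produced them.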
CLARIN.SI would like to thank the directors of Lexical Computing, Miloš Jakubíček and Pavel Rychlý, for making their concordancer, and especially the manatee back-end, openly available.
Specialised concordancers for reference corpora
In addition to the CLARIN.SI concordancers, some Slovenian reference corpora can also be searched through their dedicated concordancers, available at the Center for Language Resources and Technologies at the University of Ljubljana:
Gigafida is a reference corpus of written standard Slovene which includes texts of various genres. Its first version was developed during the Communication in Slovene project from 2007 to 2013, while its upgraded version (v2.0) was published in 2019.
Kres is a balanced subcorpus of the first version of the Gigafida corpus, created during the Communication in Slovene project.
Gos is a corpus of spoken Slovene, created during the Communication in Slovene project.
Other Slovenian corpora can be searched using their own specialised concordancers:
Evrokorpus is a collection of parallel bilingual corpora of Slovene translations of EU legislation. The collection is linked to Evroterm – a multilingual terminology base.
TURK, a corpus of tourism-related texts, is a multilingual (Slovenian, Italian, English) corpus that was compiled within the Scientific Research Centre of the University of Primorska.
Nova beseda is a Slovenian corpus that was created by the Institute of Slovenian Language ZRC SAZU.