Slovenska raziskovalna infrastruktura za jezikovne vire in tehnologije
Common Language Resources and Technology Infrastructure, Slovenia

Online concordancers

Concordancers are computer programs that enable searching and statistical analysis of data in large text collections (corpora).

CLARIN.SI Concordancers

CLARIN.SI maintains several concordancers that enable searching over 100 corpora in 30 languages, containing about 20 billion words in total. They support searching tagged corpora, displaying and manipulating concordances, creating frequency lists, calculating collocations, saving the results of queries, etc. They use the same back-end program but differ in their user interfaces.
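As a rough illustration of two of the functions mentioned above, the sketch below builds a keyword-in-context concordance and a frequency list over a tiny in-memory text. It is a toy example only: the function names, the sample text and the simple regular-expression tokeniser are assumptions made for illustration, and it does not reflect how the CLARIN.SI back-end is implemented.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; real corpora use language-specific tokenisers.
    return re.findall(r"\w+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text standing in for a corpus.
text = ("The corpus is large. A corpus of spoken language "
        "differs from a written corpus.")
tokens = tokenize(text)

# Concordance: each hit shown with two words of context on either side.
hits = concordance(tokens, "corpus", width=2)
for left, kw, right in hits:
    print(f"{left:>20} [{kw}] {right}")

# Frequency list: how often each word form occurs.
freq = Counter(tokens)
print(freq.most_common(3))
```

Real concordancers work on indexed, annotated corpora of billions of words, so a linear scan like this one only conveys what the results look like, not how they are computed.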

noSketch Engine

noSketch Engine is an open-source version of the commercial Sketch Engine which was developed by Lexical Computing. Instructions for its use are available here.

CLARIN.SI offers two installations of noSketch Engine:

  • https://www.clarin.si/ske: log-in is not required (or possible), which simplifies use for less advanced users
  • https://www.clarin.si/skelog: log-in is required, although anyone can register; logging in allows subcorpus creation and personalised display of e.g. corpus attributes.

CLARIN.SI would like to thank the personnel of Lexical Computing, in particular Jan Bušta and Tomáš Svoboda, for their help with installing noSketch Engine at CLARIN.SI.

KonText

The KonText concordancer was developed for the purposes of the Czech National Corpus and is openly available on the GitHub platform. A user manual is available here. Unlike noSketch Engine, KonText offers immediate access to the speech recordings accompanying spoken corpora; however, it does not support the computation of keywords.

CLARIN.SI offers the following installation of KonText:

  • https://www.clarin.si/kontext: log-in is not required, although logging in (via AAI, e.g. eduGAIN) is needed to use the more advanced functions of KonText. Logging in also enables setting view options for individual corpora, saving personal subcorpora, keeping a history of queries, etc.

CLARIN.SI would like to thank the personnel of the Czech National Corpus, in particular Tomáš Machálek, for their help with installing KonText at CLARIN.SI.

Old noSketch Engine

The old noSketch Engine (“Bonito”) has a substantially different user interface from the current version. It is no longer maintained by Lexical Computing and has no user documentation. For the time being, the CLARIN.SI installation will continue to be available, as various language resources refer to it. It also offers some functions that the new noSketch Engine does not, in particular access to the results of queries in XML.

The entry point for the old noSketch Engine is:

CLARIN.SI would like to thank the directors of Lexical Computing, Miloš Jakubíček and Pavel Rychlý, for making their concordancer, and especially the Manatee back-end, openly available.

Other concordancers and corpora

In addition to being searchable via the CLARIN.SI concordancers, some Slovenian reference corpora can also be searched through their dedicated concordancers, available at the Center for Language Resources and Technologies at the University of Ljubljana:

  • Gigafida is a reference corpus of written standard Slovene which includes texts of various genres. Its first version was developed during the Communication in Slovene project from 2007 to 2013, while its upgraded version (v2.0) was published in 2019.
  • Kres is a balanced subcorpus of the first version of the Gigafida corpus which was created during the Communication in Slovene project.
  • Gos is a corpus of spoken Slovene which was created during the Communication in Slovene project.

There are other corpora for Slovenian which can be searched using their specialised concordancers: