In addition to the repository and online concordancers, CLARIN.SI also provides its users with the following services.
Automated text annotation
CLARIN.SI offers an online service for automatic linguistic annotation of South Slavic languages: the CLASSLA Annotation Tool. It uses the CLASSLA-Stanza pipeline for text annotation and provides models for Bulgarian, Croatian, Macedonian, Serbian and Slovenian texts, as well as models for non-standard (colloquial) Croatian, Serbian and Slovenian. It can annotate l
This service replaces the ReLDIanno text annotation service, an earlier tool that supported the processing of Slovenian, Croatian, and Serbian. Although the ReLDIanno service is no longer accessible through the web application, it remains available via the Python library. More information is available here.
Manual text annotation
CLARIN.SI hosts WebAnno, a tool for online manual linguistic annotation of corpora. To read more about WebAnno, see the project's home page. If you would like an account on WebAnno@CLARIN.SI, please send an e-mail to info@clarin.si explaining who you are and why you need access.
Storage and cooperative development
CLARIN.SI has a virtual organisation on GitHub called CLARINSI that hosts a number of projects related to language resources and technologies, such as part-of-speech (PoS) and named-entity recognition (NER) taggers, word normalisers, standards, and conversions between linguistic formalisms.
CLARIN.SI also hosts a GitLab server that offers a platform for developers of language technology tools and resources. Its main advantages over GitHub.com are that projects can be made private free of charge and that not all of our code has to be stored by companies in the U.S. If you would like an account on GitLab@CLARIN.SI, please send an e-mail to info@clarin.si explaining who you are and why you need access.
LLM evaluation dashboard
In cooperation with the CLASSLA knowledge centre, CLARIN.SI hosts the CLASSLA LLM Evaluation Dashboard for South Slavic Languages. This interactive dashboard presents the performance of large language models (LLMs) and other technologies on a range of text classification and commonsense reasoning benchmarks for South Slavic languages and dialects. For further details on the evaluated models and benchmarks, see the paper presenting the evaluation and the implementation code available in the GitHub repository.
Text simplification and analysis
CLARIN.SI, through its partner CJVT UL, hosts SENTA, a service for the simplification and analysis of Slovene texts. During its development, special attention was paid to accessibility for users with special needs. To try SENTA and find out more about it, visit the service's website.
Summarizing corpus data
CLARIN.SI, through its partner CJVT UL, hosts Korpusnik, a service for summarizing corpus data that presents statistical and textual data from the five corpora of the Slovenian language in a user-friendly way. During its development, special attention was paid to accessibility for users with special needs. You can read more about Korpusnik on the tool's website.
Knowledge transfer
CLARIN.SI supports the recording and archiving of the JOTA lectures, organised by the Slovene Society for Language Technologies, on the VideoLectures portal. Recordings of some other CLARIN events are also available on VideoLectures.
