Services - CLARIN Slovenia

	Slovenian Research Infrastructure for Language Resources and Technologies (Common Language Resources and Technology Infrastructure, Slovenia)

In addition to the repository and online concordancers, CLARIN.SI also provides its users with the following services.

Contents

1 Automated text annotation

2 Manual text annotation

3 Storage and cooperative development

4 LLM evaluation dashboard

5 Text simplification and analysis

6 Summarizing corpus data

7 Knowledge transfer

Automated text annotation

CLARIN.SI offers an online service for automatic linguistic annotation of South Slavic languages: the CLASSLA Annotation Tool. It uses the CLASSLA-Stanza pipeline and provides models for Bulgarian, Croatian, Macedonian, Serbian and Slovenian, as well as models for non-standard (colloquial) Croatian, Serbian and Slovenian. It can annotate lemmas, morphology, dependency syntax, named entities and, depending on the language, semantic roles. For more details, see the description of the annotation service here.

This service replaces ReLDIanno, an earlier text annotation service that supported Slovenian, Croatian and Serbian. Although ReLDIanno is no longer accessible through its web application, it remains available via its Python library. More information is available here.

Manual text annotation

CLARIN.SI hosts WebAnno, a tool for online manual linguistic annotation of corpora. To read more about WebAnno, see the project's home page. If you would like an account on WebAnno@CLARIN.SI, please send an e-mail to info@clarin.si explaining who you are and why you need access.

Storage and cooperative development

CLARIN.SI has an organisation on GitHub called CLARINSI that hosts a number of projects related to language resources and technologies, such as part-of-speech (PoS) and named-entity recognition (NER) taggers, word normalisers, standards, and conversions between linguistic formalisms.

CLARIN.SI also hosts a GitLab server that offers a platform for developers of language technology tools and resources. Compared to GitHub.com, its main advantages are that projects can be made private free of charge and that our code is not stored solely by companies in the U.S. If you would like an account on GitLab@CLARIN.SI, please send an e-mail to info@clarin.si explaining who you are and why you need access.

LLM evaluation dashboard

In cooperation with the CLASSLA knowledge centre, CLARIN.SI hosts the CLASSLA LLM Evaluation Dashboard for South Slavic Languages. This interactive dashboard presents the performance of large language models (LLMs) and other technologies on a range of text classification and commonsense reasoning benchmarks for South Slavic languages and dialects. For further details on the evaluated models and benchmarks, see the paper presenting the evaluation and the implementation code available in the GitHub repository.

Text simplification and analysis

CLARIN.SI, through its partner CJVT UL, hosts SENTA, a service for simplifying and analysing Slovene texts. During its development, special attention was paid to accessibility for users with special needs. To try SENTA and find out more about it, visit the service's website.

Summarizing corpus data

CLARIN.SI, through its partner CJVT UL, hosts Korpusnik, a service for summarizing corpus data that presents statistical and textual data from five corpora of Slovenian in a user-friendly way. During its development, special attention was paid to accessibility for users with special needs. You can read more about Korpusnik on the tool's website.

Knowledge transfer

CLARIN.SI supports the recording and archiving of the JOTA lectures, organised by the Slovene Society for Language Technologies, on the VideoLectures portal. Recordings of some other CLARIN events are also available on VideoLectures.