Slovenska raziskovalna infrastruktura za jezikovne vire in tehnologije
Common Language Resources and Technology Infrastructure, Slovenia

General information

Mission and vision

CLARIN.SI, the Slovene Common Language Resources and Technology Infrastructure, is the Slovene national node of the European CLARIN infrastructure. The main goal of the CLARIN infrastructure is to support research activities in the Humanities and Social Sciences. CLARIN.SI serves researchers from the fields of Computational and Corpus Linguistics and Digital Humanities, as well as interested individuals from other scientific and business areas who use and produce language data.

CLARIN.SI supports the development of research activities by building and maintaining a shared research infrastructure. This infrastructure gives researchers and other interested individuals across Europe easy, long-term access to language resources and technologies and to expert support. CLARIN.SI thus promotes open science principles and cross-disciplinary cooperation, and raises public awareness of how technological solutions can be applied to language data processing. On the one hand, the infrastructure offers its users resources and tools that they can use and/or upgrade; on the other hand, it enables users to deposit their research data and software in the repository for long-term storage. CLARIN.SI also provides technical and expert support, which is one of the key factors in ensuring the vitality of the infrastructure. In principle, the CLARIN.SI infrastructure is not restricted to a particular language, but most of its resources and tools cover Slovene, Croatian and Serbian.

CLARIN.SI cooperates with other research infrastructures present in Slovenia, such as DARIAH-SI (the Slovene node of the pan-European Digital Research Infrastructure for the Arts and Humanities, DARIAH) and ADP (the Slovene node of the Consortium of European Social Science Data Archives, CESSDA). The collaboration between the infrastructures results in common projects, such as the ParlaFormat workshop, which focused on the processing of parliamentary corpora, as well as long-term partnerships, such as the one with RDA Slovenia, which focuses on ensuring open access to research data in Slovenia.

CLARIN.SI services and activities

CLARIN.SI provides a range of services, as well as expert and technical support, to researchers.

You can read more about CLARIN.SI services and activities by following the links below:

  • REPOSITORY – a platform that enables long-term storage of and access to language resources and tools for natural language processing;
  • ONLINE CONCORDANCERS – platforms that enable searching and analysing language corpora, i.e. large and structured text collections;
  • OTHER TOOLS AND SERVICES – automated text annotation service (ReLDIanno), manual text annotation service (WebAnno), storage, download and cooperative development of projects on GitHub and GitLab;
  • EXPERT SUPPORT – lists of frequently asked questions and answers on computer processing of South Slavic languages, the possibility of direct support, and the organisation of events.

Since 2018, CLARIN.SI has launched annual calls for project proposals, through which it financially supports projects that, for example, build or upgrade language resources or services. For more information, please see CLARIN.SI projects.

The mission of CLARIN.SI also includes raising awareness of its activities, and organising and supporting training events, often in an international setting. From its beginnings, the CLARIN.SI consortium has been aware of the importance of interacting with users. For this reason, it actively supports events and initiatives that help widen the scope of user involvement in Slovenia. Since 2016, CLARIN.SI has been actively involved in the organisation of the Language Technologies and Digital Humanities conference. It also provides recordings of and online access to "JOTA", the monthly lectures on language technologies organised by the Slovenian Language Technologies Society (SDJT). In 2018, CLARIN.SI supported the XVIII EURALEX International Congress, and, in 2019, the 22nd International Conference Text, Speech and Dialogue.

If you would like to know more about CLARIN.SI and its endeavours, you can read the following blog posts, which were written as part of the Tour de CLARIN initiative supported by CLARIN ERIC:

 

History of CLARIN.SI

First steps

From 2008 to 2011, the European Commission funded the preparatory phase for the development of the European research infrastructure CLARIN. Slovenia, which was represented by the Jožef Stefan Institute and Alpineon, d.o.o., first joined CLARIN as an observer country. In 2009, the Government of the Republic of Slovenia invited Slovene researchers to express their interest in participating in European research infrastructures. This formed the basis for the Research infrastructure development plan 2011-2020, which was officially approved by the Government in April 2011. This document classified the CLARIN infrastructure among the priority international research infrastructures.

CLARIN ERIC (European Research Infrastructure Consortium) was established at the beginning of 2012, after the end of the preparatory phase, as a project continuing the infrastructure. Its founding members were Austria, Bulgaria, the Czech Republic, Denmark, Estonia, Germany, the Netherlands and Poland. The ninth member was the Dutch Language Union, an intergovernmental body of the Netherlands and Flanders.

Countries can only become members of CLARIN ERIC once they have established a national consortium and can ensure a working infrastructure and the payment of the annual membership fee to CLARIN. Initial funding for the creation of the Slovene node of CLARIN was received by the Jožef Stefan Institute in October 2013. The first activities of CLARIN.SI were linked to the creation of a website and the establishment of the Slovene repository of language resources and technologies.

Establishment of the Slovene CLARIN centre

To establish the Slovene CLARIN Consortium, the Slovenian Language Technologies Society first published a call for members, after which the CLARIN.SI Consortium Agreement was drawn up and signed by the interested parties in June 2014. These included most major institutions, associations and companies working on linguistics and language technologies in Slovenia: University of Ljubljana, University of Maribor, University of Primorska, Scientific and Research Centre of the Slovenian Academy of Sciences and Arts, Jožef Stefan Institute, Slovenian Language Technologies Society, Trojina: Institute for Applied Slovene Studies (no longer a member of CLARIN.SI since 2023), Alpineon d.o.o. and Amebis d.o.o. In the autumn of 2014, the CLARIN.SI consortium was joined by two more members: the Institute for Contemporary History and the Domestic Research Society (no longer a member of CLARIN.SI since 2020). The following year, in 2015, the University of Nova Gorica also joined the consortium, which then counted 12 members. In the same year, the Ministry of Education, Science and Sport started paying the membership fee to CLARIN ERIC, and Slovenia, represented by CLARIN.SI, officially became a member of CLARIN.

CLARIN.SI today

So far, CLARIN.SI has set up a certified repository, which currently archives over 500 language resources and tools, several online concordancers, which enable search and analysis of over 100 language corpora, and some other tools and services. Since its establishment, CLARIN.SI has been funding micro projects that adapt existing language resources to the requirements for archiving in the repository. Since 2018, it has also financially supported small-scale projects aimed at creating and upgrading language resources and tools.

In 2019, CLARIN.SI, in cooperation with the Bulgarian national CLARIN centre, founded the CLASSLA Knowledge Centre, which offers support for computer processing of South Slavic languages.

CLARIN.SI regularly organises and supports awareness-raising activities and various training events. Since 2016, CLARIN.SI has been actively involved in the organisation of the Language Technologies and Digital Humanities conference, and it provides recordings of and online access to the JOTA lectures on language technologies, which are organised by the Slovenian Language Technologies Society (SDJT).