One of the primary purposes of the CLARIN infrastructure is to provide reliable archiving of, and access to, language resources such as corpora, lexicons, audio and video recordings, grammars, and language models.
CLARIN.SI maintains a certified repository that currently holds over 500 language resources and tools, amounting to approximately 3.7 TB of data for 90 languages. The majority of entries focus on Slovenian and other South Slavic languages. The repository includes a broad set of large corpora (i.e., structured collections of texts) for studying these languages, as well as a number of parallel and manually tagged corpora, lexicons, and language models for use in language tools.
The repository is regularly maintained and CoreTrustSeal certified. It enables the storage and download of language resources in accordance with clearly defined technical and legal standards. It supports easy user authentication and authorisation, as well as the allocation of persistent identifiers to uploaded resources. The repository follows the FAIR principles and respects the conditions of the licences applicable to the archived resources and tools. It ensures long-term archiving, since all resources, together with their persistent identifiers, can easily be transferred to the repositories of other CLARIN centres should CLARIN.SI cease operating.
The CLARIN.SI repository is registered in several catalogues of research data repositories, such as OpenAIRE and re3data. Furthermore, CLARIN has developed the Virtual Language Observatory (VLO), a faceted browser that enables searching across the resources of all CLARIN centres.
For more information, follow the links below: