One of the primary purposes of the CLARIN infrastructure is to provide reliable archiving of, and access to, language resources such as corpora, lexicons, audio and video recordings, grammars, and language models.
CLARIN.SI maintains a certified repository that currently holds over 200 language resources and tools, approximately 200 GB of data covering 80 languages, most of them dedicated to Slovene, Croatian and Serbian. The repository includes a broad set of large corpora (i.e., structured collections of texts) for studying these languages, as well as a number of parallel and manually tagged corpora, lexicons and language models for use in language tools.
The repository is regularly maintained and CoreTrustSeal certified. It supports the storage and download of language resources in accordance with clearly defined technical and legal standards. It offers straightforward user authentication and authorisation, and assigns persistent identifiers to uploaded resources. The repository follows the FAIR principles and respects the conditions of the licence that applies to each archived resource or tool. It ensures long-term archiving: should CLARIN.SI cease operating, all resources, together with their persistent identifiers, could easily be transferred to the repositories of other CLARIN centres.
The CLARIN.SI repository is registered in several catalogues of research data repositories, such as OpenAIRE and re3data. In addition, CLARIN has developed the Virtual Language Observatory (VLO), a faceted browser for searching the resources of all CLARIN centres.
For more information, follow the links below:
- more about the repository,
- how to deposit data or tools,
- citing data policy,
- submission lifecycle.