What's New
corpus
Description:
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 10,590 texts (almost 1.4 million words) written by adult speakers for whom Slovene is not their first language. This corpus ...
This entry contains 2 files (167.69 MB).
Publicly Available
corpus
Description:
The ParlaMint-ES-CN corpus is the contribution of the Parliament of the Canary Islands (Parlamento de Canarias) to the ParlaMint collection of comparable parliamentary corpora (https://www.clarin.eu/parlamint). It contains ...
This entry contains 2 files (2.1 GB).
Publicly Available
lexicalConceptualResource
Description:
The Slovenian-Japanese online dictionary for Slovenian-speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1 (http://hdl.handle.net/11356/1050) into a preliminary ...
This entry contains 1 file (1.43 MB).
Publicly Available
Most Viewed
In the past week
corpus
Description:
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta' (https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html). The processing ...
This entry contains 7 files (4.64 GB).
Publicly Available
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 31 files (5.94 GB).
Publicly Available
corpus
Description:
The BERTić-data text collection contains more than 8 billion tokens of mostly web-crawled text written in Bosnian, Croatian, Montenegrin or Serbian. The collection was used to train the BERTić transformer model ...
This entry contains 10 files (21.14 GB).
Publicly Available