What's New
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-12 covers the period from January 2019 to December 2024, complementing the ...
This item contains no files.
corpus
Description:
The Training corpus of spoken Slovenian ROG 1.0 is the main Slovenian-language resource for training and evaluating technologies aimed at processing speech or speech transcripts, such as part-of-speech taggers, parsers, prosodic ...
This item contains 2 files (1.33 GB).
Publicly Available
License: Creative Commons (Attribution Required, Share Alike)
lexicalConceptualResource
Description:
The Western South Slavic verbal database (WeSoSlaV) contains the 3000 most frequent Slovenian verbs and the 5300 most frequent BCMS verbs, all coded for a number of properties ranging from their phonology and morphology to their ...
This item contains 3 files (11.43 MB).
Publicly Available
License: Creative Commons (Attribution Required)
Most Viewed
In the past week
corpus
Description:
This dataset of user comments was extracted from the database of user comments of the 24sata.hr news portal and is provided for research purposes for EMBEDDIA, a Horizon 2020 project. 24sata.hr is the largest-circulation ...
This item contains 3 files (1.89 GB).
Publicly Available
License: Creative Commons (Attribution Required, Noncommercial, No Derivative Works)
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87 GB).
Publicly Available
License: Creative Commons (Attribution Required)
corpus
Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This item contains 18 files (2.17 GB).
Publicly Available
License: Creative Commons (Attribution Required)