What's New
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 61 publishers. Trendi 2026-04 covers the period from January 2019 to April 2026, complementing the Gigafida ...
This entry contains no files.
corpus
Description:
This entry contains the first part of the audiobook "O dedku in medvedku" (About grandpa and the little bear) by author Leopold Suhodolčan (COBISS ID: 269241603, ISBN: 978-961-7194-52-4).
There once was a little girl ...
This entry contains 1 file (10.62 MB).
Publicly Available
corpus
Description:
This entry contains the first part of the audiobook "Pikapolonček" (The little ladybird) by author Leopold Suhodolčan (COBISS ID: 274672899, ISBN: 978-961-7194-63-0).
This entry contains 1 file (16.48 MB).
Publicly Available
Most Viewed
In the past week
corpus
Description:
SloIE is a manually labelled dataset of Slovene idiomatic expressions. It contains 29,400 sentences with 75 different expressions that can occur with either a literal or an idiomatic meaning, with appropriate manual ...
This entry contains 1 file (4.22 MB).
Publicly Available
corpus
Description:
The dataset represents the Twitter production in Slovenian in the period from 2018 until 2020. It consists of tweet IDs, retweet IDs, pseudo-anonymized user IDs, publication dates, and automatically assigned hate labels ...
This entry contains 1 file (182.04 MB).
Publicly Available
corpus
Description:
A dataset of user comments, provided for research purposes for EMBEDDIA, a Horizon 2020 project, and extracted from the user-comment database of the 24sata.hr news portal. 24sata.hr is the largest-circulation ...
This entry contains 3 files (1.89 GB).
Publicly Available