What's New
corpus
Description:
This entry contains the first part of the audiobook "En korak, en utrip srca" (One step, one heartbeat) by author Leopold Suhodolčan (COBISS ID: 277539843, ISBN: 978-961-291-545-2).
Recreational marathon runner Samo ...
This item contains 3 files (140.25 MB).
Publicly Available
corpus
Description:
This entry contains the first part of the audiobook "Cesar Arnulf" (Emperor Arnulf) by author Leopold Suhodolčan (COBISS ID: 277489667, ISBN: 978-961-291-548-39).
This item contains 8 files (150.29 MB).
Publicly Available
corpus
Description:
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 2.0 contains subcorpora with sentences for 17 languages: Bulgarian, Danish, ...
This item contains 1 file (14.08 MB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (5.94 GB).
Publicly Available
corpus
Description:
SloIE is a manually labelled dataset of Slovene idiomatic expressions. It contains 29,400 sentences with 75 different expressions that can occur with either a literal or an idiomatic meaning, with appropriate manual ...
This item contains 1 file (4.22 MB).
Publicly Available
corpus
Description:
goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts from the period 1584-1899.
Each text contains extensive metadata and per-page ...
This item contains 2 files (8.9 MB).
Publicly Available