What's New
corpus
Description:
This entry contains the first part of the audiobook "Izgubljeni Prešeren" (Lost Prešeren) by author Ivan Sivec (COBISS ID: 276089603, ISBN: 978-961-7143-64-5).
The story begins when Tina, as a university student, helps ...
This entry contains 3 files (79.52 MB).
Publicly Available
corpus
Description:
This entry includes the first part of the e-book "Okupacija" (Occupation) by author Gal Prevoršek (COBISS.SI-ID 275187459, ISBN 978-961-7272-63-5).
This entry contains 1 file (590.3 KB).
Publicly Available
corpus
Description:
This entry includes the first part of the e-book "Šlagerji" (Hits) by author Feri Lainšček (COBISS.SI-ID 275166467, ISBN 978-961-7272-62-8).
The book brings an extensive selection of Lainšček's poetic texts set to music, ...
This entry contains 1 file (4.14 MB).
Publicly Available
Most Viewed
In the Past Week
corpus
Description:
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic ...
This entry contains 7 files (3.83 MB).
Publicly Available
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 31 files (5.94 GB).
Publicly Available
corpus
Description:
goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts from the period 1584-1899.
Each text contains extensive meta-data and per-page ...
This entry contains 2 files (8.9 MB).
Publicly Available