What's New
corpus
Description:
This submission includes the first part of the audiobook "Svetišča narave" (Nature Sanctuaries) by Irena Cerar (COBISS.ID: 275565059, ISBN: 978-961-291-542-1).
With the book Nature sanctuaries, Irena Cerar, the author ...
This item contains 4 files (161.37 MB).
Publicly Available
corpus
Description:
The Disasters corpus in classical Arabic sources (DiCCAS) is designed to allow historians to compare different accounts and narratives of disasters in a variety of classical Arabic sources.
The corpus encompasses a ...
This item contains 7 files (20.97 MB).
Publicly Available
corpus
Description:
This entry contains the first part of the audiobook "Cvetoča Slovenija" (Blooming Slovenia) by author Ivan Sivec (COBISS.ID: 275424259, ISBN: 978-961-291-539-1).
Many believe that the writer Ivan Sivec is one of the ...
This item contains 3 files (102 MB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (5.94 GB).
Publicly Available
corpus
Description:
goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts from the period 1584-1899.
Each text contains extensive metadata and per-page ...
This item contains 2 files (8.9 MB).
Publicly Available
toolService
Description:
The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that ...
This item contains 1 file (231.07 MB).
Publicly Available