What's New
corpus
Description:
The Berta Spoken Corpus contains six hours of recorded speech across a variety of interactional settings. These settings include 57 different speech events, with some captured on video and others, such as telephone or ...
This item contains 4 files (5.62 GB).
Publicly Available
corpus
Description:
This is a corpus of 1915 "jokes of the day" ("šala dneva") published by the Slovenian news portal 24ur.com. The jokes were scraped from their archive on September 18th, 2024. The initial list is lightly curated: shorter ...
This item contains 1 file (1.7 MB).
Publicly Available
corpus
Description:
The genre-enriched MaCoCu-Genre corpus collection comprises web corpora that have been automatically annotated with genre labels. The corpora can be very useful for genre-based creation of subcorpora that can be used for ...
This item contains 14 files (101.43 GB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87 GB).
Publicly Available
corpus
Description:
The CVET corpus contains 230 texts (around 175 thousand words) of varying length, published in the religious journal "Cvetje z vertov sv. Frančiška" between 1887 and 1916, when the magazine was edited by the linguist Fr. ...
This item contains 4 files (15.02 MB).
Publicly Available
corpus
Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This item contains 18 files (2.17 GB).
Publicly Available