What's New

 corpus 
corpus
Description:
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. It can be used for training continuous speech recognition for Slovene ...
 This item contains 3 files (21.65 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required
 corpus 
corpus
Description:
Gos is a corpus of spoken Slovene that includes the transcripts of approximately 120 hours of speech recorded in various situations: radio and TV shows, school lessons and lectures, private conversations between friends ...
 This item contains 2 files (22.1 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Noncommercial Share Alike
 corpus 
corpus
Description:
The dataset represents the Twitter production in Slovenian in the period from 2018 until 2020. It consists of tweet IDs, retweet IDs, pseudo-anonymized user IDs, publication dates, and automatically assigned hate labels ...
 This item contains 1 file (182.04 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike

Most Viewed Items

Top Last Week
 corpus 
corpus
Description:
The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate ...
 This item contains 16 files (49.38 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike