What's New

 corpus 
Description:
The dataset covers tweets published in Slovenian from 2018 to 2020. It consists of tweet IDs, retweet IDs, pseudo-anonymized user IDs, publication dates, and automatically assigned hate labels ...
 This item contains 1 file (182.04 MB).
 
Publicly Available, Distributed under Creative Commons, Attribution Required, Share Alike
 toolService 
Description:
The Q-CAT (Querying-Supported Corpus Annotation Tool) is a computational tool for manual annotation of language corpora, which also enables advanced queries on top of these annotations. The tool has been used in various ...
 This item contains 1 file (1.11 MB).
 
Publicly Available
 corpus 
Description:
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. It can be used for training continuous speech recognition for Slovene ...
 This item contains 3 files (20.74 MB).
 
Publicly Available, Distributed under Creative Commons, Attribution Required

Most Viewed Items

Top Last Week
 corpus 
Description:
The corpus consists of texts produced by nonprofessional typical speakers and speakers with different language disorders (developmental language disorder, dyslexia, traumatic brain injury, aphasia, other). Roughly half of ...
 This item contains 2 files (8.11 MB).
 
Publicly Available, Distributed under Creative Commons, Attribution Required, Share Alike