What's New
corpus
Description:
TCMeta is a dataset of noun phrase constructions from COVID-related tweets, annotated for relation-level metaphor.
It contains 2,138 Slovene and 2,221 English instances in tab-separated (.tsv) format, where each ...
This entry contains 2 files (228.99 KB).
Publicly Available
lexicalConceptualResource
Description:
ONTEM 1.0 comprises 1,019 manually prepared entries, each consisting of information about the lemma, part-of-speech (following the MULTEXT-East tagset for Slovenian, https://nl.ijs.si/ME/V6/msd/html/msd-sl.html), CEFR ...
This entry contains 2 files (60.29 KB).
Publicly Available
lexicalConceptualResource
Description:
The dataset contains 51,023 headword-synonym-distractor triplets for 5,000 headwords. A distractor is defined as an incorrect alternative to the synonym that can be similar to it in meaning and/or form. Headwords ...
This entry contains 1 file (829.85 KB).
Publicly Available
Most Viewed
In the past week
corpus
Description:
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta' (https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html). The processing ...
This entry contains 7 files (4.64 GB).
Publicly Available
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 31 files (5.94 GB).
Publicly Available
lexicalConceptualResource
Description:
The lexicon contains manual translations of the NRC Emotion Lexicon (http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm) that encodes the sentiment of a word (positive, negative) and its emotion association (anger, ...
This entry contains 1 file (199.85 KB).
Publicly Available