Latest
languageDescription

Description:
ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on large monolingual corpora for 7 languages: Slovenian, Croatian, Finnish, Estonian, Latvian, Lithuanian and ...
This entry contains 7 files (1.35 GB).
Publicly Available
lexicalConceptualResource

Description:
The word analogy task evaluates word embeddings based on analogous word pairs (e.g. "Paris - France" should be equivalent to "Rome - Italy", and "son - daughter" should be equivalent to "brother - sister"). The dataset has been ...
This entry contains 3 files (6.08 MB).
Publicly Available
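
One common way to score such analogies is vector arithmetic over the embeddings: for "a is to b as c is to ?", pick the word whose vector is closest (by cosine similarity) to b - a + c, excluding the three query words. The sketch below illustrates that idea only; the function names and the tiny hand-made vectors are assumptions for illustration and are not taken from the dataset or its evaluation scripts.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def solve_analogy(emb, a, b, c):
    """Return the word d for 'a is to b as c is to d' using b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # The query words themselves are excluded from the candidates.
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Hypothetical toy vectors: [french, italian, spanish, is_country].
toy = {
    "Paris":  [1.0, 0.0, 0.0, 0.0],
    "France": [1.0, 0.0, 0.0, 1.0],
    "Rome":   [0.0, 1.0, 0.0, 0.0],
    "Italy":  [0.0, 1.0, 0.0, 1.0],
    "Madrid": [0.0, 0.0, 1.0, 0.0],
    "Spain":  [0.0, 0.0, 1.0, 1.0],
}

print(solve_analogy(toy, "Paris", "France", "Rome"))  # Italy
```

With real embeddings, accuracy on the task would be the share of analogy questions for which the returned word matches the expected one.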



toolService

Description:
The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that ...
This entry contains 1 file (16.26 MB).
Publicly Available
Most viewed
In the past week
lexicalConceptualResource

Description:
srLex is a large inflectional lexicon of the Serbian language in which each entry is a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple. The (wordform, lemma, MSD) triple frequencies are calculated on the ...
This entry contains 1 file (29.54 MB).
Publicly Available
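
As a rough illustration of the 5-tuple entry, the sketch below models one lexicon entry and derives a per-million frequency from a raw count. The field names, the sample values, and the corpus size are hypothetical; the description does not state the file layout or the size of the corpus the frequencies come from.

```python
from collections import namedtuple

# Hypothetical field names mirroring the 5-tuple named in the description.
Entry = namedtuple("Entry", "wordform lemma msd frequency per_million")

def per_million(count, corpus_size):
    """Relative frequency scaled to occurrences per one million tokens."""
    return count * 1_000_000 / corpus_size

# Made-up numbers: 50 occurrences in an assumed corpus of 2,000,000 tokens.
count, corpus_size = 50, 2_000_000
entry = Entry("kuće", "kuća", "Ncfsg", count, per_million(count, corpus_size))

print(entry.per_million)  # 25.0
```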
lexicalConceptualResource

Description:
A lexicon of 751 emoji characters with automatically assigned sentiment.
The sentiment is computed from 70,000 tweets, labeled by 83 human annotators
in 13 European languages.
The process and analysis of emoji sentiment ...
This entry contains 3 files (93.95 KB).
Publicly Available



corpus

Description:
The resource consists of two datasets related to Members of the 8th European Parliament (MEPs). The first one is a dataset of 2,535 roll-call votes of MEPs until 2016-03-01. The second one is a dataset of 26,133 retweets ...
This entry contains 6 files (12.46 MB).
Publicly Available


