What's New
toolService

Description:
This Conformer CTC BPE E2E Automated Speech Recognition model was trained following the NVIDIA NeMo Conformer-CTC fine-tuning recipe (for details see the official NVIDIA NeMo NMT documentation, https://docs.nvidia.com/de ...
This entry contains 1 file (430.87 MB).
Publicly Available
corpus

Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 56 publishers. Trendi 2025-03 covers the period from January 2019 to March 2025, complementing the Gigafida ...
This entry contains no files.
corpus

Description:
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.2 contains sentences for 10 languages: Bulgarian, Danish, English, Spanish, ...
This entry contains 1 file (11.08 MB).
Publicly Available



Most Viewed
In the past week
corpus

Description:
The Montenegrin web corpus meWaC was built by crawling the .me top-level domain in 2019. The corpus was near-deduplicated on paragraph level, normalised via transliteration into the Latin script, and morphosyntactically ...
This entry contains 2 files (2.47 GB).
Publicly Available



corpus

Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 30 files (5.87 GB).
Publicly Available


corpus

Description:
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original (about 100,000 words in length), and its translations ...
This entry contains 1 file (14.12 MB).
Academic Use

