What's New
toolService
Description:
The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that ...
This item contains 1 file (231.07 MB).
Publicly Available
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 107 media websites, published by 77 publishers. Trendi 2024-08 covers the period from January 2019 to August 2024, complementing the Gigafida ...
This item contains no files.
toolService
Description:
This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the concatenation of the SSJ UD treebank of written Slovenian ...
This item contains 1 file (145.44 MB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
PodzemniRadovi-sr-en: a bilingual aligned corpus of papers from the field of mining. Underground-mining-sr-en: bilingual texts from the Underground Mining Engineering journal (55 papers from 8 issues), aligned at the sentence ...
This item contains no files.
corpus
Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This item contains 18 files (2.17 GB).
Publicly Available
corpus
Description:
ParlaMint 4.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.67 GB).
Publicly Available