What's New

corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-12 covers the period from January 2019 to December 2024, complementing the ...
 This item contains no files.
corpus
Description:
The training corpus of spoken Slovenian ROG 1.0 is the main Slovenian-language resource for training and evaluating technologies aimed at processing speech or speech transcripts, such as part-of-speech taggers, parsers, prosodic ...
 This item contains 2 files (1.33 GB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Share Alike
lexicalConceptualResource
Description:
The Western South Slavic verbal database (WeSoSlaV) contains the 3000 most frequent Slovenian and the 5300 most frequent BCMS verbs, all coded for a number of properties spanning from their phonology, morphology to their ...
 This item contains 3 files (11.43 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required

Most Viewed Items

Top Last Week
corpus
Description:
A dataset of user comments, provided for research purposes for EMBEDDIA, a Horizon 2020 project, and extracted from the database of user comments of the 24sata.hr news portal. 24sata.hr is the largest-circulation ...
 This item contains 3 files (1.89 GB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Noncommercial, No Derivative Works
corpus
Author(s):
Erjavec, Tomaž ; Kopp, Matyáš ; Ogrodniczuk, Maciej ; Osenova, Petya ; Agirrezabal, Manex ; Agnoloni, Tommaso ; Aires, José ; Albini, Monica ; Alkorta, Jon ; Antiba-Cartazo, Iván ; Arrieta, Ekain ; Barcala, Mario ; Bardanca, Daniel ; Barkarson, Starkaður ; Bartolini, Roberto ; Battistoni, Roberto ; Bel, Nuria ; Bonet Ramos, Maria del Mar ; Calzada Pérez, María ; Cardoso, Aida ; Çöltekin, Çağrı ; Coole, Matthew ; Darģis, Roberts ; de Libano, Ruben ; Depoorter, Griet ; Diwersy, Sascha ; Dodé, Réka ; Fernandez, Kike ; Fernández Rei, Elisa ; Frontini, Francesca ; Garcia, Marcos ; García Díaz, Noelia ; García Louzao, Pedro ; Gavriilidou, Maria ; Gkoumas, Dimitris ; Grigorov, Ilko ; Grigorova, Vladislava ; Haltrup Hansen, Dorte ; Iruskieta, Mikel ; Jarlbrink, Johan ; Jelencsik-Mátyus, Kinga ; Jongejan, Bart ; Kahusk, Neeme ; Kirnbauer, Martin ; Kryvenko, Anna ; Ligeti-Nagy, Noémi ; Ljubešić, Nikola ; Luxardo, Giancarlo ; Magariños, Carmen ; Magnusson, Måns ; Marchetti, Carlo ; Marx, Maarten ; Meden, Katja ; Mendes, Amália ; Mochtak, Michal ; Mölder, Martin ; Montemagni, Simonetta ; Navarretta, Costanza ; Nitoń, Bartłomiej ; Norén, Fredrik Mohammadi ; Nwadukwe, Amanda ; Ojsteršek, Mihael ; Pančur, Andrej ; Papavassiliou, Vassilis ; Pereira, Rui ; Pérez Lago, María ; Piperidis, Stelios ; Pirker, Hannes ; Pisani, Marilina ; Pol, Henk van der ; Prokopidis, Prokopis ; Quochi, Valeria ; Rayson, Paul ; Regueira, Xosé Luís ; Rii, Andriana ; Rudolf, Michał ; Ruisi, Manuela ; Rupnik, Peter ; Schopper, Daniel ; Simov, Kiril ; Sinikallio, Laura ; Skubic, Jure ; Tungland, Lars Magne ; Tuominen, Jouni ; van Heusden, Ruben ; Varga, Zsófia ; Vázquez Abuín, Marta ; Venturi, Giulia ; Vidal Miguéns, Adrián ; Vider, Kadri ; Vivel Couso, Ainhoa ; Vladu, Adina Ioana ; Wissik, Tanja ; Yrjänäinen, Väinö ; Zevallos, Rodolfo ; Fišer, Darja
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates from 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
 This item contains 30 files (5.87 GB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required