What's New

 corpus 
corpus
Author(s):
Description:
This is a corpus of 1915 "jokes of the day" ("šala dneva") published by the Slovenian news portal 24ur.com. The jokes were scraped from their archive on September 18th, 2024. The initial list is lightly curated: shorter ...
 Ta vnos vsebuje 1 datoteko (1.7 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Noncommercial
 corpus 
corpus
Description:
The genre-enriched MaCoCu-Genre corpus collection comprises web corpora that have been automatically annotated with genre labels. The corpora can be very useful for genre-based creation of subcorpora that can be used for ...
 Ta vnos vsebuje 14 datotek(e) (101.43 GB).
 
Publicly Available
 corpus 
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-08 covers the period from January 2019 to September 2024, complementing the ...
 Ta vnos ne vsebuje datotek.

Največ ogledov

V preteklem tednu
 corpus 
corpus
Description:
The Montenegrin web corpus MaCoCu-cnr 1.0 was built by crawling the ".me" internet top-level domain in 2021 and 2022, extending the crawl dynamically to other domains as well. The crawler is available at https://github.c ...
 Ta vnos vsebuje 2 datotek(e) (500.14 MB).
 
Publicly Available
 corpus 
corpus
Description:
SI-NLI (Slovene Natural Language Inference Dataset) contains 5,937 human-created Slovene sentence pairs (premise and hypothesis) that are manually labeled with the labels "entailment", "contradiction", and "neutral". We ...
 Ta vnos vsebuje 1 datoteko (400.48 KB).
 
Publicly Available Distributed under Creative Commons Attribution Required Noncommercial Share Alike
 corpus 
corpus
Author(s):
Erjavec, Tomaž ; et al.prikaži vse Erjavec, Tomaž ; Kopp, Matyáš ; Ogrodniczuk, Maciej ; Osenova, Petya ; Agirrezabal, Manex ; Agnoloni, Tommaso ; Aires, José ; Albini, Monica ; Alkorta, Jon ; Antiba-Cartazo, Iván ; Arrieta, Ekain ; Barcala, Mario ; Bardanca, Daniel ; Barkarson, Starkaður ; Bartolini, Roberto ; Battistoni, Roberto ; Bel, Nuria ; Bonet Ramos, Maria del Mar ; Calzada Pérez, María ; Cardoso, Aida ; Çöltekin, Çağrı ; Coole, Matthew ; Darģis, Roberts ; de Libano, Ruben ; Depoorter, Griet ; Diwersy, Sascha ; Dodé, Réka ; Fernandez, Kike ; Fernández Rei, Elisa ; Frontini, Francesca ; Garcia, Marcos ; García Díaz, Noelia ; García Louzao, Pedro ; Gavriilidou, Maria ; Gkoumas, Dimitris ; Grigorov, Ilko ; Grigorova, Vladislava ; Haltrup Hansen, Dorte ; Iruskieta, Mikel ; Jarlbrink, Johan ; Jelencsik-Mátyus, Kinga ; Jongejan, Bart ; Kahusk, Neeme ; Kirnbauer, Martin ; Kryvenko, Anna ; Ligeti-Nagy, Noémi ; Ljubešić, Nikola ; Luxardo, Giancarlo ; Magariños, Carmen ; Magnusson, Måns ; Marchetti, Carlo ; Marx, Maarten ; Meden, Katja ; Mendes, Amália ; Mochtak, Michal ; Mölder, Martin ; Montemagni, Simonetta ; Navarretta, Costanza ; Nitoń, Bartłomiej ; Norén, Fredrik Mohammadi ; Nwadukwe, Amanda ; Ojsteršek, Mihael ; Pančur, Andrej ; Papavassiliou, Vassilis ; Pereira, Rui ; Pérez Lago, María ; Piperidis, Stelios ; Pirker, Hannes ; Pisani, Marilina ; Pol, Henk van der ; Prokopidis, Prokopis ; Quochi, Valeria ; Rayson, Paul ; Regueira, Xosé Luís ; Rii, Andriana ; Rudolf, Michał ; Ruisi, Manuela ; Rupnik, Peter ; Schopper, Daniel ; Simov, Kiril ; Sinikallio, Laura ; Skubic, Jure ; Tungland, Lars Magne ; Tuominen, Jouni ; van Heusden, Ruben ; Varga, Zsófia ; Vázquez Abuín, Marta ; Venturi, Giulia ; Vidal Miguéns, Adrián ; Vider, Kadri ; Vivel Couso, Ainhoa ; Vladu, Adina Ioana ; Wissik, Tanja ; Yrjänäinen, Väinö ; Zevallos, Rodolfo ; Fišer, Darja
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
 Ta vnos vsebuje 30 datotek(e) (5.87 GB).
 
Publicly Available Distributed under Creative Commons Attribution Required