CLARIN.SI repository

What's New

corpus

Monitor corpus of Slovene Trendi 2025-05

Author(s):

Kosem, Iztok ; Čibej, Jaka ; Dobrovoljc, Kaja ; Erjavec, Tomaž ; Ljubešić, Nikola ; Ponikvar, Primož ; Šinkec, Mihael ; Krek, Simon

Description:

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-05 covers the period from January 2019 to May 2025, complementing the Gigafida ...

This item contains no files.

lexicalConceptualResource

CLARIN.SI data & tools

Syntactic Tree Inventories from English GUM UD Corpus (v2.15)

Author(s):

Dobrovoljc, Kaja

Description:

This dataset contains lists of delexicalized dependency trees and subtrees extracted from the English UD GUM corpus, version 2.15 (http://hdl.handle.net/11234/1-5787), using the STARK tool (https://github.com/clarinsi/STARK). ...

This item contains 6 files (42.39 MB).

Publicly Available. Distributed under Creative Commons.

lexicalConceptualResource

CLARIN.SI data & tools

Syntactic Tree Inventories from Slovenian UD Corpora (v2.15)

Author(s):

Dobrovoljc, Kaja

Description:

This dataset contains lists of delexicalized dependency trees and subtrees extracted from the Slovenian UD corpora SSJ (written) and SST (spoken), version 2.15 (http://hdl.handle.net/11234/1-5787), using the STARK tool ...

This item contains 6 files (74.12 MB).

Publicly Available. Distributed under Creative Commons.

Most Viewed Items

Top Last Week

corpus

CLARIN.SI data & tools

Multilingual comparable corpora of parliamentary debates ParlaMint 4.1

Author(s):

Erjavec, Tomaž ; Kopp, Matyáš ; Ogrodniczuk, Maciej ; Osenova, Petya ; Agirrezabal, Manex ; Agnoloni, Tommaso ; Aires, José ; Albini, Monica ; Alkorta, Jon ; Antiba-Cartazo, Iván ; Arrieta, Ekain ; Barcala, Mario ; Bardanca, Daniel ; Barkarson, Starkaður ; Bartolini, Roberto ; Battistoni, Roberto ; Bel, Nuria ; Bonet Ramos, Maria del Mar ; Calzada Pérez, María ; Cardoso, Aida ; Çöltekin, Çağrı ; Coole, Matthew ; Darģis, Roberts ; de Libano, Ruben ; Depoorter, Griet ; Diwersy, Sascha ; Dodé, Réka ; Fernandez, Kike ; Fernández Rei, Elisa ; Frontini, Francesca ; Garcia, Marcos ; García Díaz, Noelia ; García Louzao, Pedro ; Gavriilidou, Maria ; Gkoumas, Dimitris ; Grigorov, Ilko ; Grigorova, Vladislava ; Haltrup Hansen, Dorte ; Iruskieta, Mikel ; Jarlbrink, Johan ; Jelencsik-Mátyus, Kinga ; Jongejan, Bart ; Kahusk, Neeme ; Kirnbauer, Martin ; Kryvenko, Anna ; Ligeti-Nagy, Noémi ; Ljubešić, Nikola ; Luxardo, Giancarlo ; Magariños, Carmen ; Magnusson, Måns ; Marchetti, Carlo ; Marx, Maarten ; Meden, Katja ; Mendes, Amália ; Mochtak, Michal ; Mölder, Martin ; Montemagni, Simonetta ; Navarretta, Costanza ; Nitoń, Bartłomiej ; Norén, Fredrik Mohammadi ; Nwadukwe, Amanda ; Ojsteršek, Mihael ; Pančur, Andrej ; Papavassiliou, Vassilis ; Pereira, Rui ; Pérez Lago, María ; Piperidis, Stelios ; Pirker, Hannes ; Pisani, Marilina ; Pol, Henk van der ; Prokopidis, Prokopis ; Quochi, Valeria ; Rayson, Paul ; Regueira, Xosé Luís ; Rii, Andriana ; Rudolf, Michał ; Ruisi, Manuela ; Rupnik, Peter ; Schopper, Daniel ; Simov, Kiril ; Sinikallio, Laura ; Skubic, Jure ; Tungland, Lars Magne ; Tuominen, Jouni ; van Heusden, Ruben ; Varga, Zsófia ; Vázquez Abuín, Marta ; Venturi, Giulia ; Vidal Miguéns, Adrián ; Vider, Kadri ; Vivel Couso, Ainhoa ; Vladu, Adina Ioana ; Wissik, Tanja ; Yrjänäinen, Väinö ; Zevallos, Rodolfo ; Fišer, Darja

Description:

ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...

This item contains 30 files (5.87 GB).

Publicly Available. Distributed under Creative Commons.

corpus

CLARIN.SI data & tools

Ekspress user comment dataset 1.0

Author(s):

Shekhar, Ravi ; Pollak, Senja ; Pelicon, Andraž ; Purver, Matthew ; Krustok, Ivar

Description:

This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009 to 2019, containing approximately 31M comments, mostly in Estonian, with some in Russian. Description of the Datasets. There ...

This item contains 12 files (9.95 GB).

Publicly Available. Distributed under Creative Commons.

lexicalConceptualResource

CLARIN.SI data & tools

Emoji Sentiment Ranking 1.0

Author(s):

Kralj Novak, Petra ; Smailović, Jasmina ; Sluban, Borut ; Mozetič, Igor

Description:

A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The process and analysis of emoji sentiment ...

This item contains 3 files (93.95 KB).

Publicly Available. Distributed under Creative Commons.

Linguistic Data and NLP Tools

Find

Citation Support (with Persistent IDs)

Deposit Free and Safe

License of your Choice (Open licenses encouraged)

Easy to Find

Easy to Cite
