CLARIN.SI repository

What's New

corpus

The Sarajevo Corpus of SMS Messages in Bosnian 1.1

Author(s):

Wasserscheidt, Philipp ; Bulić, Halid ; Durmišević, Elma ; Hodžić-Čavkić, Azra ; Bajraktarević, Enisa ; Ahmetspahić-Peljto, Azra ; Šabić, Belmin

Description:

This corpus is specialized, static (i.e., no future growth is planned), diachronic and covers the period from 2002 to 2022. The SMS messages included in this corpus were obtained from voluntary donors (informants). Both ...

This item contains 1 file (1.69 MB).

Publicly available; distributed under Creative Commons

corpus

CLARIN.SI data & tools

Albanian Spoken Corpus in Kosovo 1.0

Author(s):

Wasserscheidt, Philipp ; Rugova, Bardh ; Baftiu, Adelajda

Description:

This is the third version of a spoken corpus of Albanian in Kosovo. The corpus data are based on short life stories of 212 informants out of a sample of 1800 speakers balanced across all regions of Kosovo and the ...

This item contains 1 file (1.76 MB).

Publicly available; distributed under Creative Commons

corpus

CLARIN.SI data & tools

Monitor corpus of Slovene Trendi 2024-06

Author(s):

Kosem, Iztok ; Čibej, Jaka ; Dobrovoljc, Kaja ; Erjavec, Tomaž ; Ljubešić, Nikola ; Ponikvar, Primož ; Šinkec, Mihael ; Krek, Simon

Description:

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 74 publishers. Trendi 2024-06 covers the period from January 2019 to June 2024, complementing the Gigafida ...

This item contains no files.

Most Viewed Items

Top Last Week

corpus

CLARIN.SI data & tools

The Sarajevo Corpus of SMS Messages in Bosnian 1.1

Author(s):

Wasserscheidt, Philipp ; Bulić, Halid ; Durmišević, Elma ; Hodžić-Čavkić, Azra ; Bajraktarević, Enisa ; Ahmetspahić-Peljto, Azra ; Šabić, Belmin

This item contains 1 file (1.69 MB).

Publicly available; distributed under Creative Commons

corpus

CLARIN.SI data & tools

Multilingual comparable corpora of parliamentary debates ParlaMint 4.1

Author(s):

Erjavec, Tomaž ; Kopp, Matyáš ; Ogrodniczuk, Maciej ; Osenova, Petya ; Agirrezabal, Manex ; Agnoloni, Tommaso ; Aires, José ; Albini, Monica ; Alkorta, Jon ; Antiba-Cartazo, Iván ; Arrieta, Ekain ; Barcala, Mario ; Bardanca, Daniel ; Barkarson, Starkaður ; Bartolini, Roberto ; Battistoni, Roberto ; Bel, Nuria ; Bonet Ramos, Maria del Mar ; Calzada Pérez, María ; Cardoso, Aida ; Çöltekin, Çağrı ; Coole, Matthew ; Darģis, Roberts ; de Libano, Ruben ; Depoorter, Griet ; Diwersy, Sascha ; Dodé, Réka ; Fernandez, Kike ; Fernández Rei, Elisa ; Frontini, Francesca ; Garcia, Marcos ; García Díaz, Noelia ; García Louzao, Pedro ; Gavriilidou, Maria ; Gkoumas, Dimitris ; Grigorov, Ilko ; Grigorova, Vladislava ; Haltrup Hansen, Dorte ; Iruskieta, Mikel ; Jarlbrink, Johan ; Jelencsik-Mátyus, Kinga ; Jongejan, Bart ; Kahusk, Neeme ; Kirnbauer, Martin ; Kryvenko, Anna ; Ligeti-Nagy, Noémi ; Ljubešić, Nikola ; Luxardo, Giancarlo ; Magariños, Carmen ; Magnusson, Måns ; Marchetti, Carlo ; Marx, Maarten ; Meden, Katja ; Mendes, Amália ; Mochtak, Michal ; Mölder, Martin ; Montemagni, Simonetta ; Navarretta, Costanza ; Nitoń, Bartłomiej ; Norén, Fredrik Mohammadi ; Nwadukwe, Amanda ; Ojsteršek, Mihael ; Pančur, Andrej ; Papavassiliou, Vassilis ; Pereira, Rui ; Pérez Lago, María ; Piperidis, Stelios ; Pirker, Hannes ; Pisani, Marilina ; Pol, Henk van der ; Prokopidis, Prokopis ; Quochi, Valeria ; Rayson, Paul ; Regueira, Xosé Luís ; Rii, Andriana ; Rudolf, Michał ; Ruisi, Manuela ; Rupnik, Peter ; Schopper, Daniel ; Simov, Kiril ; Sinikallio, Laura ; Skubic, Jure ; Tungland, Lars Magne ; Tuominen, Jouni ; van Heusden, Ruben ; Varga, Zsófia ; Vázquez Abuín, Marta ; Venturi, Giulia ; Vidal Miguéns, Adrián ; Vider, Kadri ; Vivel Couso, Ainhoa ; Vladu, Adina Ioana ; Wissik, Tanja ; Yrjänäinen, Väinö ; Zevallos, Rodolfo ; Fišer, Darja

Description:

ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...

This item contains 30 files (5.87 GB).

Publicly available; distributed under Creative Commons

corpus

CLARIN.SI data & tools

24sata news comment dataset 1.0

Author(s):

Shekhar, Ravi ; Pranjic, Marko ; Pollak, Senja ; Pelicon, Andraž ; Purver, Matthew

Description:

A dataset of user comments extracted from the comment database of the 24sata.hr news portal and provided for research purposes to EMBEDDIA, a Horizon 2020 project. 24sata.hr is the largest-circulation ...

This item contains 3 files (1.89 GB).

Publicly available; distributed under Creative Commons

Linguistic Data and NLP Tools

Find

Citation Support (with Persistent IDs)

Deposit Free and Safe

License of your Choice (Open licenses encouraged)

Easy to Find

Easy to Cite
