What's New

 toolService 
Description:
This is a retrained Slovenian standard model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It can predict sentence segmentation, tokenization, lemmatization, language-specific morphological annotation ...
 This item contains 1 file (142.95 MB).
 
Publicly Available
 corpus 
Description:
The dataset consists of mid-length sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom, annotated with a 6-level sentiment schema ...
 This item contains 8 files (7.43 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Share Alike.
 toolService 
Description:
The inflectional data lookup module is an optional component of the cordex library (https://github.com/clarinsi/cordex/); using it significantly improves the quality of the results. The module consists of a pickled ...
 This item contains 1 file (31.44 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Noncommercial, Share Alike.

Most viewed

In the past week
 corpus 
Description:
The SETimes.SR training corpus contains 86,726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities. The ...
 This item contains 3 files (10.91 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Share Alike.
 corpus 
Description:
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also ...
 This item contains 3 files (91.53 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Share Alike.
 corpus 
Author(s):
Erjavec, Tomaž ; Kopp, Matyáš ; Ogrodniczuk, Maciej ; Osenova, Petya ; Fišer, Darja ; Pirker, Hannes ; Wissik, Tanja ; Schopper, Daniel ; Kirnbauer, Martin ; Ljubešić, Nikola ; Rupnik, Peter ; Mochtak, Michal ; Pol, Henk van der ; Depoorter, Griet ; Simov, Kiril ; Grigorova, Vladislava ; Grigorov, Ilko ; Jongejan, Bart ; Haltrup Hansen, Dorte ; Navarretta, Costanza ; Mölder, Martin ; Kahusk, Neeme ; Vider, Kadri ; Bel, Nuria ; Antiba-Cartazo, Iván ; Pisani, Marilina ; Zevallos, Rodolfo ; Vladu, Adina Ioana ; Magariños, Carmen ; Bardanca, Daniel ; Barcala, Mario ; Garcia, Marcos ; Pérez Lago, María ; García Louzao, Pedro ; Vivel Couso, Ainhoa ; Vázquez Abuín, Marta ; García Díaz, Noelia ; Vidal Miguéns, Adrián ; Fernández Rei, Elisa ; Regueira, Xosé Luís ; Diwersy, Sascha ; Luxardo, Giancarlo ; Coole, Matthew ; Rayson, Paul ; Nwadukwe, Amanda ; Gkoumas, Dimitris ; Papavassiliou, Vassilis ; Prokopidis, Prokopis ; Gavriilidou, Maria ; Piperidis, Stelios ; Ligeti-Nagy, Noémi ; Jelencsik-Mátyus, Kinga ; Varga, Zsófia ; Dodé, Réka ; Barkarson, Starkaður ; Agnoloni, Tommaso ; Bartolini, Roberto ; Frontini, Francesca ; Montemagni, Simonetta ; Quochi, Valeria ; Venturi, Giulia ; Ruisi, Manuela ; Marchetti, Carlo ; Battistoni, Roberto ; Darģis, Roberts ; van Heusden, Ruben ; Marx, Maarten ; Tungland, Lars Magne ; Rudolf, Michał ; Nitoń, Bartłomiej ; Aires, José ; Mendes, Amália ; Cardoso, Aida ; Pereira, Rui ; Yrjänäinen, Väinö ; Norén, Fredrik Mohammadi ; Magnusson, Måns ; Jarlbrink, Johan ; Meden, Katja ; Pančur, Andrej ; Ojsteršek, Mihael ; Çöltekin, Çağrı ; Kryvenko, Anna
Description:
ParlaMint 3.0 is a multilingual set of 26 comparable corpora of parliamentary debates, mostly starting in 2015 and extending to mid-2022, with individual corpora ranging from 9 to 125 million words in size. The ...
 This item contains 27 files (5.22 GB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required.