dc.contributor.author Ljubešić, Nikola
dc.contributor.author Terčon, Luka
dc.contributor.author Čibej, Jaka
dc.date.accessioned 2023-02-03T16:55:03Z
dc.date.available 2023-02-03T16:55:03Z
dc.date.issued 2023-01-31
dc.identifier.uri http://hdl.handle.net/11356/1767
dc.description This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204), which were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~98.27. Compared with the previous version, this model was trained on the SUK training corpus and uses new embeddings as well as the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://aclanthology.org/W19-3704/
dc.relation.replaces http://hdl.handle.net/11356/1476
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/clarinsi/classla
dc.subject part-of-speech tagging
dc.subject language model
dc.title The CLASSLA-Stanza model for morphosyntactic annotation of standard Slovenian 2.0
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
contact.person Luka Terčon luka.tercon@gmail.com Faculty of Computer and Information Science, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
files.count 2
files.size 534642041
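
The three label layers named in dc.description (UPOS, FEATS and XPOS) correspond to columns 4, 6 and 5 of the CoNLL-U format. The following minimal Python sketch assumes the tagger's output has been saved as CoNLL-U text and extracts the three layers for each token. The sample sentence and its tags are hand-written for illustration and are not actual model output; the names MorphoToken, parse_feats and read_morpho are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MorphoToken:
    """One syntactic word with the three label layers the tagger produces."""
    form: str
    upos: str
    xpos: str   # MULTEXT-East tag, e.g. 'Ncmsn'
    feats: Dict[str, str]


def parse_feats(field: str) -> Dict[str, str]:
    """Turn a CoNLL-U FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def read_morpho(conllu: str) -> List[MorphoToken]:
    """Extract FORM, UPOS, XPOS and FEATS from CoNLL-U text.

    Skips comment lines, blank sentence separators, multi-word token
    ranges (IDs like '1-2') and empty nodes (IDs like '1.1').
    """
    tokens: List[MorphoToken] = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append(MorphoToken(form=cols[1], upos=cols[3],
                                  xpos=cols[4], feats=parse_feats(cols[5])))
    return tokens


# Hand-written, illustrative rows (ID, FORM, LEMMA, UPOS, XPOS, FEATS);
# NOT actual model output. HEAD, DEPREL, DEPS and MISC are left as '_'.
ROWS = [
    ("1", "Prešeren", "Prešeren", "PROPN", "Npmsn", "Case=Nom|Gender=Masc|Number=Sing"),
    ("2", "je", "biti", "AUX", "Va-r3s-n",
     "Mood=Ind|Number=Sing|Person=3|Polarity=Pos|Tense=Pres|VerbForm=Fin"),
    ("3", "pesnik", "pesnik", "NOUN", "Ncmsn", "Case=Nom|Gender=Masc|Number=Sing"),
    ("4", ".", ".", "PUNCT", "Z", "_"),
]
SAMPLE = "\n".join(["# text = Prešeren je pesnik."]
                   + ["\t".join(row + ("_",) * 4) for row in ROWS])

if __name__ == "__main__":
    for tok in read_morpho(SAMPLE):
        print(tok.form, tok.upos, tok.xpos, tok.feats)
```

Multi-word token ranges and empty nodes are skipped so that only syntactic words, which carry the morphosyntactic labels, are returned.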


Files in this item

All files in this item: 509.87 MB

Name: sl_ssj.pretrain.zip
Size: 104.71 MB
Format: application/zip
Description: Pretrained word embeddings
MD5: 9d417487f321b83e9fc1d64c2f89f2dd
Contents: sl_ssj.pretrain.pt (149 MB)

Name: suk_pos.zip
Size: 405.16 MB
Format: application/zip
Description: Language model
MD5: 7eac894294f67589b85e067a6539c846
Contents: suk_pos (1 GB)
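
Because the record lists an MD5 checksum for each archive, a downloaded copy can be checked before use. The following minimal Python sketch uses only the standard library and assumes both archives were saved under their original names in a single directory; the function names md5_of and verify are illustrative and not part of the release.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "sl_ssj.pretrain.zip": "9d417487f321b83e9fc1d64c2f89f2dd",
    "suk_pos.zip": "7eac894294f67589b85e067a6539c846",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path,
           expected: Dict[str, str] = EXPECTED_MD5) -> Dict[str, Optional[bool]]:
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results: Dict[str, Optional[bool]] = {}
    for name, want in expected.items():
        path = directory / name
        results[name] = (md5_of(path) == want) if path.is_file() else None
    return results


if __name__ == "__main__":
    for name, ok in verify(Path(".")).items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

Reading in fixed-size chunks keeps memory use constant, which matters for the 405 MB language-model archive.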
