dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Terčon, Luka |
dc.contributor.author | Čibej, Jaka |
dc.date.accessioned | 2023-02-03T16:55:03Z |
dc.date.available | 2023-02-03T16:55:03Z |
dc.date.issued | 2023-01-31 |
dc.identifier.uri | http://hdl.handle.net/11356/1767 |
dc.description | This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747), using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~98.27. Compared with the previous version, this model was trained on the SUK training corpus and uses new embeddings and the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745). |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | https://aclanthology.org/W19-3704/ |
dc.relation.replaces | http://hdl.handle.net/11356/1476 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://github.com/clarinsi/classla |
dc.subject | part-of-speech tagging |
dc.subject | language model |
dc.title | The CLASSLA-Stanza model for morphosyntactic annotation of standard Slovenian 2.0 |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | true |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute |
contact.person | Luka Terčon luka.tercon@gmail.com Faculty of Computer and Information Science, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other |
files.count | 2 |
files.size | 534642041 |
Files in this item
Download all files in this item (509.87 MB)
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

- Name: sl_ssj.pretrain.zip
- Size: 104.71 MB
- Format: application/zip
- Description: Pretrained word embeddings
- MD5: 9d417487f321b83e9fc1d64c2f89f2dd

- Name: suk_pos.zip
- Size: 405.16 MB
- Format: application/zip
- Description: Language model
- MD5: 7eac894294f67589b85e067a6539c846
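
The MD5 digests listed for the two archives can be used to check that a download completed intact. The following is a minimal sketch, assuming both archives were saved under their listed names in a single directory. The helper names `md5_of_file` and `verify_downloads` are illustrative only and are not part of CLASSLA-Stanza or the repository.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "sl_ssj.pretrain.zip": "9d417487f321b83e9fc1d64c2f89f2dd",
    "suk_pos.zip": "7eac894294f67589b85e067a6539c846",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (digest matches),
    False (digest differs) or None (file not found).

    Assumption: the archives keep their original names in `directory`.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in verify_downloads().items():
        print(f"{name}: {labels[status]}")
```

Reading in chunks keeps memory use flat, which matters for the larger archive (405.16 MB). A mismatch usually indicates an interrupted or corrupted download, in which case the file should be fetched again.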