dc.contributor.author Ljubešić, Nikola
dc.date.accessioned 2020-09-11T11:50:10Z
dc.date.available 2020-09-11T11:50:10Z
dc.date.issued 2020-09-11
dc.identifier.uri http://hdl.handle.net/11356/1349
dc.description The model for morphosyntactic annotation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1206). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~95.2. The difference from the previous version of the model is that the whole XPOS tag is now predicted rather than its individual characters. Character-wise prediction, as used in stanfordnlp, resulted in illegal XPOS tags (and slightly decreased performance).
dc.language.iso srp
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby http://dx.doi.org/10.18653/v1/W19-3704
dc.relation.replaces http://hdl.handle.net/11356/1253
dc.relation.isreplacedby http://hdl.handle.net/11356/1392
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/clarinsi/classla-stanfordnlp
dc.subject language model
dc.subject part-of-speech tagging
dc.title The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Serbian 1.1
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
hidden hidden
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) J7-8280 FRENK: Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society nationalFunds
sponsor ARRS (Slovenian Research Agency) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds
files.count 2
files.size 452077153
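
The description notes that predicting XPOS tags character by character can yield illegal tags, while predicting the whole tag cannot. The following is a toy sketch of why that happens, not the model's actual scoring: the three-tag inventory is a tiny illustrative subset, the per-position scores are invented, and scoring a whole tag as a product of character scores is only a stand-in for a classifier over complete tags.

```python
from math import prod

# Tiny illustrative tag inventory (assumption: not the full MULTEXT-East set).
LEGAL_TAGS = ["Ncmsn", "Ncfsn", "Vmr3s"]

# Invented per-position scores: one dict per character position of the tag.
POSITION_SCORES = [
    {"N": 0.6, "V": 0.4},
    {"c": 0.6, "m": 0.4},
    {"m": 0.35, "f": 0.25, "r": 0.40},
    {"s": 0.6, "3": 0.4},
    {"n": 0.6, "s": 0.4},
]


def predict_per_character(position_scores):
    """Pick the best character at each position independently."""
    return "".join(max(scores, key=scores.get) for scores in position_scores)


def predict_whole_tag(position_scores, legal_tags):
    """Pick the best tag among legal tags only (stand-in scoring)."""
    def tag_score(tag):
        return prod(scores.get(ch, 0.0) for scores, ch in zip(position_scores, tag))
    return max(legal_tags, key=tag_score)


per_char = predict_per_character(POSITION_SCORES)
whole = predict_whole_tag(POSITION_SCORES, LEGAL_TAGS)

print(per_char, per_char in LEGAL_TAGS)  # a tag outside the inventory
print(whole, whole in LEGAL_TAGS)        # always a member of the inventory
```

Because the whole-tag predictor only ever chooses among entries of the inventory, its output is legal by construction, whereas independent per-position choices can combine into a string that no tag in the inventory matches.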


Files in this item

All files in this item: 431.13 MB

Name: SETimes.SR
Size: 59.44 MB
Format: Unknown
Description: Language model
MD5: 03da180987d63489e1181aa4570da364

Name: SETimes.SR.pretrain.pt.zip
Size: 371.7 MB
Format: application/zip
Description: Pretrained word embeddings
MD5: 42d7e2947cb0c2077c6775d6d0b93b57
Archive contents (file preview): SETimes.SR.pretrain.pt, 530 MB
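
The MD5 checksums listed for the two files can be used to check that a download completed without corruption. A minimal sketch using only the Python standard library follows; the function names and the assumption that both files sit in one local directory under their listed names are illustrative, not part of the repository.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "SETimes.SR": "03da180987d63489e1181aa4570da364",
    "SETimes.SR.pretrain.pt.zip": "42d7e2947cb0c2077c6775d6d0b93b57",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Return {file name: True/False}; a missing file counts as False."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads(".").items():
        print(f"{name}: {'OK' if ok else 'MISSING OR MISMATCH'}")
```

Reading in chunks keeps memory use low for the larger embeddings archive.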
