The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Serbian 1.1

Name: The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Serbian 1.1
License: https://creativecommons.org/licenses/by-sa/4.0/

Ljubešić, Nikola


dc.contributor.author	Ljubešić, Nikola
dc.date.accessioned	2020-09-11T11:50:10Z
dc.date.available	2020-09-11T11:50:10Z
dc.date.issued	2020-09-11
dc.identifier.uri	http://hdl.handle.net/11356/1349
dc.description	The model for morphosyntactic annotation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1206). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~95.2. The difference from the previous version of the model is that the whole XPOS tag is now predicted as a single label, rather than individual characters of the tag as was the case in stanfordnlp; character-level prediction resulted in illegal XPOS tags (and slightly decreased performance).
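To illustrate why predicting individual characters can produce illegal XPOS tags, the Python sketch below contrasts the two decoding styles on a toy example. It is not the CLASSLA-StanfordNLP implementation: the tiny tag inventory, all scores, and the function names `per_char_decode` and `whole_tag_decode` are hypothetical. Choosing the best character independently at each position can splice together parts of different tags, whereas choosing among whole tags from a fixed label set can only return a tag that is in that set.

```python
# Toy illustration only: the tag inventory and all scores below are hypothetical.
LEGAL_TAGS = {"Ncmsn", "Vmr3s"}  # tiny subset of MULTEXT-East tags

# Hypothetical per-position scores from independent per-character classifiers
# for a word that is ambiguous between a noun (Ncmsn) and a verb (Vmr3s).
POSITION_SCORES = [
    {"N": 0.6, "V": 0.4},    # part of speech
    {"c": 0.4, "m": 0.6},    # type (noun: common / verb: main)
    {"m": 0.45, "r": 0.55},  # gender (noun) / verb form (verb: present)
    {"s": 0.45, "3": 0.55},  # number (noun) / person (verb)
    {"n": 0.4, "s": 0.6},    # case (noun) / number (verb)
]

# Hypothetical scores from a single classifier whose labels are whole tags.
TAG_SCORES = {"Ncmsn": 0.3, "Vmr3s": 0.7}


def per_char_decode(position_scores):
    """Choose the highest-scoring character independently at every position."""
    return "".join(max(scores, key=scores.get) for scores in position_scores)


def whole_tag_decode(tag_scores):
    """Choose the highest-scoring label from a vocabulary of whole tags."""
    return max(tag_scores, key=tag_scores.get)


if __name__ == "__main__":
    char_tag = per_char_decode(POSITION_SCORES)
    whole_tag = whole_tag_decode(TAG_SCORES)
    print(char_tag, char_tag in LEGAL_TAGS)    # Nmr3s False
    print(whole_tag, whole_tag in LEGAL_TAGS)  # Vmr3s True
```

Here the per-character decoder outputs `Nmr3s`, combining the noun category from `Ncmsn` with the remaining positions of `Vmr3s`. The result is not a legal MULTEXT-East tag, since nouns have the types `c` (common) and `p` (proper) but not `m`.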
dc.language.iso	srp
dc.publisher	Jožef Stefan Institute
dc.relation.isreferencedby	http://dx.doi.org/10.18653/v1/W19-3704
dc.relation.replaces	http://hdl.handle.net/11356/1253
dc.relation.isreplacedby	http://hdl.handle.net/11356/1392
dc.rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label	PUB
dc.source.uri	https://github.com/clarinsi/classla-stanfordnlp
dc.subject	language model
dc.subject	part-of-speech tagging
dc.title	The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Serbian 1.1
dc.type	toolService
metashare.ResourceInfo#ContentInfo.detailedType	tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent	true
hidden	hidden
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor	ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor	ARRS (Slovenian Research Agency) J7-8280 FRENK: Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society nationalFunds
sponsor	ARRS (Slovenian Research Agency) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds
files.count	2
files.size	452077153