The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.1

Name: The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.1
License: https://creativecommons.org/licenses/by-sa/4.0/

Ljubešić, Nikola


dc.contributor.author	Ljubešić, Nikola
dc.date.accessioned	2020-04-29T10:03:09Z
dc.date.available	2020-04-29T10:03:09Z
dc.date.issued	2020-04-29
dc.identifier.uri	http://hdl.handle.net/11356/1312
dc.description	This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~97.06. The difference from the previous version of the model is that the whole XPOS tag is now predicted as a single label. Previously, as in stanfordnlp, individual characters of the tag were predicted separately, which produced illegal XPOS tags and slightly lower performance.
dc.language.iso	slv
dc.publisher	Jožef Stefan Institute
dc.relation.isreferencedby	https://www.aclweb.org/anthology/W19-3704/
dc.relation.replaces	http://hdl.handle.net/11356/1251
dc.relation.isreplacedby	http://hdl.handle.net/11356/1391
dc.rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label	PUB
dc.source.uri	https://github.com/clarinsi/classla-stanfordnlp
dc.subject	part-of-speech tagging
dc.subject	language model
dc.title	The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.1
dc.type	toolService
metashare.ResourceInfo#ContentInfo.detailedType	tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent	true
hidden	hidden
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor	ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor	ARRS (Slovenian Research Agency) J7-8280 FRENK: Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society nationalFunds
sponsor	ARRS (Slovenian Research Agency) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds
files.count	2
files.size	1683724764
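The dc.description field notes that predicting XPOS tags character by character can yield illegal tags, while predicting the whole tag cannot. The following is a minimal Python sketch of why this happens. It assumes a toy, hypothetical inventory of three MULTEXT-East-style tags with invented probabilities. The real tagger uses neural classifiers, but here independent per-position decisions are simulated by taking the most probable character at each position.

```python
from collections import defaultdict

# Hypothetical probabilities for complete XPOS tags of a single token.
# The tag inventory and the numbers are invented for illustration only.
TAG_PROBS = {
    "Ncmsn": 0.40,
    "Ncfsa": 0.35,
    "Ncnsa": 0.25,
}


def per_character_decode(tag_probs):
    """Choose the most probable character independently at each tag position."""
    length = len(next(iter(tag_probs)))
    chars = []
    for i in range(length):
        # Sum the probability mass of every tag that has a given character at position i.
        marginal = defaultdict(float)
        for tag, p in tag_probs.items():
            marginal[tag[i]] += p
        chars.append(max(marginal, key=marginal.get))
    return "".join(chars)


def whole_tag_decode(tag_probs):
    """Choose the single most probable complete tag from the legal inventory."""
    return max(tag_probs, key=tag_probs.get)


per_char = per_character_decode(TAG_PROBS)
whole = whole_tag_decode(TAG_PROBS)
print(per_char, per_char in TAG_PROBS)  # Ncmsa False
print(whole, whole in TAG_PROBS)        # Ncmsn True
```

Under these toy numbers, the per-position choice combines masculine gender (0.40) with accusative case (0.35 + 0.25 = 0.60) and produces "Ncmsa", which is not in the toy inventory. Predicting the whole tag can only return a tag that exists in the inventory.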