The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Croatian 1.1

Name: The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Croatian 1.1
License: https://creativecommons.org/licenses/by-sa/4.0/

Ljubešić, Nikola

dc.contributor.author	Ljubešić, Nikola
dc.date.accessioned	2020-09-11T11:50:00Z
dc.date.available	2020-09-11T11:50:00Z
dc.date.issued	2020-09-11
dc.identifier.uri	http://hdl.handle.net/11356/1348
dc.description	The model for morphosyntactic annotation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183) and using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~94.1. The difference from the previous version of the model is that the whole XPOS tag is now predicted as a single label. In stanfordnlp, specific characters of the tag were predicted instead, which resulted in illegal XPOS tags (and slightly decreased performance).
dc.language.iso	hrv
dc.publisher	Jožef Stefan Institute
dc.relation.isreferencedby	http://dx.doi.org/10.18653/v1/W19-3704
dc.relation.replaces	http://hdl.handle.net/11356/1252
dc.relation.isreplacedby	http://hdl.handle.net/11356/1393
dc.rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label	PUB
dc.source.uri	https://github.com/clarinsi/classla-stanfordnlp
dc.subject	language model
dc.subject	part-of-speech tagging
dc.title	The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Croatian 1.1
dc.type	toolService
metashare.ResourceInfo#ContentInfo.detailedType	tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent	true
hidden	hidden
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor	ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor	ARRS (Slovenian Research Agency) J7-8280 FRENK: Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society nationalFunds
sponsor	ARRS (Slovenian Research Agency) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds
files.count	2
files.size	854266899
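
The description above contrasts predicting the whole XPOS tag with predicting specific characters of the tag. The following is a minimal toy sketch of why the character-wise approach can emit illegal tags. It does not use the CLASSLA-StanfordNLP code or its API: the tag inventory, the scores, and the function names predict_per_character and predict_whole_tag are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: the inventory and all scores below are made up.
# LEGAL_TAGS stands in for the set of tags attested in a training corpus.
LEGAL_TAGS = {"Ncmsn", "Ncfsn", "Vmr3s"}


def predict_per_character(position_scores):
    """Pick the best-scoring character at each tag position independently."""
    return "".join(max(scores, key=scores.get) for scores in position_scores)


def predict_whole_tag(tag_scores):
    """Pick the best-scoring tag; candidates are whole tags from the inventory."""
    return max(tag_scores, key=tag_scores.get)


# Hypothetical per-position scores: each position is decided on its own,
# so nothing forces the combined string to be a tag from the inventory.
position_scores = [
    {"N": 0.6, "V": 0.4},
    {"c": 0.55, "m": 0.45},
    {"r": 0.5, "m": 0.3, "f": 0.2},
    {"s": 0.7, "3": 0.3},
    {"n": 0.6, "s": 0.4},
]

# Hypothetical scores over whole tags: only inventory members are candidates.
tag_scores = {"Ncmsn": 0.5, "Ncfsn": 0.2, "Vmr3s": 0.3}

per_char_tag = predict_per_character(position_scores)
whole_tag = predict_whole_tag(tag_scores)

print(per_char_tag, per_char_tag in LEGAL_TAGS)
print(whole_tag, whole_tag in LEGAL_TAGS)
```

In this sketch the character-wise result mixes the best characters of different tags into a string that is not in the inventory, while the whole-tag result is legal by construction because only inventory members compete.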