The CLASSLA-StanfordNLP model for morphosyntactic annotation of non-standard Croatian 1.0

Name: The CLASSLA-StanfordNLP model for morphosyntactic annotation of non-standard Croatian 1.0
License: https://creativecommons.org/licenses/by-sa/4.0/

CLARIN.SI data & tools

Authors: Ljubešić, Nikola and Štefanec, Vanja

Item identifier: http://hdl.handle.net/11356/1331

Project URL: https://github.com/clarinsi/classla-stanfordnlp

Referenced by: http://dx.doi.org/10.18653/v1/W19-3704

Date issued: 2020-07-17

Type: toolService

Language(s): Croatian

Description: This model for morphosyntactic annotation of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp). It was trained on the hr500k training corpus (http://hdl.handle.net/11356/1210), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). To handle text with missing diacritics, the training corpora were augmented by repeating parts of them with the diacritics removed. The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~95.11.
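
The diacritic augmentation mentioned in the description can be illustrated with a minimal Python sketch. This is not the authors' actual procedure: the character mapping (in particular "đ" to "dj"), the `fraction` parameter, the function names and the example tag are all assumptions made for illustration only.

```python
import random

# Hypothetical mapping of Croatian diacritic letters to plain ASCII.
# Assumption: "đ" becomes "dj"; writing it as "d" is also common in practice.
DIACRITIC_MAP = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "Dj",
})


def strip_diacritics(form):
    """Return the word form with Croatian diacritics removed."""
    return form.translate(DIACRITIC_MAP)


def augment(sentences, fraction=0.5, seed=0):
    """Append diacritic-free copies of a random share of the sentences.

    Each sentence is a list of (form, xpos) pairs. Only the word form
    is changed; the label is kept, so the copy stays a valid training item.
    """
    rng = random.Random(seed)
    copies = []
    for sentence in sentences:
        if rng.random() < fraction:
            copies.append([(strip_diacritics(form), xpos) for form, xpos in sentence])
    return list(sentences) + copies


if __name__ == "__main__":
    # Illustrative one-token sentence; the tag is only an example label.
    corpus = [[("šećer", "Ncmsn")]]
    for sentence in augment(corpus, fraction=1.0):
        print(sentence)
```

Keeping the labels unchanged in the copied sentences is the point of the sketch: the tagger sees both "šećer" and "secer" with the same annotation, which is one plausible way to make it robust to input written without diacritics.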