Show simple item record

 
dc.contributor.author Terčon, Luka
dc.contributor.author Ljubešić, Nikola
dc.date.accessioned 2023-05-16T06:54:07Z
dc.date.available 2023-05-16T06:54:07Z
dc.date.issued 2023-05-10
dc.identifier.uri http://hdl.handle.net/11356/1836
dc.description The model for UD dependency parsing of standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the UD-parsed portion of the hr500k training corpus (http://hdl.handle.net/11356/1792) and using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1790). The estimated labeled attachment score (LAS) of the parser is ~87.46. This version differs from the previous one in that it was trained on the new version of the hr500k corpus with the new version of the Croatian word embeddings.
dc.language.iso hrv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://aclanthology.org/W19-3704/
dc.relation.replaces http://hdl.handle.net/11356/1259
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/clarinsi/classla
dc.subject parsing
dc.subject language model
dc.title The CLASSLA-Stanza model for UD dependency parsing of standard Croatian 2.1
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nljubesi@gmail.com Nikola Ljubešić
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
contact.person Luka Terčon luka.tercon@gmail.com Faculty of Computer and Information Science, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) J7-4642 MEZZANINE nationalFunds
sponsor Connecting Europe Facility (CEF) Telecom INEA/CEF/ICT/A2020/2278341 MaCoCu - Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages Other
files.count 2
files.size 201122503
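
The LAS figure quoted in dc.description is conventionally the percentage of tokens whose predicted head and dependency label both match the gold annotation. The record does not say which evaluation script produced ~87.46, so the following is only a minimal sketch of that conventional definition; the function name and the toy four-token annotation are hypothetical and do not come from hr500k.

```python
def labeled_attachment_score(gold, predicted):
    """Return LAS as a percentage.

    Both arguments are lists with one (head_index, deprel) pair per token.
    A token counts as correct only if head and label both match.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty annotation")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)


# Toy example (invented for illustration): four tokens, head 0 marks the root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
# The parser attaches the third token correctly but mislabels it.
predicted = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "punct")]

las = labeled_attachment_score(gold, predicted)
print(las)
```

Three of the four tokens match on both head and label, so the toy score is 75.0. Real evaluations aggregate over all tokens of a test set and may treat punctuation or label subtypes differently.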


Files in this item (191.81 MB total)
Name: baseline_depparse.zip
Size: 86.46 MB
Format: application/zip
Description: Language model
MD5: 6eb063786c4cddbcf2971e482361d87e
Archive contents: baseline_depparse (95 MB)

Name: hr_set.pretrain.zip
Size: 105.34 MB
Format: application/zip
Description: Pretrained word embeddings
MD5: 17486b571b090f6cc3c2467526507f35
Archive contents: hr_set.pretrain.pt (150 MB)
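
The MD5 values listed for the two archives can be used to check that a download is intact. The snippet below is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download` are illustrative, and it assumes the archives were saved under the file names shown in this record.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "baseline_depparse.zip": "6eb063786c4cddbcf2971e482361d87e",
    "hr_set.pretrain.zip": "17486b571b090f6cc3c2467526507f35",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the value listed for its name."""
    path = Path(path)
    return md5_of_file(path) == expected[path.name]
```

After downloading, `verify_download("baseline_depparse.zip")` returns `True` only when the local file hashes to the listed value; a `KeyError` means the file name is not one of the two listed here.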
