dc.contributor.author Krsnik, Luka
dc.contributor.author Dobrovoljc, Kaja
dc.date.accessioned 2023-09-30T17:47:09Z
dc.date.available 2023-09-30T17:47:09Z
dc.date.issued 2023-09-29
dc.identifier.uri http://hdl.handle.net/11356/1870
dc.description This is a retrained model for standard Slovenian for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It predicts sentence segmentation, tokenization, lemmatization, language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tagging, feature prediction, and dependency parsing in accordance with the Universal Dependencies annotation scheme (https://universaldependencies.org/). The model was trained on the dataset published by Universal Dependencies in release 2.12 (https://github.com/UniversalDependencies/UD_Slovenian-SSJ/tree/r2.12). Because its training dataset is larger than that of the original Trankit v1.1.1 model, this version yields better results and achieves state-of-the-art parsing performance for Slovenian (https://slobench.cjvt.si/leaderboard/view/11). To use this model, follow the instructions in our GitHub repository (https://github.com/clarinsi/trankit-train) or refer to the Trankit documentation (https://trankit.readthedocs.io/en/latest/training.html#loading). This ZIP file contains models for both xlm-roberta-large (which delivers better performance but requires more hardware resources) and xlm-roberta-base.
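As a rough illustration of the loading step mentioned in the description, the sketch below wraps Trankit's `Pipeline` constructor for a customized model. Several details are assumptions rather than facts from this record: the `lang="customized"` category, the idea that the unpacked ZIP directory can be passed directly as `cache_dir`, and the helper names `choose_embedding`, `load_slovenian_pipeline`, and `annotate`, which are hypothetical. The instructions in the clarinsi/trankit-train repository and the Trankit loading documentation remain authoritative.

```python
from typing import Any, Callable, Dict, List, Tuple


def choose_embedding(prefer_large: bool) -> str:
    """Return the encoder name for one of the two model variants in the ZIP."""
    return "xlm-roberta-large" if prefer_large else "xlm-roberta-base"


def load_slovenian_pipeline(save_dir: str, prefer_large: bool = False, gpu: bool = False):
    """Load the retrained model as a customized Trankit pipeline.

    Assumption: `save_dir` is the directory unpacked from save_dir_ssj.zip and
    has the layout Trankit expects for the 'customized' category.
    """
    # Imported lazily so the helpers above work without Trankit installed.
    from trankit import Pipeline

    return Pipeline(
        lang="customized",  # assumed category (no multi-word token expander, no NER)
        cache_dir=save_dir,
        embedding=choose_embedding(prefer_large),
        gpu=gpu,
    )


def annotate(pipeline: Callable[[str], Dict[str, Any]], text: str) -> List[Tuple[Any, ...]]:
    """Run the pipeline on raw text and flatten the result to one row per token."""
    doc = pipeline(text)
    rows = []
    for sentence in doc["sentences"]:
        for token in sentence["tokens"]:
            # .get() is used because the available keys depend on the tasks that ran.
            rows.append(
                (token["text"], token.get("lemma"), token.get("upos"), token.get("deprel"))
            )
    return rows
```

A call such as `annotate(load_slovenian_pipeline("./save_dir_ssj"), "To je primer.")` would then yield one tuple per token; the path here is a placeholder, not the documented location.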
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.isreferencedby https://arxiv.org/pdf/2101.03289.pdf
dc.relation.isreplacedby http://hdl.handle.net/11356/1963
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/licenses/Apache-2.0
dc.rights.label PUB
dc.source.uri https://github.com/clarinsi/trankit-train
dc.subject language model
dc.subject lemmatisation
dc.subject tokenisation
dc.subject sentence segmentation
dc.subject part-of-speech tagging
dc.subject feature prediction
dc.subject parsing
dc.title The Trankit model for linguistic processing of standard Slovenian
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
contact.person Luka Krsnik krsnik.luka92@gmail.com Luka Krsnik
contact.person Kaja Dobrovoljc kaja.dobrovoljc@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) Z6-4617 A Treebank-Driven Approach to the Study of Spoken Slovenian nationalFunds
files.count 1
files.size 149893584


 Files in this item

This item is Publicly Available and licensed under: Apache License 2.0
Name save_dir_ssj.zip
Size 142.95 MB
Format application/zip
Description Language model
MD5 82631e6e8d6ccc5d30b648d223d71140
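The size and MD5 values listed in this record can be used to check a downloaded copy of the archive before unpacking it. The following is a minimal sketch using only the Python standard library; the constants are copied from this record (`files.size` and the MD5 field), while the function names and the local file path are hypothetical.

```python
import hashlib
from pathlib import Path

# Values copied from this record.
EXPECTED_MD5 = "82631e6e8d6ccc5d30b648d223d71140"
EXPECTED_SIZE = 149893584  # bytes (files.size)


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5: str = EXPECTED_MD5,
                    expected_size: int = EXPECTED_SIZE) -> bool:
    """Return True only if both the byte size and the MD5 digest match."""
    file_path = Path(path)
    if file_path.stat().st_size != expected_size:
        return False  # cheap check first; skips hashing a truncated file
    return md5_of_file(file_path) == expected_md5.lower()
```

For example, `verify_download("save_dir_ssj.zip")` should return `True` for an intact download saved under that name in the working directory.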