| dc.contributor.author |
Krsnik, Luka |
| dc.contributor.author |
Dobrovoljc, Kaja |
| dc.contributor.author |
Terčon, Luka |
| dc.date.accessioned |
2024-09-04T09:29:41Z |
| dc.date.available |
2024-09-04T09:29:41Z |
| dc.date.issued |
2024-09-04 |
| dc.identifier.uri |
http://hdl.handle.net/11356/1966 |
| dc.description |
This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank of spoken Slovenian (UD v2.15, https://github.com/UniversalDependencies/UD_Slovenian-SST/tree/dev), which features transcriptions of spontaneous speech in various everyday settings.
It is able to predict sentence segmentation, tokenization, lemmatization, language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tagging, morphological feature prediction, and dependency parses in accordance with the Universal Dependencies annotation scheme (https://universaldependencies.org/).
Please note that this model has been published for archiving purposes only. For production use, we recommend the state-of-the-art Trankit model available at http://hdl.handle.net/11356/1965. That model was trained on both spoken (SST) and written (SSJ) data and performs significantly better than the model featured in this submission. |
| dc.language.iso |
slv |
| dc.publisher |
Centre for Language Resources and Technologies, University of Ljubljana |
| dc.relation.isreferencedby |
https://arxiv.org/pdf/2101.03289.pdf |
| dc.relation.isreplacedby |
http://hdl.handle.net/11356/1965 |
| dc.rights |
Apache License 2.0 |
| dc.rights.uri |
https://opensource.org/licenses/Apache-2.0 |
| dc.rights.label |
PUB |
| dc.source.uri |
https://github.com/clarinsi/trankit-train |
| dc.subject |
language model |
| dc.subject |
lemmatisation |
| dc.subject |
tokenisation |
| dc.subject |
sentence segmentation |
| dc.subject |
part-of-speech tagging |
| dc.subject |
feature prediction |
| dc.subject |
parsing |
| dc.subject |
dependency parsing |
| dc.subject |
corpus annotation |
| dc.title |
Trankit model for SST 2.15 |
| dc.type |
toolService |
| metashare.ResourceInfo#ContentInfo.detailedType |
tool |
| metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent |
true |
| hidden |
hidden |
| has.files |
yes |
| branding |
CLARIN.SI data & tools |
| contact.person |
Luka Krsnik krsnik.luka92@gmail.com Luka Krsnik |
| contact.person |
Kaja Dobrovoljc kaja.dobrovoljc@ff.uni-lj.si Faculty of Arts, University of Ljubljana |
| sponsor |
ARRS (Slovenian Research Agency) Z6-4617 Treebank-Driven Approach to the Study of Spoken Slovenian nationalFunds |
| sponsor |
ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
| files.count |
1 |
| files.size |
145347815 |
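
The annotation layers listed in dc.description (lemma, universal part-of-speech tag, MULTEXT-East tag, morphological features, dependency head and relation) correspond to columns of the CoNLL-U format used by Universal Dependencies. The following is a minimal sketch of that mapping, not code from this submission. It assumes each token is a dictionary with the keys `id`, `text`, `lemma`, `upos`, `xpos`, `feats`, `head` and `deprel`, modelled on the token-level output of Trankit pipelines; the helper names and the one-word sample are hypothetical and hand-written, not output of this model.

```python
# Sketch only: the token dictionary layout is an assumption modelled on
# Trankit's token-level output; the sample below is invented for illustration.

# Token keys in CoNLL-U column order:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL
CONLLU_KEYS = ("id", "text", "lemma", "upos", "xpos", "feats", "head", "deprel")


def token_to_conllu(token):
    """Render one token dictionary as a 10-column CoNLL-U line."""
    # Missing annotation layers are written as "_", the CoNLL-U placeholder.
    columns = [str(token.get(key, "_")) for key in CONLLU_KEYS]
    columns += ["_", "_"]  # DEPS and MISC are left unspecified in this sketch
    return "\t".join(columns)


def sentence_to_conllu(sentence):
    """Render one sentence dictionary as a CoNLL-U block ending in a blank line."""
    lines = ["# text = " + sentence["text"]]
    lines += [token_to_conllu(token) for token in sentence["tokens"]]
    return "\n".join(lines) + "\n\n"


# Hypothetical, hand-annotated one-word sentence ("hiša" = "house").
sample_sentence = {
    "text": "hiša",
    "tokens": [
        {
            "id": 1,
            "text": "hiša",
            "lemma": "hiša",
            "upos": "NOUN",
            "xpos": "Ncfsn",  # MULTEXT-East morphosyntactic tag
            "feats": "Case=Nom|Gender=Fem|Number=Sing",
            "head": 0,
            "deprel": "root",
        }
    ],
}

conllu_block = sentence_to_conllu(sample_sentence)
print(conllu_block, end="")
```

If the dictionaries produced by an actual pipeline use different key names, only `CONLLU_KEYS` should need adjusting.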