dc.contributor.author Čibej, Jaka
dc.date.accessioned 2025-08-21T09:34:34Z
dc.date.available 2025-08-21T09:34:34Z
dc.date.issued 2024-12-19
dc.identifier.uri http://hdl.handle.net/11356/2003
dc.description SNES (Stalno naglašene enote iz Sloleksa; Constantly accentuated units from Sloleks) is a dataset containing Slovene final accentuated word parts (i.e., the ending part of an accentuated word from its last grapheme with an accentuation diacritic to the end of the word; for instance, -álnik for "računálnik", -úlja for "hodúlja") that have been automatically extracted from the accentuated forms of the approximately 100,800 manually validated lexemes of Sloleks 3.0 (http://hdl.handle.net/11356/1745). The extracted parts were then manually categorized to compile a manually validated machine-readable list of final accentuated word parts that are always or almost always accentuated in Slovene (e.g. -álnik, -ílnik). Only accentuated word parts that are accentuated in at least 80% of examples were included in the manual list. The list can be used as a resource in post-processing to correct some of the errors in the output of Slovene accentuation models. Version 1.0 includes 24,188 automatically extracted final accentuated word parts, 1,013 of which have been manually validated, categorized, and included in a separate manual list of Slovene final word parts that are always or very frequently accentuated. For more details on the structure of the files, please consult 00README.txt.
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Faculty of Arts, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://mezzanine.um.si/
dc.subject accentuated units
dc.subject spoken Slovene
dc.subject accentuation
dc.title Lists of Slovene accentuated units SNES 1.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType lexicon
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor ARIS (Slovenian Research and Innovation Agency) J7-4642 MEZZANINE nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
size.info 24188 entries
files.count 1
files.size 538148
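
The description above defines a final accentuated word part as the span from the last grapheme carrying an accentuation diacritic to the end of the word, and suggests using the validated list to post-process the output of accentuation models. The following is a minimal sketch of both ideas, not the author's extraction or correction procedure. The function names, the choice of acute, grave and circumflex as accentuation diacritics, the in-memory set of validated parts, and the longest-match correction rule are all assumptions made for illustration; the actual column layout of the TSV files is documented in 00README.txt and is not reproduced here.

```python
import unicodedata

# Assumption: accentuation is marked with a combining acute, grave or circumflex.
# The caron of č, š, ž is not an accentuation mark and is deliberately excluded.
ACCENT_MARKS = {"\u0301", "\u0300", "\u0302"}


def is_accented(ch: str) -> bool:
    """Return True if the character carries one of the assumed accent marks."""
    return any(mark in ACCENT_MARKS for mark in unicodedata.normalize("NFD", ch))


def strip_accents(text: str) -> str:
    """Remove the assumed accent marks while keeping other diacritics (e.g. č)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if c not in ACCENT_MARKS)
    return unicodedata.normalize("NFC", kept)


def final_accentuated_part(word: str):
    """Return the part from the last accented grapheme to the end, or None."""
    word = unicodedata.normalize("NFC", word)
    for i in range(len(word) - 1, -1, -1):
        if is_accented(word[i]):
            return "-" + word[i:]
    return None


def apply_validated_ending(word: str, validated_parts) -> str:
    """Hypothetical post-processing rule: re-accentuate the ending of `word`
    with the longest validated part whose unaccented form matches it.

    Simplification: any accents already present in the word are discarded.
    """
    plain = strip_accents(word)
    best_ending, best_plain = None, ""
    for part in validated_parts:
        ending = unicodedata.normalize("NFC", part.lstrip("-"))
        plain_ending = strip_accents(ending)
        if plain.endswith(plain_ending) and len(plain_ending) > len(best_plain):
            best_ending, best_plain = ending, plain_ending
    if best_ending is None:
        return word  # no validated part applies; leave the word unchanged
    return plain[: len(plain) - len(best_plain)] + best_ending


# Illustrative subset only: the two validated parts quoted in the description.
EXAMPLE_PARTS = {"-álnik", "-ílnik"}
```

For instance, `final_accentuated_part("računálnik")` yields the part quoted in the description, and `apply_validated_ending("hladilnik", EXAMPLE_PARTS)` restores the accent on the ending. Because the manual list only requires a part to be accentuated in at least 80% of examples, a rule like this can still introduce errors and would need to be evaluated before use.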


Files in this item

Name: SNES_1.0.zip
Size: 525.54 KB
Format: application/zip
Description: SNES 1.0 (TSV format)
MD5: 34a10b97dd057bfcccf27c422414233e

File preview
  • SNES_1.0
    • SNES_1.0_automatic.tsv (1 MB)
    • SNES_1.0_manual.tsv (164 kB)
    • 00README.txt (6 kB)
