dc.contributor.author | Čibej, Jaka |
dc.date.accessioned | 2025-08-21T09:34:34Z |
dc.date.available | 2025-08-21T09:34:34Z |
dc.date.issued | 2024-12-19 |
dc.identifier.uri | http://hdl.handle.net/11356/2003 |
dc.description | SNES (Stalno naglašene enote iz Sloleksa; Constantly accentuated units from Sloleks) is a dataset of Slovene final accentuated word parts, i.e., the part of an accentuated word form from its last grapheme bearing an accentuation diacritic to the end of the word (for instance, -álnik for "računálnik", -úlja for "hodúlja"). The parts were automatically extracted from the accentuated forms of the approximately 100,800 manually validated lexemes of Sloleks 3.0 (http://hdl.handle.net/11356/1745). The extracted parts were then manually categorized to compile a validated, machine-readable list of final word parts that are always or almost always accentuated in Slovene (e.g. -álnik, -ílnik). Only word parts that are accentuated in at least 80% of examples were included in the manual list. The list can be used as a post-processing resource to correct some of the errors in the output of Slovene accentuation models. Version 1.0 includes 24,188 automatically extracted final accentuated word parts, 1,013 of which have been manually validated, categorized, and included in the separate manual list. For more details on the structure of the files, please consult 00README.txt. |
dc.language.iso | slv |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.publisher | Faculty of Arts, University of Ljubljana |
dc.publisher | Jožef Stefan Institute |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://mezzanine.um.si/ |
dc.subject | accentuated units |
dc.subject | spoken Slovene |
dc.subject | accentuation |
dc.title | Lists of Slovene accentuated units SNES 1.0 |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | lexicon |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Jaka Čibej jaka.cibej@ff.uni-lj.si Faculty of Arts, University of Ljubljana |
sponsor | ARIS (Slovenian Research and Innovation Agency) J7-4642 MEZZANINE nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
size.info | 24188 entries |
files.count | 1 |
files.size | 538148 |
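
The extraction rule given in dc.description (take the word from its last grapheme with an accentuation diacritic to the end) can be illustrated with a minimal Python sketch. This is not the script used to build SNES. The function names are hypothetical, and the set of diacritics treated as accentuation marks (acute, grave, circumflex) is an assumption; the caron in č, š, ž is deliberately excluded because it is part of the base letter.

```python
import unicodedata

# Assumption: these combining marks signal accentuation (acute, grave, circumflex).
# The caron (U+030C) in č, š, ž is not an accentuation mark and is left out.
ACCENT_MARKS = {"\u0301", "\u0300", "\u0302"}


def is_accented(grapheme):
    """Return True if the character carries one of the assumed accent marks."""
    decomposed = unicodedata.normalize("NFD", grapheme)
    return any(mark in ACCENT_MARKS for mark in decomposed)


def final_accentuated_part(form):
    """Return '-' plus the part of the form from its last accented grapheme
    to the end of the word, or None if no grapheme is accented."""
    form = unicodedata.normalize("NFC", form)
    # Scan from the end so the first hit is the last accented grapheme.
    for index in range(len(form) - 1, -1, -1):
        if is_accented(form[index]):
            return "-" + form[index:]
    return None
```

Applied to the two examples from the description, `final_accentuated_part("računálnik")` yields `-álnik` and `final_accentuated_part("hodúlja")` yields `-úlja`.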
Files in this item
Name | SNES_1.0.zip |
Size | 525.54 KB |
Format | application/zip |
Description | SNES 1.0 (TSV format) |
MD5 | 34a10b97dd057bfcccf27c422414233e |
- SNES_1.0
  - SNES_1.0_automatic.tsv (1 MB)
  - SNES_1.0_manual.tsv (164 kB)
  - 00README.txt (6 kB)
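
After downloading SNES_1.0.zip, the MD5 value listed for it in this record can be used to check that the archive arrived intact. The following is a minimal sketch using Python's standard hashlib module; the function names and the local file path are assumptions made for illustration.

```python
import hashlib

# MD5 checksum of SNES_1.0.zip as listed in this record.
EXPECTED_MD5 = "34a10b97dd057bfcccf27c422414233e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved to the working directory, `matches_expected("SNES_1.0.zip")` should return True for an intact download.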