| dc.contributor.author | Čibej, Jaka |
| dc.contributor.author | Gantar, Kaja |
| dc.contributor.author | Dobrovoljc, Kaja |
| dc.contributor.author | Krek, Simon |
| dc.contributor.author | Holozan, Peter |
| dc.contributor.author | Erjavec, Tomaž |
| dc.contributor.author | Romih, Miro |
| dc.contributor.author | Arhar Holdt, Špela |
| dc.contributor.author | Krsnik, Luka |
| dc.contributor.author | Robnik-Šikonja, Marko |
| dc.date.accessioned | 2026-02-06T06:30:21Z |
| dc.date.available | 2026-02-06T06:30:21Z |
| dc.date.issued | 2026-02-03 |
| dc.identifier.uri | http://hdl.handle.net/11356/2080 |
| dc.description | Sloleks is a reference morphological lexicon of Slovene developed for use in various NLP applications and language manuals. It contains Slovene lemmas, their inflected or derived word forms, and the corresponding grammatical descriptions. Sloleks 3.0 combined the approx. 100,000 entries already available in Sloleks 2.0 (http://hdl.handle.net/11356/1230) with approx. 265,000 newly generated entries based on the most frequent lemmas in Gigafida 2.0 (http://hdl.handle.net/11356/1320). For verbs, adjectives, adverbs, and common nouns, these new lemmas were checked manually by three annotators and included in Sloleks only if at least one annotator confirmed them as legitimate; no manual checking was performed on proper nouns. In addition to these entries, version 3.1 contains a separate file with 7,001 lexemes extracted from various corpora of spoken Slovene (e.g. GOS 1.1 http://hdl.handle.net/11356/1438; GOS-VL 4.2 http://hdl.handle.net/11356/1444; Artur 1.0 http://hdl.handle.net/11356/1772) and from transcriptions (not publicly available) of Slovene university lectures used in the Online Notes project (https://www.cjvt.si/online-notes/). Lemmatization rules, part-of-speech categorization, and the set of feature-value pairs follow the JOS morphosyntactic specifications. In addition to grammatical information, each word form includes its absolute corpus frequency and its compliance with the reference language standard. Most entries also contain information on their morphological patterns (see http://hdl.handle.net/11356/1411 for more on morphological patterns). As in version 3.0, version 3.1 includes accented word forms generated automatically with neural networks (Krsnik 2017) for some lexemes.
For the 100,000 entries from Sloleks 2.0, the accented forms were manually corrected, whereas those of the other approx. 265,000 entries are fully automatic. The 7,001 lexemes from spoken corpora are an exception: their orthography and accentuation were manually corrected. IPA and SAMPA phonetic transcriptions were generated automatically using an improved G2P system for Slovene developed within the RSDO project (see https://github.com/clarinsi/slovene_g2p). Version 3.1 is encoded in the same custom XML format as version 3.0, which was developed for the morphological lexicon by the Centre for Language Resources and Technologies of the University of Ljubljana (for details, see the .xsd schemas and 00README.txt included in the archive). References: Krsnik, Luka. 2017. Napovedovanje naglasa slovenskih besed z metodami strojnega učenja [Predicting the accentuation of Slovene words using machine learning methods]. Master's thesis, second-cycle master's programme Računalništvo in informatika (Computer and Information Science). Ljubljana: [L. Krsnik]. http://eprints.fri.uni-lj.si/3978/ |
| dc.language.iso | slv |
| dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
| dc.relation.replaces | http://hdl.handle.net/11356/1745 |
| dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
| dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://rsdo.slovenscina.eu/en/language-resources |
| dc.subject | morphology |
| dc.subject | inflection |
| dc.subject | word forms |
| dc.subject | derivation |
| dc.subject | lemmatisation |
| dc.subject | word accents |
| dc.subject | IPA |
| dc.subject | SAMPA |
| dc.subject | morphological patterns |
| dc.title | Morphological Lexicon of Slovene Sloleks 3.1 |
| dc.type | lexicalConceptualResource |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| has.files | yes |
| branding | CLARIN.SI data & tools |
| contact.person | Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana |
| sponsor | Ministry of Education, Science and Sport 3311-08-986003 Communication in Slovene Other |
| sponsor | Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
| sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
| sponsor | Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other |
| sponsor | ARRS (Slovenian Research Agency) J7-4642 MEZZANINE nationalFunds |
| size.info | 372341 entries |
| files.count | 1 |
| files.size | 275429063 |
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
| Name | Size | Format | Description | MD5 |
|---|---|---|---|---|
| Sloleks.3.1.zip | 262.67 MB | application/zip | Lexicon in XML | 26dc6adcb250d827f94b287201ee9bf7 |
- Sloleks.3.1
  - 00README.txt (10 kB)
  - xml_schemas
    - inventory.xsd (29 kB)
    - morphological_lexicon.xsd (1 kB)
  - sloleks_3.1_001.xml (181 MB)
  - sloleks_3.1_002.xml (138 MB)
  - sloleks_3.1_003.xml (178 MB)
  - sloleks_3.1_004.xml (141 MB)
  - sloleks_3.1_005.xml (149 MB)
  - sloleks_3.1_006.xml (177 MB)
  - sloleks_3.1_007.xml (133 MB)
  - sloleks_3.1_008.xml (182 MB)
  - sloleks_3.1_009.xml (135 MB)
  - sloleks_3.1_010.xml (174 MB)
  - sloleks_3.1_011.xml (188 MB)
  - sloleks_3.1_012.xml (152 MB)
  - sloleks_3.1_013.xml (191 MB)
  - sloleks_3.1_014.xml (105 MB)
  - sloleks_3.1_015.xml (196 MB)
  - sloleks_3.1_016.xml (263 MB)
  - sloleks_3.1_017.xml (88 MB)
  - sloleks_3.1_018.xml (93 MB)
  - sloleks_3.1_019.xml (174 MB)
  - sloleks_3.1_020.xml (196 MB)
  - sloleks_3.1_021.xml (134 MB)
  - sloleks_3.1_022.xml (156 MB)
  - sloleks_3.1_023.xml (125 MB)
  - sloleks_3.1_024.xml (163 MB)
  - sloleks_3.1_025.xml (172 MB)
  - sloleks_3.1_026.xml (147 MB)
  - sloleks_3.1_027.xml (81 MB)
  - sloleks_3.1_028.xml (88 MB)
  - sloleks_3.1_029.xml (98 MB)
  - sloleks_3.1_030.xml (97 MB)
  - sloleks_3.1_031.xml (96 MB)
  - sloleks_3.1_032.xml (97 MB)
  - sloleks_3.1_033.xml (100 MB)
  - sloleks_3.1_034.xml (98 MB)
  - sloleks_3.1_035.xml (97 MB)
  - sloleks_3.1_036.xml (97 MB)
  - sloleks_3.1_037.xml (100 MB)
  - sloleks_3.1_038.xml (99 MB)
  - sloleks_3.1_039.xml (96 MB)
  - sloleks_3.1_040.xml (96 MB)
  - sloleks_3.1_041.xml (98 MB)
  - sloleks_3.1_042.xml (100 MB)
  - sloleks_3.1_043.xml (103 MB)
  - sloleks_3.1_044.xml (114 MB)
  - sloleks_3.1_045.xml (96 MB)
  - sloleks_3.1_046.xml (101 MB)
  - sloleks_3.1_047.xml (104 MB)
  - sloleks_3.1_048.xml (96 MB)
  - sloleks_3.1_049.xml (97 MB)
  - sloleks_3.1_050.xml (100 MB)
  - sloleks_3.1_051.xml (96 MB)
  - sloleks_3.1_052.xml (108 MB)
  - sloleks_3.1_053.xml (141 MB)
  - sloleks_3.1_054.xml (97 MB)
  - sloleks_3.1_055.xml (137 MB)
  - sloleks_3.1_056.xml (110 MB)
  - sloleks_3.1_057.xml (99 MB)
  - sloleks_3.1_058.xml (140 MB)
  - sloleks_3.1_059.xml (128 MB)
  - sloleks_3.1_060.xml (127 MB)
  - sloleks_3.1_061.xml (118 MB)
  - sloleks_3.1_062.xml (124 MB)
  - sloleks_3.1_063.xml (155 MB)
  - sloleks_3.1_064.xml (105 MB)
  - sloleks_3.1_065.xml (118 MB)
  - sloleks_3.1_066.xml (153 MB)
  - sloleks_3.1_067.xml (97 MB)
  - sloleks_3.1_068.xml (187 MB)
  - sloleks_3.1_069.xml (135 MB)
  - sloleks_3.1_070.xml (254 MB)
  - sloleks_3.1_071.xml (76 MB)
  - sloleks_3.1_072.xml (131 MB)
  - sloleks_3.1_073.xml (133 MB)
  - sloleks_3.1_074.xml (138 MB)
  - sloleks_3.1_075.xml (96 MB)
  - sloleks_3.1_076.xml (137 MB)
  - sloleks_3.1_077.xml (155 MB)
  - sloleks_3.1_078.xml (100 MB)
  - sloleks_3.1_079.xml (137 MB)
  - sloleks_3.1_080.xml (122 MB)
  - sloleks_3.1_081.xml (97 MB)
  - sloleks_3.1_082.xml (98 MB)
  - sloleks_3.1_083.xml (97 MB)
  - sloleks_3.1_084.xml (97 MB)
  - sloleks_3.1_085.xml (96 MB)
  - sloleks_3.1_086.xml (97 MB)
  - sloleks_3.1_087.xml (96 MB)
  - sloleks_3.1_088.xml (96 MB)
  - sloleks_3.1_089.xml (97 MB)
  - sloleks_3.1_090.xml (97 MB)
  - sloleks_3.1_091.xml (97 MB)
  - sloleks_3.1_092.xml (97 MB)
  - sloleks_3.1_093.xml (96 MB)
  - sloleks_3.1_094.xml (97 MB)
  - sloleks_3.1_095.xml (96 MB)
  - sloleks_3.1_096.xml (97 MB)
  - sloleks_3.1_097.xml (97 MB)
  - sloleks_3.1_098.xml (96 MB)
  - sloleks_3.1_099.xml (97 MB)
  - sloleks_3.1_100.xml (97 MB)
  - sloleks_3.1_101.xml (97 MB)
  - sloleks_3.1_102.xml (47 MB)
  - sloleks_3.1_mezzanine.xml (135 MB)
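The MD5 checksum published for Sloleks.3.1.zip (26dc6adcb250d827f94b287201ee9bf7) can be used to check that a download is complete and uncorrupted. The following is a minimal Python sketch using only the standard library `hashlib`. It reads the archive in 1 MiB chunks so the 262.67 MB file never has to fit in memory. The function names `md5_of_file` and `verify_download` and the local file path are illustrative assumptions and are not part of the distribution.

```python
import hashlib

# MD5 checksum published for Sloleks.3.1.zip in the file listing of this item.
EXPECTED_MD5 = "26dc6adcb250d827f94b287201ee9bf7"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in fixed-size chunks (default 1 MiB)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Hypothetical helper: True if the file's MD5 matches `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, calling `verify_download("Sloleks.3.1.zip")` on a locally downloaded copy should return `True` if the archive is intact. MD5 guards only against accidental corruption in transit, not against deliberate tampering.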