dc.contributor.author | Dobrovoljc, Kaja |
dc.contributor.author | Krek, Simon |
dc.contributor.author | Holozan, Peter |
dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Romih, Miro |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Čibej, Jaka |
dc.contributor.author | Krsnik, Luka |
dc.contributor.author | Robnik-Šikonja, Marko |
dc.date.accessioned | 2019-03-26T22:58:58Z |
dc.date.available | 2019-03-26T22:58:58Z |
dc.date.issued | 2019-03-26 |
dc.identifier.uri | http://hdl.handle.net/11356/1230 |
dc.description | Sloleks is the reference morphological lexicon for Slovenian, developed for use in NLP applications and language manuals. It contains approximately 100,000 of the most frequent Slovenian lemmas, together with their inflected or derived word forms and the corresponding grammatical descriptions. Lemmatization rules, part-of-speech categorization and the set of feature-value pairs follow the JOS morphosyntactic specifications. In addition to grammatical information, each word form is annotated with its absolute corpus frequency and its compliance with the reference language standard. Sloleks 2.0 includes accents that were assigned automatically with neural networks (Krsnik 2017) and partially corrected by hand, as well as automatically generated IPA and SAMPA transcriptions of lemmas and word forms. The canonical version is encoded in LMF XML, conforming to the Sloleks LMF DTD. The resource is also available as a TSV file in the MULTEXT-East format, with wordform, lemma, MSD and frequency columns, and with the MSDs additionally mapped to Universal Dependencies features. References: Dobrovoljc, Kaja, Simon Krek and Tomaž Erjavec. 2017. The Sloleks Morphological Lexicon and its Future Development. In Vojko Gorjanc, Polona Gantar, Iztok Kosem and Simon Krek (eds.), Dictionary of Modern Slovene: Problems and Solutions. Ljubljana: Ljubljana University Press, Faculty of Arts. https://ebooks.uni-lj.si/ZalozbaUL/catalog/view/2/1/47. Krsnik, Luka. 2017. Napovedovanje naglasa slovenskih besed z metodami strojnega učenja [Predicting the accent of Slovenian words with machine learning methods]. Master's thesis, second-cycle master's programme Računalništvo in informatika (Computer and Information Science). Ljubljana: [L. Krsnik]. http://eprints.fri.uni-lj.si/3978/ |
dc.language.iso | slv |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.relation.isreferencedby | https://ebooks.uni-lj.si/ZalozbaUL/catalog/view/2/1/47 |
dc.relation.isreferencedby | http://eprints.fri.uni-lj.si/3978/ |
dc.relation.replaces | http://hdl.handle.net/11356/1039 |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1745 |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://eng.slovenscina.eu/opis.html |
dc.subject | morphology |
dc.subject | inflection |
dc.subject | word forms |
dc.subject | derivation |
dc.subject | LMF |
dc.subject | lemmatisation |
dc.subject | word accents |
dc.subject | IPA |
dc.subject | SAMPA |
dc.title | Morphological lexicon Sloleks 2.0 |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | lexicon |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | http://eng.slovenscina.eu/sloleks.html |
contact.person | Simon Krek (simon.krek@guest.arnes.si), Centre for Language Resources and Technologies, University of Ljubljana |
sponsor | Ministry of Education, Science and Sport; 3311-08-986003; Communication in Slovene; Other |
sponsor | Jožef Stefan Institute; CLARIN; CLARIN.SI; nationalFunds |
sponsor | ARRS (Slovenian Research Agency); P6-0411; Language Resources and Technologies for Slovene; nationalFunds |
size.info | 100805 entries |
files.count | 2 |
files.size | 89964065 bytes |
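
The canonical LMF file (`sloleks_clarin_2.0.xml`, about 1 GB unpacked) is large enough that streaming it is safer than loading it whole. The Python sketch below uses the standard library's `xml.etree.ElementTree.iterparse` to yield one entry at a time. It assumes generic LMF conventions: `LexicalEntry` elements containing a `Lemma` and `WordForm` elements, each carrying `<feat att="NAME" val="VALUE"/>` children. The feature name `zapis_oblike` for the written form and the helper names `feats` and `iter_entries` are hypothetical; check `SLOLEKS_LMF.dtd` and `00README.txt` for the element and attribute names Sloleks actually uses.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Optional, Tuple

# ASSUMPTION: hypothetical name of the feature holding the written form.
# Verify against SLOLEKS_LMF.dtd before using on the real file.
FORM_ATT = "zapis_oblike"


def feats(elem: ET.Element) -> Dict[str, str]:
    """Collect the direct <feat att=... val=...> children of an element into a dict."""
    return {f.get("att"): f.get("val") for f in elem.findall("feat")}


def iter_entries(source, form_att: str = FORM_ATT) -> Iterator[Tuple[Optional[str], List[str]]]:
    """Stream (lemma, word forms) pairs from an LMF file without building the whole tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "LexicalEntry":
            continue
        lemma_el = elem.find("Lemma")
        lemma = feats(lemma_el).get(form_att) if lemma_el is not None else None
        forms = [feats(wf).get(form_att) for wf in elem.iter("WordForm")]
        yield lemma, forms
        elem.clear()  # drop the finished entry's children to keep memory use low


# Invented toy document that follows the ASSUMED structure.
SAMPLE = b"""<LexicalResource><Lexicon>
<LexicalEntry id="LE_1">
  <Lemma><feat att="zapis_oblike" val="miza"/></Lemma>
  <WordForm><feat att="zapis_oblike" val="miza"/></WordForm>
  <WordForm><feat att="zapis_oblike" val="mize"/></WordForm>
</LexicalEntry>
</Lexicon></LexicalResource>"""

for lemma, forms in iter_entries(io.BytesIO(SAMPLE)):
    print(lemma, forms)  # miza ['miza', 'mize']
```

For the real data, pass the path instead of the in-memory sample, e.g. `iter_entries("sloleks_clarin_2.0.xml")`. Clearing each `LexicalEntry` after it has been consumed is what keeps a 1 GB file tractable.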
Files in this item (85.8 MB in total)
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- Name: Sloleks2.0.LMF.zip
- Size: 45.04 MB
- Format: application/zip
- Description: Sloleks in Sloleks LMF XML encoding
- MD5: b2e097234f8bf0d92b6a0feab97e6dbc
- Contents of Sloleks2.0.LMF:
  - SLOLEKS_LMF.dtd (2 kB)
  - sloleks_clarin_2.0.xml (1 GB)
  - 00README.txt (329 B)
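
Each archive is published with an MD5 checksum, so a download can be checked before unpacking. A minimal Python sketch using the standard `hashlib` module, reading the file in 1 MiB chunks; the helper names `md5_of_chunks`, `md5_of_file` and `verify_download` are illustrative, not part of the resource.

```python
import hashlib
from typing import Iterable

# Checksum published in this record for Sloleks2.0.LMF.zip.
LMF_ZIP_MD5 = "b2e097234f8bf0d92b6a0feab97e6dbc"


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Feed byte chunks into one MD5 digest; equals the MD5 of their concatenation."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size pieces so the archive is never read into memory whole."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def verify_download(path: str, expected: str = LMF_ZIP_MD5) -> bool:
    """Return True if the file at `path` matches the expected MD5 (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Usage: `verify_download("Sloleks2.0.LMF.zip")`; for the tabular archive, pass its own checksum, `verify_download("Sloleks2.0.MTE.zip", expected="a35b9e2850ca1d283b27c8faf428c7a7")`.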

- Name: Sloleks2.0.MTE.zip
- Size: 40.76 MB
- Format: application/zip
- Description: Sloleks in tabular MULTEXT-East format, with MSD tags in Slovenian and English and added Universal Dependencies morphosyntactic features.
- MD5: a35b9e2850ca1d283b27c8faf428c7a7
- Contents of Sloleks2.0.MTE:
  - sloleks_clarin_2.0-sl.tbl (99 MB)
  - sloleks_clarin_2.0-en.ud.tbl (435 MB)
  - 00README.txt (618 B)
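
Reading the tabular files needs only the standard library. The Python sketch below parses tab-separated lines, ASSUMING the column order wordform, lemma, MSD, frequency (the order in which the description lists them) and keeping any further columns, such as the UD features in the `-en.ud.tbl` file, as raw strings; confirm the actual layout in the archive's `00README.txt`. `Row`, `parse_tbl` and `forms_by_lemma` are hypothetical helper names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class Row:
    wordform: str
    lemma: str
    msd: str
    freq: int
    extra: List[str] = field(default_factory=list)  # remaining columns, kept verbatim


def parse_tbl(lines: Iterable[str]) -> Iterator[Row]:
    """Parse tab-separated rows. ASSUMED column order: wordform, lemma, MSD, frequency."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        cols = line.split("\t")
        wordform, lemma, msd, freq = cols[:4]
        yield Row(wordform, lemma, msd, int(freq), cols[4:])


def forms_by_lemma(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Group word forms under their lemma, preserving file order."""
    index: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        index[row.lemma].append(row.wordform)
    return dict(index)


# Toy rows for illustration only; the frequencies are invented.
sample = [
    "miza\tmiza\tNcfsn\t10\n",
    "mize\tmiza\tNcfsg\t7\n",
]
print(forms_by_lemma(parse_tbl(sample)))  # {'miza': ['miza', 'mize']}
```

For the real data, pass an open file object, e.g. `with open("sloleks_clarin_2.0-sl.tbl", encoding="utf-8") as f: index = forms_by_lemma(parse_tbl(f))`; the UTF-8 encoding is likewise an assumption to confirm against the README.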