dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.contributor.author Holozan, Peter
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Romih, Miro
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Čibej, Jaka
dc.contributor.author Krsnik, Luka
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2019-03-26T22:58:58Z
dc.date.available 2019-03-26T22:58:58Z
dc.date.issued 2019-03-26
dc.identifier.uri http://hdl.handle.net/11356/1230
dc.description Sloleks is the reference morphological lexicon of the Slovenian language, developed for use in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains approximately 100,000 of the most frequent Slovenian lemmas, together with their inflected or derived word forms and the corresponding grammatical descriptions. Lemmatization rules, part-of-speech categories and the set of feature-value pairs follow the JOS morphosyntactic specifications. In addition to grammatical information, each word form is annotated with its absolute corpus frequency and its compliance with the reference language standard. Sloleks 2.0 includes word accents, assigned automatically with neural networks (Krsnik 2017) and partially corrected by hand, as well as automatically generated IPA and SAMPA transcriptions of lemmas and word forms. The canonical version is encoded in XML and conforms to the Sloleks LMF DTD. The resource is also available as a TSV file in the MULTEXT-East format, with wordform, lemma, MSD and frequency columns, and with the MSDs also mapped to Universal Dependencies features.
References:
Kaja Dobrovoljc, Simon Krek and Tomaž Erjavec, 2017: The Sloleks Morphological Lexicon and its Future Development. In (Vojko Gorjanc, Polona Gantar, Iztok Kosem and Simon Krek, eds.): Dictionary of Modern Slovene: Problems and Solutions. Ljubljana University Press, Faculty of Arts. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/download/2/1/47-1
Luka Krsnik, 2017: Napovedovanje naglasa slovenskih besed z metodami strojnega učenja [Predicting the accent of Slovenian words with machine learning methods]. Master's thesis, second-cycle master's programme Računalništvo in informatika (Computer and Information Science). Ljubljana: [L. Krsnik]. http://eprints.fri.uni-lj.si/3978/
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.isreferencedby https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/download/2/1/47-1?inline=1
dc.relation.isreferencedby http://eprints.fri.uni-lj.si/3978/
dc.relation.replaces http://hdl.handle.net/11356/1039
dc.relation.isreplacedby http://hdl.handle.net/11356/1745
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri http://eng.slovenscina.eu/sloleks/opis
dc.subject morphology
dc.subject inflection
dc.subject word forms
dc.subject derivation
dc.subject LMF
dc.subject lemmatisation
dc.subject word accents
dc.subject IPA
dc.subject SAMPA
dc.title Morphological lexicon Sloleks 2.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType lexicon
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri http://eng.slovenscina.eu/sloleks
contact.person Simon Krek; simon.krek@guest.arnes.si; Centre for Language Resources and Technologies, University of Ljubljana
sponsor Ministry of Education, Science and Sport; 3311-08-986003; Communication in Slovene; Other
sponsor Jožef Stefan Institute; CLARIN; CLARIN.SI; nationalFunds
sponsor ARRS (Slovenian Research Agency); P6-0411; Language Resources and Technologies for Slovene; nationalFunds
size.info 100805 entries
files.count 2
files.size 89964065 bytes
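The TSV (MULTEXT-East) export has wordform, lemma, MSD and frequency columns. A minimal Python sketch for streaming such a file, assuming the four columns appear in that order, are tab-separated and UTF-8 encoded, and that any further columns (such as the UD features in the -en.ud file) follow them. `Entry` and `read_entries` are hypothetical names, and the sample rows are made up, not taken from Sloleks.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class Entry:
    wordform: str
    lemma: str
    msd: str
    frequency: int


def read_entries(stream: TextIO) -> Iterator[Entry]:
    """Yield one Entry per tab-separated row.

    ASSUMPTION: column order is wordform, lemma, MSD, frequency; any
    extra columns after these four are ignored.
    """
    # QUOTE_NONE: treat quote characters as ordinary text, not CSV quoting.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or malformed lines
        wordform, lemma, msd, freq = row[:4]
        yield Entry(wordform, lemma, msd, int(freq))


if __name__ == "__main__":
    # Two made-up rows in the assumed layout (not Sloleks data).
    sample = "hiša\thiša\tNcfsn\t120\nhiše\thiša\tNcfsg\t45\n"
    for e in read_entries(io.StringIO(sample)):
        print(e.wordform, e.lemma, e.msd, e.frequency)
```

For the real file, open it with `open("sloleks_clarin_2.0-sl.tbl", encoding="utf-8", newline="")` and pass the handle to `read_entries`; the generator keeps memory use flat even for the 435 MB English/UD file.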


Files in this item (85.8 MB in total)
Name Sloleks2.0.LMF.zip
Size 45.04 MB
Format application/zip
Description Sloleks in the Sloleks LMF XML encoding
MD5 b2e097234f8bf0d92b6a0feab97e6dbc
Contents
  • Sloleks2.0.LMF
    • SLOLEKS_LMF.dtd (2 kB)
    • sloleks_clarin_2.0.xml (1 GB)
    • 00README.txt (329 B)
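The XML file in Sloleks2.0.LMF.zip is about 1 GB uncompressed, so building the whole tree in memory may be impractical. A minimal Python sketch that streams it with `xml.etree.ElementTree.iterparse` and counts entries. It assumes entries are stored as un-namespaced LMF-style `LexicalEntry` elements, which should be checked against SLOLEKS_LMF.dtd; `count_entries` is a hypothetical helper, not part of the distribution.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO


def count_entries(source: BinaryIO, tag: str = "LexicalEntry") -> int:
    """Count elements named `tag` while streaming the document.

    ASSUMPTION: entries are un-namespaced LMF-style <LexicalEntry>
    elements; verify the element name against SLOLEKS_LMF.dtd.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the finished element's children and text; the now-empty
            # element stays attached to its parent, which is cheap.
            elem.clear()
    return count


if __name__ == "__main__":
    # Tiny hand-written stand-in for the real file (not Sloleks data).
    sample = (
        b"<LexicalResource><Lexicon>"
        b"<LexicalEntry id='e1'><Lemma/></LexicalEntry>"
        b"<LexicalEntry id='e2'><Lemma/></LexicalEntry>"
        b"</Lexicon></LexicalResource>"
    )
    print(count_entries(io.BytesIO(sample)))  # prints 2
```

To read straight from the archive, `zipfile.ZipFile("Sloleks2.0.LMF.zip").open("Sloleks2.0.LMF/sloleks_clarin_2.0.xml")` returns a binary file object that can be passed as `source`; the inner path is inferred from the file preview and may differ. Comparing the result with the size.info value is a quick sanity check on the assumed element name.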

Name Sloleks2.0.MTE.zip
Size 40.76 MB
Format application/zip
Description Sloleks in the tabular MULTEXT-East format, with MSD tags in Slovenian and English and added Universal Dependencies morphosyntactic features
MD5 a35b9e2850ca1d283b27c8faf428c7a7
Contents
  • Sloleks2.0.MTE
    • sloleks_clarin_2.0-sl.tbl (99 MB)
    • sloleks_clarin_2.0-en.ud.tbl (435 MB)
    • 00README.txt (618 B)
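The MD5 checksums listed for the two archives can be used to confirm that a download is complete and uncorrupted. A minimal Python sketch using `hashlib`, assuming the archives keep their listed file names and sit in one directory; `md5_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# Checksums as listed in this record.
EXPECTED_MD5 = {
    "Sloleks2.0.LMF.zip": "b2e097234f8bf0d92b6a0feab97e6dbc",
    "Sloleks2.0.MTE.zip": "a35b9e2850ca1d283b27c8faf428c7a7",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path) -> dict[str, bool]:
    """Map each archive found in `directory` to whether its MD5 matches."""
    return {
        name: md5_of(directory / name) == expected
        for name, expected in EXPECTED_MD5.items()
        if (directory / name).exists()
    }


if __name__ == "__main__":
    print(verify(Path(".")))
```

Archives that are not present in the directory are skipped rather than reported as failures, so an empty result means nothing was checked.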
