Krsnik, Luka; Laskowski, Cyprian; Krek, Simon 2023-09-08T14:51:40Z 2023-09-08T14:51:40Z 2023-09-08
dc.description The inflectional data lookup module is an optional component of the cordex library that significantly improves the quality of the results. The module consists of a pickled dictionary of 111,660 lemmas that maps each lemma to its word forms. Each word form in the dictionary is accompanied by its MULTEXT-East morphosyntactic description, its relevant features (custom features extracted from the morphosyntactic description), and its frequency in the Gigafida 2.0 corpus, or in Gigafida 1.0 when other information is unavailable. The dictionary is used to select the most frequent word form of a lemma that satisfies additional filtering conditions (e.g. the most frequently used singular word form of the lemma "centralen" is "centralni").
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.label PUB
dc.subject inflectional data
dc.title CORDEX inflectional lookup data 1.0
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType other
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
contact.person Luka Krsnik Luka Krsnik
contact.person Simon Krek Jožef Stefan Institute
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor University of Ljubljana I0-0022 Network of Research Infrastructure Centres (MRIC) nationalFunds
files.count 1
files.size 32964852

 Files in this item

31.44 MB
Compressed pickled dictionary for cordex
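
The lookup described in dc.description can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the nested layout (lemma mapped to a list of word-form entries), the key names, the feature labels, the function name most_frequent_form, and the frequencies are all invented, and the MSD tags are only indicative. The actual structure of the pickled dictionary and the way cordex queries it may differ.

```python
import pickle

# Hypothetical toy data: the real dictionary layout is not documented here.
# Frequencies are invented and MSD tags are illustrative only.
toy_lookup = {
    "centralen": [
        {"form": "centralni", "msd": "Agpmsny", "features": ["singular"], "frequency": 900},
        {"form": "centralen", "msd": "Agpmsnn", "features": ["singular"], "frequency": 40},
        {"form": "centralna", "msd": "Agpfsn", "features": ["singular"], "frequency": 600},
        {"form": "centralne", "msd": "Agpfpn", "features": ["plural"], "frequency": 700},
    ]
}


def most_frequent_form(lookup, lemma, required_features=()):
    """Return the most frequent word form of `lemma` whose features
    include every feature in `required_features`, or None if no form fits."""
    required = set(required_features)
    candidates = [
        entry for entry in lookup.get(lemma, [])
        if required <= set(entry["features"])
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry["frequency"])["form"]


# Round-trip through pickle in memory to mimic loading a pickled dictionary.
restored = pickle.loads(pickle.dumps(toy_lookup))

print(most_frequent_form(restored, "centralen", ["singular"]))  # centralni
```

With the invented frequencies above, filtering for singular forms yields "centralni", which matches the example given in the description. A lemma that is missing from the dictionary, or one with no form satisfying the filter, returns None in this sketch.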