dc.contributor.author Krsnik, Luka
dc.contributor.author Laskowski, Cyprian
dc.contributor.author Krek, Simon
dc.date.accessioned 2023-09-08T14:51:40Z
dc.date.available 2023-09-08T14:51:40Z
dc.date.issued 2023-09-08
dc.identifier.uri http://hdl.handle.net/11356/1854
dc.description The inflectional data lookup module is an optional component of the cordex library (https://github.com/clarinsi/cordex/) that significantly improves the quality of its results. The module consists of a pickled dictionary that maps 111,660 lemmas to their corresponding word forms. Each word form in the dictionary is accompanied by its MULTEXT-East morphosyntactic description, relevant features (custom features extracted from the morphosyntactic description with the help of https://gitea.cjvt.si/generic/conversion_utils), and its frequency within the Gigafida 2.0 corpus (http://hdl.handle.net/11356/1320), or Gigafida 1.0 when other information is unavailable. The dictionary is used to select the most frequent word form of a lemma that satisfies additional filtering conditions (e.g. the most frequent singular word form of the lemma "centralen" is "centralni").
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/clarinsi/cordex/
dc.subject inflectional data
dc.title CORDEX inflectional lookup data 1.0
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType other
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
contact.person Luka Krsnik krsnik.luka92@gmail.com Luka Krsnik
contact.person Simon Krek simon.krek@ijs.si Jožef Stefan Institute
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor University of Ljubljana I0-0022 Network of Research Infrastructure Centres (MRIC) nationalFunds
files.count 1
files.size 32964852
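
The selection step described in dc.description (pick the most frequent word form of a lemma that satisfies some filtering conditions) can be illustrated with a minimal sketch. The record does not document the internal layout of the pickled dictionary, so the structure below (`TOY_LOOKUP`, the keys `form`, `msd`, `features`, `frequency`) and the function `most_frequent_form` are hypothetical, and the frequencies are invented for illustration; this is not the cordex API.

```python
from typing import Optional

# Hypothetical toy layout: lemma -> list of word-form records.
# The real pickled dictionary may be organised differently.
# Frequencies are made up; MSD tags are illustrative only.
TOY_LOOKUP = {
    "centralen": [
        {"form": "centralni", "msd": "Agpmsny",
         "features": {"number": "singular"}, "frequency": 120},
        {"form": "centralna", "msd": "Agpfsn",
         "features": {"number": "singular"}, "frequency": 80},
        {"form": "centralne", "msd": "Agpfpn",
         "features": {"number": "plural"}, "frequency": 60},
    ],
}


def most_frequent_form(lookup: dict, lemma: str, **required: str) -> Optional[str]:
    """Return the most frequent word form of `lemma` whose features
    match every key/value pair in `required`, or None if none match."""
    candidates = [
        entry for entry in lookup.get(lemma, [])
        if all(entry["features"].get(key) == value
               for key, value in required.items())
    ]
    if not candidates:
        return None
    # Pick the record with the highest corpus frequency.
    return max(candidates, key=lambda entry: entry["frequency"])["form"]
```

With this toy data, `most_frequent_form(TOY_LOOKUP, "centralen", number="singular")` selects "centralni", which mirrors the example given in the description.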


 Files in this item

Name sl.xz
Size 31.44 MB
Format Unknown
Description Compressed pickled dictionary for cordex
MD5 11949252892e18ea7bf216e27fb29bae
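
A downloaded copy of sl.xz can be checked against the MD5 listed above before it is opened. The sketch below uses only the Python standard library. It assumes that the file is a plain xz stream containing a single pickle, which the file description suggests but does not state outright; the helper names `md5_of_file` and `load_lookup` are hypothetical. Pickle files can execute arbitrary code when loaded, so only unpickle a copy whose checksum matches and whose origin you trust.

```python
import hashlib
import lzma
import pickle

# MD5 checksum as listed in the file table of this record.
EXPECTED_MD5 = "11949252892e18ea7bf216e27fb29bae"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_lookup(path: str, expected_md5: str = EXPECTED_MD5):
    """Verify the checksum, then decompress and unpickle the file.

    Assumption: the .xz archive wraps exactly one pickled object.
    """
    actual = md5_of_file(path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    with lzma.open(path, "rb") as handle:
        return pickle.load(handle)
```

Calling `load_lookup("sl.xz")` on a local copy would then return the unpickled object, or raise `ValueError` if the download is corrupted or incomplete.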
