
 
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Čibej, Jaka
dc.contributor.author Laskowski, Cyprian
dc.contributor.author Krek, Simon
dc.date.accessioned 2021-03-18T10:16:16Z
dc.date.available 2021-03-18T10:16:16Z
dc.date.issued 2020-12-12
dc.identifier.uri http://hdl.handle.net/11356/1411
dc.description This entry consists of XML files with 96,290 lexical units (nouns, verbs, adjectives, and adverbs) from the Sloleks Morphological Lexicon of Slovene 2.0 (http://hdl.handle.net/11356/1230) that include codes for morphological patterns. The pattern codes were designed based on a manual analysis of automatically extracted paradigms and were obtained as follows. The lexical units from Sloleks 2.0 were first automatically clustered into groups using a rule-based approach that relied on (1) a number of predetermined grammatical features from the MULTEXT-East Version 6 morphosyntactic specifications for Slovenian (http://nl.ijs.si/ME/V6/), such as part of speech, gender and properness for nouns, and aspect for verbs, and (2) the differentiating characteristics of their morphological paradigms (i.e. their mutable word parts, which are similar to, but do not always coincide with, the linguistic definition of word endings; for example: čas-Ø; čas-a; čas-om / prijatelj-Ø; prijatelj-a; prijatelj-em / odstot-ek; odstot-ka; odstot-kom). More than 1,000 automatically extracted pattern candidates were subsequently linguistically analyzed, combined into groups, and hierarchically organized. As a result, every lexical unit in the XML files features a code (listed as <grammarFeature name="lexeme_pattern">) corresponding to the relevant morphological paradigm in the hierarchy (available in the accompanying file titled "nssss_morphological_patterns_hierarchy_1.0.tsv"). Because the patterns were extracted from Sloleks 2.0, they reflect the decisions that were implemented in its initial compilation, particularly in terms of the degree of morphological variation documented in the lexicon (e.g. not all morphological variants are necessarily included in the lexicon) and paradigm integrity (for instance, some nouns in Sloleks 2.0 only feature singular or plural forms). It should be noted that non-standard word forms were not included in the design of the patterns.
In addition, the XML files do not contain lexical units from Sloleks 2.0 that consist of word forms from more than one morphological paradigm (e.g. lesketati – lesketam / leskečem; or lojen – lojenega / lojnega), or other problematic units (such as those with missing or erroneous data).
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject lexicon
dc.subject morphology
dc.subject morphological patterns
dc.title Morphological patterns from the Sloleks 2.0 lexicon 1.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType lexicon
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Špela Arhar Holdt arhar.spela@gmail.com Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
size.info 96290 units
files.count 1
files.size 22064430
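
The description above states that each lexical unit carries its pattern code in a <grammarFeature name="lexeme_pattern"> element. The following is a minimal Python sketch of how such codes might be tallied from one of the XML files. Only the element name and the attribute value are taken from the description; the assumption that the code is stored as the element's text content, the surrounding element names, and the sample codes are all hypothetical and should be checked against the schemas in the xml_schema folder.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_lexeme_patterns(source):
    """Count codes found in <grammarFeature name="lexeme_pattern"> elements.

    `source` is a file path or a binary file object. Elements are streamed
    with iterparse so that a large file is not held in memory as one tree.
    ASSUMPTION: the pattern code is the text content of the element.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if (local_name(elem.tag) == "grammarFeature"
                and elem.get("name") == "lexeme_pattern"):
            code = (elem.text or "").strip()
            if code:
                counts[code] += 1
        # Discard the element's content once it has been processed.
        elem.clear()
    return counts


# Hypothetical miniature input: the <lexicon>/<entry> wrapper and the codes
# X1 and X2 are invented for illustration and are not taken from the dataset.
SAMPLE_XML = b"""<lexicon>
  <entry><grammarFeature name="lexeme_pattern">X1</grammarFeature></entry>
  <entry><grammarFeature name="lexeme_pattern">X2</grammarFeature></entry>
  <entry><grammarFeature name="lexeme_pattern">X1</grammarFeature></entry>
  <entry><grammarFeature name="other">Z</grammarFeature></entry>
</lexicon>"""

if __name__ == "__main__":
    for code, total in count_lexeme_patterns(io.BytesIO(SAMPLE_XML)).most_common():
        print(code, total)
```

To run this on the real data, a path to one of the extracted XML files can be passed in place of the in-memory sample. The resulting codes could then be looked up in the accompanying hierarchy TSV, whose column layout is not described in this record.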


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name nssss_morphological_patterns_from_sloleks_2.0_v1.0.zip
Size 21.04 MB
Format application/zip
Description Complete dataset
MD5 049acacdd1b9478df5ca1ac478d6663f
 File Preview  
  • nssss_morphological_patterns_from_sloleks_2.0_v1.0
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_p.xml 157 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_h.xml 17 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_micro.xml 4 kB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_u.xml 20 MB
    • nssss_morphological_patterns_hierarchy_1.0.tsv 238 kB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_d-diacritic.xml 95 kB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_m.xml 47 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_z.xml 48 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_e.xml 16 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_r.xml 46 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_o-diacritic.xml 4 kB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_j.xml 13 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_w.xml 1 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_z-diacritic.xml 8 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_b.xml 39 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_o.xml 60 MB
    • xml_schema
      • morphological_lexicon.xsd 1 kB
      • inventory.xsd 28 kB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_g.xml 27 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_t.xml 35 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_c-diacritic.xml 10 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_l.xml 23 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_y.xml 148 kB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_d.xml 46 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_q.xml 50 kB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_i.xml 33 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_v.xml 40 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_a.xml 27 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_n.xml 70 MB
    • 00README.txt 6 kB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_f.xml 15 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_s.xml 82 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_k.xml 62 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_x.xml 16 kB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_s-diacritic.xml 17 MB
    • nssss_morphological_patterns_from_sloleks_2.0_v1.0_c.xml 11 MB
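
The record lists an MD5 checksum for the zip archive, which can be used to check that a download is intact. The sketch below shows one way to do this with Python's standard hashlib module. The checksum value is copied from this record; the function names and the local file path in the usage note are hypothetical.

```python
import hashlib

# MD5 value as listed in this record for the zip archive.
EXPECTED_MD5 = "049acacdd1b9478df5ca1ac478d6663f"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, expected=EXPECTED_MD5):
    """Check a local file (hypothetical path) against the listed checksum."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected.lower()
```

Assuming the archive was saved under its listed name in the working directory, calling matches_listed_md5("nssss_morphological_patterns_from_sloleks_2.0_v1.0.zip") should return True for an uncorrupted download. MD5 is suitable here only as an integrity check against transfer errors, not as a security guarantee.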
