dc.contributor.author Krek, Simon
dc.contributor.author Gantar, Apolonija
dc.contributor.author Laskowski, Cyprian
dc.contributor.author Krsnik, Luka
dc.contributor.author Kosem, Iztok
dc.contributor.author Brank, Janez
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Čibej, Jaka
dc.contributor.author Robnik-Šikonja, Marko
dc.contributor.author Klemenc, Bojan
dc.contributor.author Gorjanc, Vojko
dc.date.accessioned 2021-03-26T09:24:28Z
dc.date.available 2021-03-26T09:24:28Z
dc.date.issued 2021-03-25
dc.identifier.uri http://hdl.handle.net/11356/1421
dc.description The MWE lexicon was extracted from the Gigafida 2.1 Corpus of Written Standard Slovene (https://www.clarin.si/noske/run.cgi/corp_info?corpname=gfida21) using specialized scripts for extracting data from corpora containing syntactic dependency annotations. The lexicon contains 5,242 Multiword Expressions with 12,358 examples from Gigafida 2.1. Each MWE entry (or sense) contains at least one and up to three extracted examples. MWEs were analysed using the JOS dependency parser system (http://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf) and were assigned matching syntactic structure IDs. The corpus sentences containing the MWE components and matching syntactic structure features were identified in the corpus and assigned to the corresponding headword or sense. MWE variants (or variant senses) are linked through the "senseKey" attribute values, forming an MWE cluster of related variants or variant senses. A sample of MWE headwords also contains a manually created sense division with a description of meaning for each sense.
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject multiword expressions
dc.subject lexicon
dc.subject syntactic structures
dc.subject computational lexicography
dc.title Multiword Expressions lexicon extracted from the Gigafida 2.1 corpus
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType computationalLexicon
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Simon Krek simon.krek@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor University of Ljubljana P6-0215 Slovene Language - Basic, Contrastive, and Applied Studies nationalFunds
size.info 5242 entries
files.count 1
files.size 1571066
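
The description above says that MWE variants are linked through "senseKey" attribute values. A minimal sketch of how such clusters might be collected follows. It assumes only that "senseKey" appears as an XML attribute somewhere in MWE-lexicon-Gigafida2.1.xml; the element names are not given in this record, so the code scans every element regardless of tag, and the function name cluster_by_sense_key is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def cluster_by_sense_key(xml_source):
    """Group element tags by the value of their "senseKey" attribute.

    xml_source may be a file path or a binary file-like object.
    Assumption: "senseKey" is an XML attribute; which elements carry it
    is not stated in the record, so every element is inspected.
    """
    clusters = defaultdict(list)
    # "end" events fire once per element, in document order of closing tags.
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        key = elem.get("senseKey")
        if key is not None:
            clusters[key].append(elem.tag)
    return dict(clusters)
```

Elements that share one "senseKey" value end up in the same list, which corresponds to one cluster of related variants or variant senses under the assumption above.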


 Files in this item

This item is Publicly Available and licensed under:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name: MWE-lexicon-Gigafida2.1.zip
Size: 1.5 MB
Format: application/zip
Description: Multiword Expressions lexicon in XML
MD5: 95d122326e2a9f87a8a7b38ef1dd8ed7
 File Preview  
  • MWE-lexicon-Gigafida2.1
    • MWE-lexicon-Gigafida2.1.xml (10 MB)
    • JOS_structures_2021-03-09.xsd (4 kB)
    • monolingual_dictionaries.xsd (2 kB)
    • JOS_structures_2021-03-09.xml (2 MB)
    • inventory.xsd (27 kB)
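
The record lists an MD5 checksum for MWE-lexicon-Gigafida2.1.zip, so a downloaded copy can be checked against it. The sketch below uses Python's standard hashlib module; the function names md5_of_file and verify_download are hypothetical, and the local path passed in is up to the user.

```python
import hashlib

# MD5 value as listed in this record for MWE-lexicon-Gigafida2.1.zip.
EXPECTED_MD5 = "95d122326e2a9f87a8a7b38ef1dd8ed7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so the whole file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, verify_download("MWE-lexicon-Gigafida2.1.zip") should return True for an intact download. MD5 is suitable here only as a check against transfer corruption, not as a security guarantee.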