dc.contributor.author | Krek, Simon |
dc.contributor.author | Gantar, Apolonija |
dc.contributor.author | Laskowski, Cyprian |
dc.contributor.author | Krsnik, Luka |
dc.contributor.author | Kosem, Iztok |
dc.contributor.author | Brank, Janez |
dc.contributor.author | Dobrovoljc, Kaja |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Čibej, Jaka |
dc.contributor.author | Robnik-Šikonja, Marko |
dc.contributor.author | Klemenc, Bojan |
dc.contributor.author | Gorjanc, Vojko |
dc.date.accessioned | 2021-03-26T09:24:28Z |
dc.date.available | 2021-03-26T09:24:28Z |
dc.date.issued | 2021-03-25 |
dc.identifier.uri | http://hdl.handle.net/11356/1421 |
dc.description | The MWE lexicon was extracted from the Gigafida 2.1 Corpus of Written Standard Slovene (https://www.clarin.si/noske/run.cgi/corp_info?corpname=gfida21) using specialized scripts for extracting data from corpora with syntactic dependency annotations. The lexicon contains 5,242 multiword expressions (MWEs) with 12,358 examples from Gigafida 2.1. Each MWE entry (or sense) has between one and three extracted examples. The MWEs were analysed using the JOS dependency parsing system (http://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf) and assigned matching syntactic structure IDs. Corpus sentences containing the MWE components and the matching syntactic structure features were identified and assigned to the corresponding headword or sense. MWE variants (or variant senses) are linked through their "senseKey" attribute values, forming an MWE cluster of related variants or variant senses. A sample of MWE headwords also includes a manual sense division, with a description of meaning for each sense. |
dc.language.iso | slv |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://slovnica.ijs.si/ |
dc.subject | multiword expressions |
dc.subject | lexicon |
dc.subject | syntactic structures |
dc.subject | computational lexicography |
dc.title | Multiword Expressions lexicon extracted from the Gigafida 2.1 corpus |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | computationalLexicon |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Simon Krek simon.krek@ijs.si Jožef Stefan Institute |
sponsor | ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | University of Ljubljana P6-0215 Slovene Language - Basic, Contrastive, and Applied Studies nationalFunds |
size.info | 5242 entries |
files.count | 1 |
files.size | 1571066 |
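The description states that MWE variants and variant senses are linked through shared "senseKey" attribute values. The following is a minimal Python sketch of rebuilding those clusters. It assumes only that elements sharing the same senseKey value belong to one cluster; the lexicon's actual element names and nesting are defined by the bundled schema files and are deliberately not assumed here. The function name `group_by_sense_key` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_by_sense_key(root):
    """Group every element that carries a senseKey attribute by its value.

    Assumption (not confirmed by the lexicon's schema): elements that share
    the same senseKey value belong to the same MWE cluster of variants or
    variant senses. Element tag names are not assumed.
    """
    clusters = defaultdict(list)
    # root.iter() walks the whole tree in document order, including root itself.
    for elem in root.iter():
        key = elem.get("senseKey")
        if key is not None:
            clusters[key].append(elem)
    return dict(clusters)
```

For the full lexicon, the root element could be obtained with `ET.parse(path).getroot()`, where `path` points to the extracted MWE-lexicon-Gigafida2.1.xml. Each value in the returned dictionary is then one candidate cluster, in document order.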
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Name | MWE-lexicon-Gigafida2.1.zip |
Size | 1.5 MB |
Format | application/zip |
Description | Multiword Expressions lexicon in XML |
MD5 | 95d122326e2a9f87a8a7b38ef1dd8ed7 |
- MWE-lexicon-Gigafida2.1
  - MWE-lexicon-Gigafida2.1.xml (10 MB)
  - JOS_structures_2021-03-09.xsd (4 kB)
  - monolingual_dictionaries.xsd (2 kB)
  - JOS_structures_2021-03-09.xml (2 MB)
  - inventory.xsd (27 kB)
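The listing publishes an MD5 checksum for MWE-lexicon-Gigafida2.1.zip. The following minimal Python sketch checks a downloaded copy against it. MD5 detects accidental corruption during transfer but does not protect against deliberate tampering. The function names `md5_of_file` and `verify_download` are illustrative, not part of the resource.

```python
import hashlib

# Checksum published for MWE-lexicon-Gigafida2.1.zip in the file listing.
EXPECTED_MD5 = "95d122326e2a9f87a8a7b38ef1dd8ed7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    # hexdigest() is lowercase, so normalise the expected value as well.
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("MWE-lexicon-Gigafida2.1.zip")` in the download directory should return `True` for an intact archive.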