dc.contributor.author Kosem, Iztok
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Krek, Simon
dc.contributor.author Gantar, Polona
dc.contributor.author Pori, Eva
dc.contributor.author Čibej, Jaka
dc.contributor.author Klemenc, Bojan
dc.contributor.author Laskowski, Cyprian
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Gorjanc, Vojko
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Zgaga, Karolina
dc.contributor.author Roblek, Rebeka
dc.date.accessioned 2024-03-27T16:48:01Z
dc.date.available 2024-03-27T16:48:01Z
dc.date.issued 2023-12-31
dc.identifier.uri http://hdl.handle.net/11356/1933
dc.description The database of the Collocations Dictionary of Modern Slovene 2.0 contains 4,491,958 collocations in 81,443 entries. Collocations occur in 81 different syntactic relations. Each collocation is labelled with its status: "automatic" (automatically extracted, not yet manually validated) or "manual" (manually validated). In total, there are 2,090 completed entries (all collocations manually validated) and 11,227 entries with sense division and a combination of manual and automatic collocations. The IDs provided for headwords, senses and collocations come from the Digital Dictionary Database for Slovene. Collocations were extracted from the Gigafida 2.0 corpus using a method based on a formal definition of syntactic structures. The method takes into account not only part-of-speech (POS) tagging but also syntactic parsing (a syntactic treebank model), and it allows the canonical form of extracted collocations to be controlled using corpus statistics on forms with different properties. The paper describing the procedure (Krek et al. 2022) is listed as a reference in this entry. To keep file sizes manageable, the dictionary is split into 41 files of up to 2,000 entries each.
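As a quick arithmetic check of the split: assuming the entries are divided into consecutive chunks of exactly 2,000 with the remainder in the last file (the record does not say how the last file is filled), ceil(81,443 / 2,000) = 41 files, the last one holding 1,443 entries. A minimal Python sketch; `split_sizes` is a hypothetical helper, not part of the distributed data.

```python
# Assumption: entries are split into consecutive chunks of `per_file`,
# with whatever remains going into the final file.
ENTRIES = 81_443         # from size.info
ENTRIES_PER_FILE = 2_000 # from dc.description


def split_sizes(total: int, per_file: int) -> list[int]:
    """Return the entry count of each file when `total` entries are split
    into consecutive chunks of at most `per_file` entries."""
    if total <= 0:
        return []
    n_files = -(-total // per_file)  # integer ceiling division
    sizes = [per_file] * (n_files - 1)
    sizes.append(total - per_file * (n_files - 1))  # remainder in last file
    return sizes


sizes = split_sizes(ENTRIES, ENTRIES_PER_FILE)
print(len(sizes), sizes[-1])  # → 41 1443
```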
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.isreferencedby https://elex.link/elex2023/wp-content/uploads/100.pdf
dc.relation.isreferencedby http://euralex.org/wp-content/themes/euralex/proceedings/Euralex%202022/EURALEX2022_Pr_p240-252_Krek-Gantar-Kosem.pdf
dc.relation.replaces http://hdl.handle.net/11356/1250
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.cjvt.si/kssj/
dc.subject collocations
dc.subject dictionary
dc.subject syntactic structures
dc.title Collocations Dictionary of Modern Slovene KSSS 2.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType lexicon
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://viri.cjvt.si/kolokacije/slv/
contact.person Iztok Kosem iztok.kosem@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor University of Ljubljana I0-0022 Network of Research Infrastructure Centres (MRIC) nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor University of Ljubljana P6-0215 Slovene Language - Basic, Contrastive, and Applied Studies nationalFunds
sponsor Republic of Slovenia, Ministry of Culture 3340-21-722002 Upgrading fundamental dictionary resources and databases of CJVT UL nationalFunds
sponsor Republic of Slovenia, Ministry of Culture JR-NPJP-22-23 Upgrading language portals at CJVT nationalFunds
size.info 81443 entries
size.info 4491958 collocations
files.count 1
files.size 104947463


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name: CJVT-Collocations-Dictionary-of-Modern-Slovene-v2.zip
Size: 100.09 MB
Format: application/zip
Description: Collocations Dictionary of Modern Slovene 2.0 Database and Schema
MD5: 5c09e277a0d4b9c85b203120ae26e21f
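A downloaded copy can be checked against the byte size (files.size) and MD5 checksum listed in this record. The following is a minimal Python sketch using only the standard library; `md5_of_file` and `verify` are hypothetical helper names, not part of any CLARIN.SI tool, and the default path assumes the archive sits in the current directory under its listed name.

```python
import hashlib
import sys
from pathlib import Path

# Values copied from this record.
EXPECTED_MD5 = "5c09e277a0d4b9c85b203120ae26e21f"
EXPECTED_SIZE = 104947463  # bytes (files.size)


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so the ~100 MB archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Cheap size check first, then the full MD5 comparison."""
    if path.stat().st_size != EXPECTED_SIZE:
        return False
    return md5_of_file(path) == EXPECTED_MD5


if __name__ == "__main__":
    # Hypothetical default: the archive name from the file listing.
    archive = Path(sys.argv[1] if len(sys.argv) > 1
                   else "CJVT-Collocations-Dictionary-of-Modern-Slovene-v2.zip")
    if not archive.exists():
        print(f"{archive} not found")
    else:
        print("OK" if verify(archive) else "MISMATCH")
```

Checking the size first avoids hashing a truncated download; MD5 here only detects accidental corruption, not tampering.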
