dc.contributor.author Kosem, Iztok
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Krek, Simon
dc.contributor.author Gantar, Polona
dc.contributor.author Pori, Eva
dc.contributor.author Čibej, Jaka
dc.contributor.author Klemenc, Bojan
dc.contributor.author Laskowski, Cyprian
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Gorjanc, Vojko
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Zgaga, Karolina
dc.contributor.author Roblek, Rebeka
dc.contributor.author Zaranšek, Petra
dc.contributor.author Kamenšek, Urška
dc.contributor.author Šešet, Jure
dc.contributor.author Ponikvar, Primož
dc.date.accessioned 2026-02-19T09:19:03Z
dc.date.available 2026-02-19T09:19:03Z
dc.date.issued 2026-02-17
dc.identifier.uri http://hdl.handle.net/11356/2090
dc.description The database of the Collocations Dictionary of Modern Slovene 2.2 contains 4,425,942 collocations in 78,046 entries, covering 81 different syntactic relations. Each collocation is labelled by status as either "automatic" (automatically extracted, not yet manually validated) or "manual" (manually validated). In total, there are 4,377 completed entries (all collocations manually validated) and 2,549 entries with sense division and a combination of manual and automatic collocations. The IDs provided for headwords, senses and collocations come from the Digital Dictionary Database for Slovene. The main difference from previous versions is that this version includes over 268,000 manually validated collocations. Manual validation also led to the removal of over 3,000 headwords from the dictionary. Collocations were obtained from the Gigafida 2.0 corpus (http://hdl.handle.net/11356/1320) using a method for extracting collocation data from text corpora that is based on a formal definition of syntactic structures. The method takes into account not only the POS-tagging level of annotation but also syntactic parsing (syntactic treebank model), and it makes it possible to control the canonical form of extracted collocations using corpus statistics on forms with different properties. The paper describing the procedure (Krek et al., EURALEX 2022) is listed as a reference in this entry. The dictionary is split into 41 files of 2,000 entries each to keep the file size manageable.
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.isreferencedby https://elex.link/elex2023/wp-content/uploads/100.pdf
dc.relation.isreferencedby http://euralex.org/wp-content/themes/euralex/proceedings/Euralex%202022/EURALEX2022_Pr_p240-252_Krek-Gantar-Kosem.pdf
dc.relation.replaces http://hdl.handle.net/11356/1933
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.cjvt.si/kssj/
dc.subject collocations
dc.subject dictionary
dc.subject syntactic structures
dc.title Collocations Dictionary of Modern Slovene KSSS 2.2
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType lexicon
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://viri.cjvt.si/kolokacije/slv/
contact.person Iztok Kosem iztok.kosem@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor University of Ljubljana I0-0022 Network of Research Infrastructure Centres (MRIC) nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Ministry of Culture of the Republic of Slovenia JR-infrastruktura-SJ-2024-2025 Data completion and gamification of dictionary resources at CJVT UL (PODVIG) nationalFunds
size.info 78046 entries
size.info 4425942 collocations
files.count 1
files.size 131533500


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name: CJVT-Collocations-Dictionary-of-Modern-Slovene-v2.2.zip
Size: 125.44 MB
Format: application/zip
Description: database + schema
MD5: ec0bcdc275bae380822b2d363e7e64e1
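
Since the record publishes both an MD5 checksum and a byte size (files.size) for the archive, a download can be checked against them before unpacking. The following is a minimal sketch, assuming the archive has already been saved locally; the function names and the local path are hypothetical and not part of the record.

```python
import hashlib
from pathlib import Path

# Values copied from this record (MD5 field and files.size).
EXPECTED_MD5 = "ec0bcdc275bae380822b2d363e7e64e1"
EXPECTED_SIZE = 131533500  # bytes

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_record(path):
    """True if the file has both the size and the MD5 listed in the record."""
    path = Path(path)
    return (path.stat().st_size == EXPECTED_SIZE
            and md5_of_file(path) == EXPECTED_MD5)
```

A call such as matches_record("CJVT-Collocations-Dictionary-of-Modern-Slovene-v2.2.zip") would return False for a truncated or altered download. MD5 is suitable here only as an integrity check against transfer errors, not as a security guarantee.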
 File Preview  
  • CJVT-Collocations-Dictionary-of-Modern-Slovene-v2.2
    • collocations_export_56000_58000.xml
    • collocations_export_10000_12000.xml
    • collocations_export_4000_6000.xml
    • collocations_export_82000_82563.xml
    • collocations_export_20000_22000.xml
    • collocations_export_66000_68000.xml
    • collocations_export_0_2000.xml
    • collocations_export_76000_78000.xml
    • collocations_export_30000_32000.xml
    • collocations_export_18000_20000.xml
    • collocations_export_40000_42000.xml
    • collocations_export_50000_52000.xml
    • collocations_export_28000_30000.xml
    • collocations_export_38000_40000.xml
    • collocations_export_60000_62000.xml
    • collocations_export_48000_50000.xml
    • collocations_export_14000_16000.xml
    • collocations_export_70000_72000.xml
    • collocations_export_58000_60000.xml
    • collocations_export_80000_82000.xml
    • collocations_export_24000_26000.xml
    • collocations_export_68000_70000.xml
    • collocations_export_34000_36000.xml
    • collocations_export_78000_80000.xml
    • collocations_export_44000_46000.xml
    • collocations_export_54000_56000.xml
    • collocations_export_64000_66000.xml
    • collocations_export_74000_76000.xml
    • collocations_export_2000_4000.xml
    • collocations_export_6000_8000.xml
    • collocations_export_8000_10000.xml
    • collocations_export_12000_14000.xml
    • collocations_export_22000_24000.xml
    • collocations_export_32000_34000.xml
    • collocations_export_42000_44000.xml
    • collocations_export_52000_54000.xml
    • collocations_export_62000_64000.xml
    • collocations_export_72000_74000.xml
    • collocations_export_16000_18000.xml
    • collocations_export_26000_28000.xml
    • collocations_export_36000_38000.xml
    • collocations_export_46000_48000.xml
    • monolingual_dictionaries.xsd
    • inventory.xsd
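
The preview lists the export files in no particular order, and plain alphabetical sorting would put collocations_export_10000_12000.xml before collocations_export_2000_4000.xml. The sketch below orders the XML files by the two numbers in their names and skips the schema files. It assumes the numbers are a start and end index of an export range, which the record does not state explicitly; the function names are hypothetical.

```python
import re

# File-name pattern taken from the listing above; the interpretation of the
# two numbers as (start, end) of an export range is an assumption.
PATTERN = re.compile(r"collocations_export_(\d+)_(\d+)\.xml$")

def export_range(name):
    """Return (start, end) parsed from an export file name, or None."""
    match = PATTERN.search(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def sort_exports(names):
    """Return only the export XML files, ordered numerically by range."""
    ranged = [(export_range(name), name) for name in names]
    return [name for rng, name in sorted(
        (rng, name) for rng, name in ranged if rng is not None)]
```

Passing the names from the archive to sort_exports yields a list that starts with collocations_export_0_2000.xml and ends with collocations_export_82000_82563.xml, which is convenient when the files need to be processed in sequence.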
