| dc.contributor.author | Kosem, Iztok |
| dc.contributor.author | Arhar Holdt, Špela |
| dc.contributor.author | Krek, Simon |
| dc.contributor.author | Gantar, Polona |
| dc.contributor.author | Pori, Eva |
| dc.contributor.author | Čibej, Jaka |
| dc.contributor.author | Klemenc, Bojan |
| dc.contributor.author | Laskowski, Cyprian |
| dc.contributor.author | Dobrovoljc, Kaja |
| dc.contributor.author | Gorjanc, Vojko |
| dc.contributor.author | Ljubešić, Nikola |
| dc.contributor.author | Zgaga, Karolina |
| dc.contributor.author | Roblek, Rebeka |
| dc.contributor.author | Zaranšek, Petra |
| dc.contributor.author | Kamenšek, Urška |
| dc.contributor.author | Šešet, Jure |
| dc.contributor.author | Ponikvar, Primož |
| dc.date.accessioned | 2026-02-19T09:19:03Z |
| dc.date.available | 2026-02-19T09:19:03Z |
| dc.date.issued | 2026-02-17 |
| dc.identifier.uri | http://hdl.handle.net/11356/2090 |
| dc.description | The database of the Collocations Dictionary of Modern Slovene 2.2 contains 4,425,942 collocations in 78,046 entries. Collocations occur in 81 different syntactic relations. Each collocation is labelled by status: "automatic" (automatically extracted, not yet manually validated) or "manual" (manually validated). In total, there are 4,377 completed entries (all collocations manually validated) and 2,549 entries with sense division and a combination of manual and automatic collocations. The IDs provided for headwords, senses and collocations come from the Digital Dictionary Database for Slovene. The main difference from previous versions is the more than 268,000 manually validated collocations. Manual validation also led to the removal of over 3,000 headwords from the dictionary. Collocations were obtained from the Gigafida 2.0 corpus (http://hdl.handle.net/11356/1320) using an extraction method based on a formal definition of syntactic structures. The method takes into account not only the POS-tagging level of annotation but also syntactic parsing (syntactic treebank model), and it makes it possible to control the canonical form of extracted collocations using corpus statistics on forms with different properties. The paper describing the procedure (Krek et al., EURALEX 2022) is listed as a reference in this entry. The dictionary is split into 41 files of 2,000 entries to keep the file size manageable. |
| dc.language.iso | slv |
| dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
| dc.relation.isreferencedby | https://elex.link/elex2023/wp-content/uploads/100.pdf |
| dc.relation.isreferencedby | http://euralex.org/wp-content/themes/euralex/proceedings/Euralex%202022/EURALEX2022_Pr_p240-252_Krek-Gantar-Kosem.pdf |
| dc.relation.replaces | http://hdl.handle.net/11356/1933 |
| dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
| dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://www.cjvt.si/kssj/ |
| dc.subject | collocations |
| dc.subject | dictionary |
| dc.subject | syntactic structures |
| dc.title | Collocations Dictionary of Modern Slovene KSSS 2.2 |
| dc.type | lexicalConceptualResource |
| metashare.ResourceInfo#ContentInfo.detailedType | lexicon |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| has.files | yes |
| branding | CLARIN.SI data & tools |
| demo.uri | https://viri.cjvt.si/kolokacije/slv/ |
| contact.person | Iztok Kosem iztok.kosem@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana |
| sponsor | University of Ljubljana I0-0022 Network of Research Infrastructure Centres (MRIC) nationalFunds |
| sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
| sponsor | Ministry of Culture of the Republic of Slovenia JR-infrastruktura-SJ-2024-2025 Data completion and gamification of dictionary resources at CJVT UL (PODVIG) nationalFunds |
| size.info | 78046 entries |
| size.info | 4425942 collocations |
| files.count | 1 |
| files.size | 131533500 |
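
A downloaded copy of the archive can be checked against this record before use. The following is a minimal sketch, assuming the archive was saved under its original file name in the current directory (the local path is an assumption, not part of the record). It compares the file size with the `files.size` value above and the digest with the MD5 value listed for the archive in this record.

```python
import hashlib
from pathlib import Path

# Values copied from this record: files.size and the MD5 listed for the archive.
EXPECTED_SIZE = 131_533_500
EXPECTED_MD5 = "ec0bcdc275bae380822b2d363e7e64e1"


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Check the size first (cheap), then the MD5 digest."""
    if path.stat().st_size != EXPECTED_SIZE:
        return False
    return file_md5(path) == EXPECTED_MD5


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    archive = Path("CJVT-Collocations-Dictionary-of-Modern-Slovene-v2.2.zip")
    if archive.exists():
        print("OK" if verify_download(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

The size check runs first because it needs no file read. MD5 is used here only because it is the digest the record publishes. It detects a corrupted transfer but is not a defence against deliberate tampering.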
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name
- CJVT-Collocations-Dictionary-of-Modern-Slovene-v2.2.zip
- Size
- 125.44 MB
- Format
- application/zip
- Description
- database + schema
- MD5
- ec0bcdc275bae380822b2d363e7e64e1
- CJVT-Collocations-Dictionary-of-Modern-Slovene-v2.2
- collocations_export_56000_58000.xml
- collocations_export_10000_12000.xml
- collocations_export_4000_6000.xml
- collocations_export_82000_82563.xml
- collocations_export_20000_22000.xml
- collocations_export_66000_68000.xml
- collocations_export_0_2000.xml
- collocations_export_76000_78000.xml
- collocations_export_30000_32000.xml
- collocations_export_18000_20000.xml
- collocations_export_40000_42000.xml
- collocations_export_50000_52000.xml
- collocations_export_28000_30000.xml
- collocations_export_38000_40000.xml
- collocations_export_60000_62000.xml
- collocations_export_48000_50000.xml
- collocations_export_14000_16000.xml
- collocations_export_70000_72000.xml
- collocations_export_58000_60000.xml
- collocations_export_80000_82000.xml
- collocations_export_24000_26000.xml
- collocations_export_68000_70000.xml
- collocations_export_34000_36000.xml
- collocations_export_78000_80000.xml
- collocations_export_44000_46000.xml
- collocations_export_54000_56000.xml
- collocations_export_64000_66000.xml
- collocations_export_74000_76000.xml
- collocations_export_2000_4000.xml
- collocations_export_6000_8000.xml
- collocations_export_8000_10000.xml
- collocations_export_12000_14000.xml
- collocations_export_22000_24000.xml
- collocations_export_32000_34000.xml
- collocations_export_42000_44000.xml
- collocations_export_52000_54000.xml
- collocations_export_62000_64000.xml
- collocations_export_72000_74000.xml
- collocations_export_16000_18000.xml
- collocations_export_26000_28000.xml
- collocations_export_36000_38000.xml
- collocations_export_46000_48000.xml
- monolingual_dictionaries.xsd
- inventory.xsd
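
The export files above are not listed in numeric order, and a plain alphabetical sort would also misplace them (for example, `10000_12000` before `2000_4000`). The sketch below shows one way to order them by the index range in the file name and to check that consecutive ranges join up. The name pattern is inferred from the listing above, and the helper names and extraction directory are hypothetical. Nothing here reads the XML itself, since its element structure is defined by the bundled `.xsd` schemas rather than by this record.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Pattern inferred from the file names listed in this record (an assumption).
EXPORT_NAME = re.compile(r"^collocations_export_(\d+)_(\d+)\.xml$")


def export_range(name: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) parsed from an export file name, or None if it does not match."""
    match = EXPORT_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def sorted_exports(names: List[str]) -> List[str]:
    """Keep only export files and order them by their numeric index range."""
    exports = [n for n in names if export_range(n) is not None]
    return sorted(exports, key=export_range)


def find_gaps(names: List[str]) -> List[Tuple[int, int]]:
    """Report (previous_end, next_start) pairs where consecutive ranges do not join up."""
    ranges = [export_range(n) for n in sorted_exports(names)]
    return [(a[1], b[0]) for a, b in zip(ranges, ranges[1:]) if a[1] != b[0]]


if __name__ == "__main__":
    # Hypothetical extraction directory; adjust to the local path.
    root = Path("CJVT-Collocations-Dictionary-of-Modern-Slovene-v2.2")
    if root.is_dir():
        names = [p.name for p in root.glob("*.xml")]
        for name in sorted_exports(names):
            print(name)
        print("gaps:", find_gaps(names))
```

Sorting on the parsed integers rather than on the raw strings keeps the files in index order. The gap check is a cheap way to notice a missing file after extraction.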