dc.contributor.author | Krek, Simon |
dc.contributor.author | Gantar, Polona |
dc.contributor.author | Kosem, Iztok |
dc.contributor.author | Dobrovoljc, Kaja |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Čibej, Jaka |
dc.contributor.author | Laskowski, Cyprian |
dc.contributor.author | Klemenc, Bojan |
dc.contributor.author | Krsnik, Luka |
dc.date.accessioned | 2021-03-16T08:36:14Z |
dc.date.available | 2021-03-16T08:36:14Z |
dc.date.issued | 2021-03-09 |
dc.identifier.uri | http://hdl.handle.net/11356/1415 |
dc.description | Frequency lists of collocations were extracted from the Gigafida 2.1 Corpus of Written Standard Slovene (https://www.clarin.si/noske/run.cgi/corp_info?corpname=gfida21) using specialised scripts for extracting data from syntactically parsed corpora. The lists contain collocations with an absolute frequency of 10 or more, split into files corresponding to 81 predefined syntactic structures. A formal description of the syntactic structures, including the restrictions and representations applied to the POS and dependency parsing annotations, is included in the dataset. The lists are sorted by the absolute frequency of the collocations and include frequency information on individual lemmas, together with the most frequent representative forms of the combined lemmas. The lists also include the logDice score for each collocation and the number of distinct forms of lemmas appearing in corpus hits for a particular collocation. |
dc.language.iso | slv |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://slovnica.ijs.si/ |
dc.subject | collocations |
dc.subject | syntactic structures |
dc.title | Frequency lists of collocations from the Gigafida 2.1 corpus |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | lexicon |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Simon Krek simon.krek@ijs.si Jožef Stefan Institute |
sponsor | ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | University of Ljubljana P6-0215 Slovene Language - Basic, Contrastive, and Applied Studies nationalFunds |
sponsor | ARRS (Slovenian Research Agency) J6-8255 Collocations as a basis for language description: semantic and temporal perspectives nationalFunds |
size.info | 82 files |
size.info | 4002918 collocations |
files.count | 1 |
files.size | 146338935 |
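
The description above mentions a logDice score for each collocation. As a rough illustration, the sketch below computes logDice using the commonly cited definition, 14 + log2(2·f_xy / (f_x + f_y)). It is an assumption that the dataset's values follow exactly this definition, and the function name and the frequencies in the usage line are hypothetical.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """Return logDice for a collocation (sketch, assumed standard definition).

    f_xy: absolute frequency of the collocation
    f_x, f_y: absolute frequencies of the two individual lemmas
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    # Dice coefficient scaled to a log2 range with a theoretical maximum of 14.
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical frequencies, not taken from the dataset.
example_score = log_dice(f_xy=10, f_x=20, f_y=20)
```

Because the score depends only on the three frequencies and not on corpus size, values stay comparable across the per-structure files.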
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

- Name: GF2.1-Collocations_JOS-structures.zip
- Size: 139.56 MB
- Format: application/zip
- Description: Collocations lists from Gigafida 2.1 in CSV format
- MD5: 8dd20dcfc7dc048ead42d45643bb0342
- GF2.1-Collocations_JOS-structures
- structure_41.csv (720 kB)
- structure_19.csv (173 kB)
- structure_57.csv (16 MB)
- structure_95.csv (90 kB)
- structure_30.csv (5 MB)
- structure_103.csv (22 kB)
- JOS_structures_2021-03-09.xsd (4 kB)
- structure_46.csv (8 MB)
- structure_84.csv (154 kB)
- structure_35.csv (292 kB)
- structure_73.csv (754 kB)
- structure_108.csv (1 MB)
- structure_89.csv (11 MB)
- structure_24.csv (334 kB)
- structure_13.csv (11 MB)
- structure_78.csv (324 kB)
- structure_51.csv (15 MB)
- structure_29.csv (2 MB)
- structure_40.csv (914 kB)
- structure_18.csv (1 MB)
- structure_94.csv (1 kB)
- structure_102.csv (243 kB)
- structure_45.csv (315 kB)
- structure_83.csv (445 kB)
- structure_99.csv (104 kB)
- structure_34.csv (107 MB)
- structure_72.csv (3 MB)
- structure_107.csv (700 kB)
- structure_88.csv (6 MB)
- structure_23.csv (40 MB)
- structure_39.csv (32 kB)
- structure_12.csv (2 MB)
- structure_77.csv (4 MB)
- structure_50.csv (10 MB)
- structure_28.csv (1 MB)
- structure_17.csv (1 MB)
- structure_55.csv (2 MB)
- structure_93.csv (1 MB)
- structure_101.csv (69 kB)
- structure_44.csv (452 kB)
- structure_82.csv (1 MB)
- structure_98.csv (565 kB)
- structure_71.csv (16 MB)
- structure_106.csv (29 MB)
- structure_49.csv (1 MB)
- structure_87.csv (429 kB)
- structure_22.csv (5 MB)
- structure_38.csv (670 kB)
- structure_76.csv (1 MB)
- structure_27.csv (2 MB)
- structure_16.csv (14 MB)
- structure_54.csv (1 MB)
- structure_92.csv (422 kB)
- structure_100.csv (408 kB)
- structure_43.csv (25 MB)
- structure_81.csv (10 MB)
- structure_32.csv (224 kB)
- structure_70.csv (55 MB)
- structure_105.csv (31 kB)
- structure_48.csv (7 MB)
- structure_86.csv (2 MB)
- structure_37.csv (3 kB)
- structure_75.csv (205 kB)
- structure_26.csv (1 MB)
- structure_15.csv (40 MB)
- structure_53.csv (74 MB)
- structure_91.csv (117 kB)
- structure_69.csv (2 MB)
- structure_42.csv (554 kB)
- structure_80.csv (54 kB)
- structure_96.csv (631 kB)
- structure_31.csv (84 kB)
- structure_104.csv (128 kB)
- structure_47.csv (3 MB)
- structure_85.csv (2 MB)
- structure_36.csv (793 kB)
- structure_74.csv (3 MB)
- structure_25.csv (1 MB)
- structure_14.csv (20 MB)
- structure_52.csv (28 MB)
- structure_90.csv (5 MB)
- JOS_structures_2021-03-09.xml (2 MB)
- structure_68.csv (1 MB)
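
Since the record publishes an MD5 checksum for the archive, a downloaded copy can be checked before use. The following is a minimal sketch, assuming the archive has been saved locally under its listed name; the helper names `md5_of_file` and `list_csv_members` and the local path are hypothetical, and only the checksum value and file name come from this record.

```python
import hashlib
import zipfile
from pathlib import Path

# MD5 value as listed in this record.
EXPECTED_MD5 = "8dd20dcfc7dc048ead42d45643bb0342"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_csv_members(path):
    """Return the sorted names of all CSV members inside a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".csv")
        )


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive_path = Path("GF2.1-Collocations_JOS-structures.zip")
    if archive_path.exists():
        actual = md5_of_file(archive_path)
        print("checksum matches" if actual == EXPECTED_MD5 else "checksum MISMATCH")
        print(len(list_csv_members(archive_path)), "CSV files found")
    else:
        print("archive not found at", archive_path)
```

Reading in chunks keeps memory use low for an archive of this size, and listing the CSV members gives a quick check against the file list above.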