dc.contributor.author | Dobrovoljc, Kaja |
dc.contributor.author | Roblek, Rebeka |
dc.contributor.author | Vianello, Chiara |
dc.contributor.author | Diaci, Ajda |
dc.contributor.author | Vuga, Zala |
dc.date.accessioned | 2020-01-06T08:43:29Z |
dc.date.available | 2020-01-06T08:43:29Z |
dc.date.issued | 2020-01-06 |
dc.identifier.uri | http://hdl.handle.net/11356/1279 |
dc.description | This document contains 2,374 formulaic sequences in spoken Slovenian, i.e. frequently recurring strings of two to five words, manually annotated for syntactic structure, pragmatic function, and dictionary relevance. The sequences, all with a minimum frequency of 20 per million, are based on the Frequency lists of word-level n-grams from normalized word forms in GOS 1.0 (http://hdl.handle.net/11356/1271). The list is the union of the top 1,000 formulaic sequences ranked by frequency and by five association measures (Dice, t-test, MI, MI3, simple-LL). A related entry is "List of formulaic sequences in standard written Slovenian", http://hdl.handle.net/11356/1280. |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.relation.isreferencedby | http://slovnica.ijs.si/wp-content/uploads/2019/12/NSSS_DS5-nizi_navodila_v6.pdf |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://slovnica.ijs.si/ |
dc.subject | n-grams |
dc.subject | manual annotation |
dc.subject | formulaic language |
dc.subject | spoken language |
dc.subject | multiword expressions |
dc.title | List of formulaic sequences in spoken Slovenian |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | wordList |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Kaja Dobrovoljc kaja.dobrovoljc@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds |
size.info | 2374 expressions |
files.count | 1 |
files.size | 371169 |
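
The selection procedure summarized in dc.description (a minimum frequency of 20 per million, then a union of the top-ranked sequences across several rankings) can be illustrated with a minimal sketch. Everything below is hypothetical: the function names, the toy counts, the corpus size, and the invented Dice scores are assumptions for illustration, not the authors' code or data. The sketch also assumes that the top sequences are taken separately from each ranking before the union is formed.

```python
from typing import Dict, Set


def per_million(count: int, corpus_tokens: int) -> float:
    """Relative frequency of a sequence per million tokens."""
    return count * 1_000_000 / corpus_tokens


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k highest-scoring sequences (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda seq: (-scores[seq], seq))
    return set(ranked[:k])


def union_of_top_k(rankings: Dict[str, Dict[str, float]], k: int) -> Set[str]:
    """Union of the top-k sets taken from each ranking separately."""
    selected: Set[str] = set()
    for scores in rankings.values():
        selected |= top_k(scores, k)
    return selected


# Hypothetical toy data: placeholder sequences, counts, and corpus size.
CORPUS_TOKENS = 1_000_000
counts = {"a b": 50, "c d": 30, "e f": 25, "g h": 10}
frequency = {seq: per_million(n, CORPUS_TOKENS) for seq, n in counts.items()}

# Keep only sequences at or above the 20-per-million threshold.
eligible = {seq for seq, freq in frequency.items() if freq >= 20}

# Invented association scores standing in for one of the five measures.
dice = {"a b": 0.1, "c d": 0.9, "e f": 0.5, "g h": 0.95}

rankings = {
    "frequency": {seq: frequency[seq] for seq in eligible},
    "dice": {seq: dice[seq] for seq in eligible},
}

# k=1 keeps the toy example readable; the record describes top-1,000 lists.
selected = union_of_top_k(rankings, k=1)
print(sorted(selected))
```

Because the same sequence can rank highly under several measures, the union is usually much smaller than the number of rankings multiplied by k, which is consistent with the 2,374 sequences reported here.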
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name | formulaic-sequences_GOS_top1000.tsv |
Size | 362.47 KB |
Format | Unknown |
Description | List of manually annotated formulaic sequences in GOS 1.0. |
MD5 | d1f8990e2fd3cf543595cefabeaedaf5 |
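
The MD5 value listed for the file can be used to check that a downloaded copy is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_record` are hypothetical, and the local path is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed in this record.
EXPECTED_MD5 = "d1f8990e2fd3cf543595cefabeaedaf5"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    local_copy = Path("formulaic-sequences_GOS_top1000.tsv")
    if local_copy.exists():
        print("checksum OK" if matches_record(local_copy) else "checksum MISMATCH")
```

A mismatch usually points to an incomplete or altered download, in which case the file should be fetched again from the handle given in dc.identifier.uri.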