Show simple item record

 
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Roblek, Rebeka
dc.contributor.author Vianello, Chiara
dc.contributor.author Diaci, Ajda
dc.contributor.author Vuga, Zala
dc.date.accessioned 2020-01-06T08:43:29Z
dc.date.available 2020-01-06T08:43:29Z
dc.date.issued 2020-01-06
dc.identifier.uri http://hdl.handle.net/11356/1279
dc.description This document contains 2,374 formulaic sequences in spoken Slovenian, i.e. frequently recurring strings of two to five words, manually annotated for syntactic structure, pragmatic function, and dictionary relevance. The list is based on the Frequency lists of word-level n-grams from normalized word forms in GOS 1.0 (http://hdl.handle.net/11356/1271), restricted to sequences with a minimum frequency of 20 per million. It contains the union of the top 1,000 sequences under each of six rankings: frequency and five association measures (Dice, t-test, MI, MI3, simple-LL). A related entry, "List of formulaic sequences in standard written Slovenian", is available at http://hdl.handle.net/11356/1280.
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.isreferencedby http://slovnica.ijs.si/wp-content/uploads/2019/12/NSSS_DS5-nizi_navodila_v6.pdf
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject n-grams
dc.subject manual annotation
dc.subject formulaic language
dc.subject spoken language
dc.subject multiword expressions
dc.title List of formulaic sequences in spoken Slovenian
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Kaja Dobrovoljc kaja.dobrovoljc@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
size.info 2374 expressions
files.count 1
files.size 371169
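
The selection procedure named in dc.description (ranking by frequency and five association measures, then taking the union of each ranking's top entries) can be illustrated with a minimal sketch. The record does not state which formulas were used or how they were extended to sequences of three to five words, so the code below is an assumption: it uses common textbook definitions of the "simple" measures for the two-word case only, based on observed frequency O and expected frequency E = f1 * f2 / N. The function names `association_scores` and `union_of_top_k` are hypothetical and do not come from the resource.

```python
import math


def association_scores(o, f1, f2, n):
    """Score a two-word sequence (assumed formulas, bigram case only).

    o  : observed frequency of the sequence
    f1 : frequency of the first word
    f2 : frequency of the second word
    n  : corpus size in tokens
    """
    e = f1 * f2 / n  # expected frequency if the two words were independent
    return {
        "frequency": o,
        "dice": 2 * o / (f1 + f2),
        "t_test": (o - e) / math.sqrt(o),
        "mi": math.log2(o / e),
        "mi3": math.log2(o ** 3 / e),
        "simple_ll": 2 * (o * math.log(o / e) - (o - e)),
    }


def union_of_top_k(scores_by_ngram, k):
    """Return every n-gram ranked in the top k by at least one measure."""
    selected = set()
    measures = next(iter(scores_by_ngram.values())).keys()
    for measure in measures:
        ranked = sorted(
            scores_by_ngram,
            key=lambda ngram: scores_by_ngram[ngram][measure],
            reverse=True,
        )
        selected.update(ranked[:k])
    return selected
```

With k = 1,000 and six rankings, a union of this kind can hold anywhere between 1,000 and 6,000 entries, which is consistent with the 2,374 sequences reported in this record.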


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name formulaic-sequences_GOS_top1000.tsv
Size 362.47 KB
Format Unknown
Description List of manually annotated formulaic sequences in GOS 1.0.
MD5 d1f8990e2fd3cf543595cefabeaedaf5
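
The MD5 value listed for the file can be used to check that a download arrived intact. The following is a minimal sketch, assuming the file has been saved locally under the name given in this record; the helper names `md5_of_file` and `is_intact` are hypothetical.

```python
import hashlib

# Checksum as published in this record for formulaic-sequences_GOS_top1000.tsv
EXPECTED_MD5 = "d1f8990e2fd3cf543595cefabeaedaf5"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `is_intact("formulaic-sequences_GOS_top1000.tsv")` returns True only when the local copy matches the published checksum. MD5 detects accidental corruption but is not a safeguard against deliberate tampering.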
