
 
dc.contributor.author Dobrovoljc, Kaja
dc.date.accessioned 2015-08-01T13:55:47Z
dc.date.available 2015-08-01T13:55:47Z
dc.date.issued 2015-07-01
dc.identifier.uri http://hdl.handle.net/11356/1046
dc.description This is a collection of n-grams extracted from the Gos corpus of spoken Slovene (http://hdl.handle.net/11356/1040). In addition to the separate lists of n-grams for tokens and their attributes (normalized form, morphosyntactic tag, lemma), an adjusted frequency list produced with statistical substring reduction (as described in O'Donnell 2011) is also included. Only n-grams within sentences have been counted.
dc.language.iso slv
dc.publisher Trojina, Institute for Applied Slovene Studies
dc.publisher Faculty of Arts, University of Ljubljana
dc.relation.isreplacedby http://hdl.handle.net/11356/1195
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://eng.slovenscina.eu/korpusi/gos
dc.subject n-grams
dc.subject wordlist
dc.subject multiword expressions
dc.title Gos corpus n-grams 1.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
hidden hidden
has.files yes
branding CLARIN.SI data & tools
contact.person Kaja Dobrovoljc kaja.dobrovoljc@gmail.com Trojina, Institute for Applied Slovene Studies
sponsor ARRS (Slovenian Research Agency) MR-36491 Young Researcher Programme nationalFunds
size.info 677854 items
size.info 15.2 MB
files.count 4
files.size 4345197
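
The description states that only n-grams within sentences were counted, and the n-gram lists in this item use a minimum frequency threshold of 2. The following is a minimal sketch of that kind of counting, not the tool actually used to build the resource; the function name `count_ngrams` and the toy sentences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_ngrams(sentences, max_n=5, min_freq=2):
    """Count 1- to max_n-grams, never crossing a sentence boundary.

    `sentences` is an iterable of token lists (one list per sentence).
    Only n-grams seen at least `min_freq` times are returned.
    """
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            # The window slides inside one sentence only.
            for start in range(len(tokens) - n + 1):
                counts[tuple(tokens[start:start + n])] += 1
    return {ngram: freq for ngram, freq in counts.items() if freq >= min_freq}


# Hypothetical toy input: two already tokenized sentences.
sentences = [
    ["a", "veš", "kaj"],
    ["a", "veš", "da", "ne"],
]
ngrams = count_ngrams(sentences)
```

Because each sentence is processed separately, a sequence such as the last token of one sentence followed by the first token of the next never receives a count.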


 Files in this item

 Download all files in item (4.14 MB)
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name gos_ngrams_word_1-5.zip
Size 946.5 KB
Format application/zip
Description 1- to 5-grams of words (pronunciation-based spelling) excluding punctuation. The minimum frequency threshold is 2.
MD5 ac638e81a8a7bae5b0bc4dae484d0389
File Preview
    • sorted_cut-2_gos_word_c-no_n-1_t-1.txt (366 kB)
    • sorted_cut-2_gos_word_c-no_n-5_t-1.txt (104 kB)
    • sorted_cut-2_gos_word_c-no_n-4_t-1.txt (278 kB)
    • sorted_cut-2_gos_word_c-no_n-3_t-1.txt (800 kB)
    • sorted_cut-2_gos_word_c-no_n-2_t-1.txt (1 MB)
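
Each file in this item is listed with an MD5 checksum, which can be used to confirm that a download arrived intact. A minimal sketch using Python's standard `hashlib` module follows; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and the commented call assumes the archive has been saved to the current directory under its listed name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed_md5):
    """Compare a local file against the checksum shown in the record."""
    return md5_of_file(path) == listed_md5.strip().lower()


# Assumed local file name; adjust to wherever the archive was saved.
# matches_listed_md5("gos_ngrams_word_1-5.zip",
#                    "ac638e81a8a7bae5b0bc4dae484d0389")
```

A mismatch usually points to an incomplete or corrupted download. MD5 is adequate for this kind of integrity check but is not a safeguard against deliberate tampering.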
Name gos_ngrams_norm_1-5.zip
Size 981.5 KB
Format application/zip
Description 1- to 5-grams of normalized words (standardized spelling) excluding punctuation. The minimum frequency threshold is 2.
MD5 586b75baa4a7ceb86825c79a900cf073
File Preview
    • sorted_cut-2_gos_lc_c-no_n-3_t-1.txt (898 kB)
    • sorted_cut-2_gos_lc_c-no_n-1_t-1.txt (331 kB)
    • sorted_cut-2_gos_lc_c-no_n-5_t-1.txt (123 kB)
    • sorted_cut-2_gos_lc_c-no_n-4_t-1.txt (348 kB)
    • sorted_cut-2_gos_lc_c-no_n-2_t-1.txt (1 MB)
Name gos_ngrams_word-norm-lemma-tag_1-5.zip
Size 1.86 MB
Format application/zip
Description 1- to 5-grams of words with normalized form, lemma and morphosyntactic tag including punctuation. The minimum frequency threshold is 2.
MD5 98e7e7f91a0ad35f367ded64bbd35f43
File Preview
    • sorted_cut-2_gos_word-lc-lemma-tag_c-yes_n-5_t-1.txt (368 kB)
    • sorted_cut-2_gos_word-lc-lemma-tag_c-yes_n-4_t-1.txt (974 kB)
    • sorted_cut-2_gos_word-lc-lemma-tag_c-yes_n-3_t-1.txt (2 MB)
    • sorted_cut-2_gos_word-lc-lemma-tag_c-yes_n-1_t-1.txt (1 MB)
    • sorted_cut-2_gos_word-lc-lemma-tag_c-yes_n-2_t-1.txt (3 MB)
Name kres_AFL_norm_1-5_min5M.zip
Size 411.66 KB
Format application/zip
Description Adjusted frequency list for 1- to 5-grams of normalized words (standardized spelling) excluding punctuation. The minimum relative frequency threshold for substring reduction is 5. Column 1: n-gram; column 2: length of n-gram; column 3: adjusted corpus frequency.
MD5 4e38a0184cc591847a20d34c397b41b4
File Preview
    • sorted_cut-1_AFL_gos_lc_c-no_n-5_t-5.txt (1 MB)
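
The adjusted frequency list is described as having three columns: the n-gram, its length, and its adjusted corpus frequency. A minimal sketch for reading such a file is given below. The record does not state the column delimiter or encoding, so tab-separated UTF-8 text is an assumption here; the function name `read_afl` and the sample rows are hypothetical and are not taken from the actual file.

```python
import csv
import io


def read_afl(handle):
    """Parse rows of (n-gram, length, adjusted frequency).

    Assumes tab-separated columns, which the record does not confirm.
    Lines that do not have exactly three fields are skipped.
    """
    rows = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        if len(record) != 3:
            continue
        ngram, length, freq = record
        rows.append((ngram, int(length), float(freq)))
    return rows


# Hypothetical sample in the assumed layout, not real corpus values.
sample = io.StringIO("a veš\t2\t120\nne\t1\t950\n")
rows = read_afl(sample)
```

For the real file, the same function could be called on `open("sorted_cut-1_AFL_gos_lc_c-no_n-5_t-5.txt", encoding="utf-8", newline="")`, again assuming that layout and encoding.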
