Files in this item

All files in item: 4.14 MB
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name
gos_ngrams_word_1-5.zip
Size
946.5 KB
Format
application/zip
Description
1- to 5-grams of words (pronunciation-based spelling) excluding punctuation. The minimum frequency threshold is 2.
MD5
ac638e81a8a7bae5b0bc4dae484d0389
File preview:
    • sorted_cut-2_gos_word_c-no_n-1_t-1.txt (366 kB)
    • sorted_cut-2_gos_word_c-no_n-5_t-1.txt (104 kB)
    • sorted_cut-2_gos_word_c-no_n-4_t-1.txt (278 kB)
    • sorted_cut-2_gos_word_c-no_n-3_t-1.txt (800 kB)
    • sorted_cut-2_gos_word_c-no_n-2_t-1.txt (1 MB)
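
Each archive in this item is listed with an MD5 value, which can be used to check that a download arrived intact. The following is a minimal sketch of such a check using only Python's standard library; the helper names `md5_of_file` and `matches_listed_md5` are illustrative and not part of the item.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, expected_hex):
    """Compare a file's MD5 digest with the value shown in the listing."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_listed_md5("gos_ngrams_word_1-5.zip", "ac638e81a8a7bae5b0bc4dae484d0389")` should return `True` for an intact copy of the first archive, assuming it was saved under that name in the current directory.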
Name
gos_ngrams_norm_1-5.zip
Size
981.5 KB
Format
application/zip
Description
1- to 5-grams of normalized words (standardized spelling) excluding punctuation. The minimum frequency threshold is 2.
MD5
586b75baa4a7ceb86825c79a900cf073
File preview:
    • sorted_cut-2_gos_lc_c-no_n-3_t-1.txt (898 kB)
    • sorted_cut-2_gos_lc_c-no_n-1_t-1.txt (331 kB)
    • sorted_cut-2_gos_lc_c-no_n-5_t-1.txt (123 kB)
    • sorted_cut-2_gos_lc_c-no_n-4_t-1.txt (348 kB)
    • sorted_cut-2_gos_lc_c-no_n-2_t-1.txt (1 MB)
Name
gos_ngrams_word-norm-lemma-tag_1-5.zip
Size
1.86 MB
Format
application/zip
Description
1- to 5-grams of words with normalized form, lemma, and morphosyntactic tag, including punctuation. The minimum frequency threshold is 2.
MD5
98e7e7f91a0ad35f367ded64bbd35f43
File preview:
    • sorted_cut-2_gos_word-lc-lemma-tag_c-yes_n-5_t-1.txt (368 kB)
    • sorted_cut-2_gos_word-lc-lemma-tag_c-yes_n-4_t-1.txt (974 kB)
    • sorted_cut-2_gos_word-lc-lemma-tag_c-yes_n-3_t-1.txt (2 MB)
    • sorted_cut-2_gos_word-lc-lemma-tag_c-yes_n-1_t-1.txt (1 MB)
    • sorted_cut-2_gos_word-lc-lemma-tag_c-yes_n-2_t-1.txt (3 MB)
Name
kres_AFL_norm_1-5_min5M.zip
Size
411.66 KB
Format
application/zip
Description
Adjusted frequency list for 1- to 5-grams of normalized words (standardized spelling) excluding punctuation. The minimum relative frequency threshold for substring reduction is 5. Column 1: n-gram; column 2: length of n-gram; column 3: adjusted corpus frequency.
MD5
4e38a0184cc591847a20d34c397b41b4
File preview:
    • sorted_cut-1_AFL_gos_lc_c-no_n-5_t-5.txt (1 MB)
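
The adjusted frequency list is described as having three columns (n-gram, n-gram length, adjusted corpus frequency). The sketch below shows one way such a file might be read in Python. It assumes the columns are tab-separated and the file is UTF-8 encoded; neither detail is stated in the listing, so both should be verified against the actual file. The names `AflEntry` and `parse_afl_lines` are illustrative.

```python
from typing import Iterable, Iterator, NamedTuple


class AflEntry(NamedTuple):
    ngram: str                  # column 1: the n-gram itself
    length: int                 # column 2: number of tokens in the n-gram
    adjusted_frequency: float   # column 3: adjusted corpus frequency


def parse_afl_lines(lines: Iterable[str], delimiter: str = "\t") -> Iterator[AflEntry]:
    """Yield one AflEntry per well-formed line.

    Assumption: columns are separated by `delimiter` (tab by default).
    Blank lines and lines that do not fit the three-column layout are skipped.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        parts = line.split(delimiter)
        if len(parts) != 3:
            continue
        ngram, length, frequency = parts
        try:
            yield AflEntry(ngram, int(length), float(frequency))
        except ValueError:
            # Non-numeric length or frequency, e.g. a header row.
            continue


def load_afl(path: str, encoding: str = "utf-8") -> list:
    """Read a whole adjusted frequency list file (encoding is an assumption)."""
    with open(path, encoding=encoding) as fh:
        return list(parse_afl_lines(fh))
```

With the archive unpacked, `load_afl("sorted_cut-1_AFL_gos_lc_c-no_n-5_t-5.txt")` would return the entries as a list, which can then be filtered by `length` to separate unigrams from longer n-grams.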