
 
dc.contributor.author Čibej, Jaka
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.date.accessioned 2019-11-13T08:50:19Z
dc.date.available 2019-11-13T08:50:19Z
dc.date.issued 2019-11-18
dc.identifier.uri http://hdl.handle.net/11356/1269
dc.description Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all words occurring in the corpus along with their absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy. The lists were extracted for each part-of-speech category. For each part-of-speech, two lists were extracted: 1) one containing lemmas and their text-type distribution, 2) one containing lower-case word forms as well as their normalized forms, lemmas, and morphosyntactic tags along with their text-type distribution. In addition, four lists were extracted from all words (regardless of their part-of-speech category): 1) a list of all lemmas along with their part-of-speech category and text-type distribution; 2) a list of all lower-case word forms with their lemmas, part-of-speech categories, and text-type distribution; 3) a list of all lower-case word forms with their normalized word forms, lemmas, part-of-speech categories, and text-type distribution; 4) a list of all morphosyntactic tags and their text-type distribution (the tags are also split into several columns).
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.relation.isreplacedby http://hdl.handle.net/11356/1364
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject frequency list
dc.subject spoken corpus
dc.subject words
dc.subject lemmas
dc.subject normalized forms
dc.title Frequency lists of words from the GOS 1.0 corpus
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
hidden hidden
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
sponsor Ministry of Education, Science and Sport 3311-08-986003 Communication in Slovene Other
files.count 1
files.size 4717179
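
The dc.description field above says each list pairs words or lemmas with frequency and text-type distribution columns, and the file listing shows the lists are distributed as .tsv files. The record does not give the column names, so the following is only a minimal sketch for inspecting one list, assuming the first row of each file is a header row (not confirmed here). The function name and the two-column sample are hypothetical placeholders, not data from the corpus.

```python
import csv
import io


def read_frequency_list(handle):
    """Return (header, rows) from a tab-separated frequency list.

    Assumption: the first row holds the column names.
    """
    # QUOTE_NONE keeps quotation marks that occur inside word forms intact.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, [])
    rows = list(reader)
    return header, rows


# Placeholder content standing in for a real file; the column names and
# values are invented for illustration only.
sample = "col_a\tcol_b\nx\t1\ny\t2\n"
header, rows = read_frequency_list(io.StringIO(sample))
print(header)     # ['col_a', 'col_b']
print(len(rows))  # 2
```

For a real file, pass `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` sample; UTF-8 is likewise an assumption.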


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name GOS1.0-words.zip
Size 4.5 MB
Format application/zip
Description Frequency lists of words in GOS1.0
MD5 eac2a3ff4a60fc7d26625591db22bfee
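
The MD5 value listed for GOS1.0-words.zip can be used to check that a download is intact. The sketch below uses only Python's standard `hashlib`. The helper names and the local file path are assumptions for illustration; only the checksum itself comes from this record.

```python
import hashlib
import os

# Checksum as listed in this record for GOS1.0-words.zip.
EXPECTED_MD5 = "eac2a3ff4a60fc7d26625591db22bfee"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    local_copy = "GOS1.0-words.zip"  # hypothetical download location
    if os.path.exists(local_copy):
        print(matches_record(local_copy))
```

MD5 is suitable here only as an integrity check against accidental corruption, which is how the repository presents it.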
 File Preview  
  • GOS1.0-words-verbs
    • GOS1.0-words-verbs-lemmas-taxonomy-entire.tsv (2 MB)
    • GOS1.0-words-verbs-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (3 MB)
  • GOS1.0-words-prepositions
    • GOS1.0-words-prepositions-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (62 kB)
    • GOS1.0-words-prepositions-lemmas-taxonomy-entire.tsv (16 kB)
  • GOS1.0-words-interjections
    • GOS1.0-words-interjections-lemmas-taxonomy-entire.tsv (12 kB)
    • GOS1.0-words-interjections-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (20 kB)
  • GOS1.0-words-particles
    • GOS1.0-words-particles-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (63 kB)
    • GOS1.0-words-particles-lemmas-taxonomy-entire.tsv (16 kB)
  • GOS1.0-words-all
    • GOS1.0-words-all-morphosyntactic_tags-split_MSD-taxonomy-entire.tsv (172 kB)
    • GOS1.0-words-all-lowercase_forms-lemmas-parts_of_speech-taxonomy-entire.tsv (9 MB)
    • GOS1.0-words-all-lemmas-parts_of_speech-taxonomy-entire.tsv (3 MB)
    • GOS1.0-words-all-lowercase_forms-normalized_forms-lemmas-parts_of_speech-taxonomy-entire.tsv (10 MB)
  • GOS1.0-words-adjectives
    • GOS1.0-words-adjectives-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (3 MB)
    • GOS1.0-words-adjectives-lemmas-taxonomy-entire.tsv (1 MB)
  • GOS1.0-words-numerals
    • GOS1.0-words-numerals-lemmas-taxonomy-entire.tsv (92 kB)
    • GOS1.0-words-numerals-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (371 kB)
  • GOS1.0-words-nouns
    • GOS1.0-words-nouns-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (5 MB)
    • GOS1.0-words-nouns-lemmas-taxonomy-entire.tsv (3 MB)
  • GOS1.0-words-pronouns
    • GOS1.0-words-pronouns-lemmas-taxonomy-entire.tsv (84 kB)
    • GOS1.0-words-pronouns-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (587 kB)
  • GOS1.0-words-residual
    • GOS1.0-words-residual-lemmas-taxonomy-entire.tsv (1 MB)
    • GOS1.0-words-residual-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (1 MB)
  • GOS1.0-words-adverbs
    • GOS1.0-words-adverbs-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (581 kB)
    • GOS1.0-words-adverbs-lemmas-taxonomy-entire.tsv (280 kB)
  • GOS1.0-words-abbreviations
    • GOS1.0-words-abbreviations-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (4 kB)
    • GOS1.0-words-abbreviations-lemmas-taxonomy-entire.tsv (4 kB)
  • GOS1.0-words-conjunctions
    • GOS1.0-words-conjunctions-lemmas-taxonomy-entire.tsv (12 kB)
    • GOS1.0-words-conjunctions-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (73 kB)
