
 
dc.contributor.author Čibej, Jaka
dc.contributor.author Kosem, Iztok
dc.date.accessioned 2022-11-15T08:52:31Z
dc.date.available 2022-11-15T08:52:31Z
dc.date.issued 2022-10-28
dc.identifier.uri http://hdl.handle.net/11356/1702
dc.description The frequency list of words by source was prepared as follows: words (i.e. lemmas with their lexical features) were extracted from the 15 most frequent sources in the Trendi Monitor Corpus of Slovene (http://hdl.handle.net/11356/1590), covering the period between 1 January 2019 and 31 July 2022. The extracted sources are the following:
- STA (sta.si)
- RTV (rtvslo.si)
- Delo (delo.si)
- Siol (siol.net)
- Vestnik (vestnik.si)
- Večer (vecer.com)
- Svet24 – Novice (novice.svet24.si)
- 24ur (24ur.com)
- Dnevnik (dnevnik.si)
- Žurnal24 (zurnal24.si)
- Demokracija (demokracija.si)
- Nova24TV (nova24tv.si)
- Slovenske novice (slovenskenovice.si)
- Gorenjski glas (gorenjskiglas.si)
- Svet 24 – Ekipa (ekipa.svet24.si)
The frequency lists obtained from Trendi were then compared to the frequency list of words from Gigafida 2.0 (http://hdl.handle.net/11356/1320), which covers the period 1991–2018. For each source (including Gigafida 2.0), the final frequency list contains lemmas, their lexical features, and their absolute and relative frequencies from the first period (1991–2018) and the second period (2019 to 2022-07). It also contains the simple maths value, which indicates whether the word is more frequent in 2019 to 2022-07 (simple maths > 1.00) or in 1991–2018 (simple maths < 1.00). Because the entire frequency list is quite large, a shorter version with the first 150,000 entries is also provided for easier use in data processing software (such as MS Excel). The lists are sorted by their total absolute frequencies. Note that words with a total frequency of 1 (hapax legomena, counted by adding the absolute frequencies from both compared corpora) were removed.
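The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script used to build the resource. The record does not state the smoothing constant, the normalisation base, or the corpus sizes, so the per-million normalisation, `smoothing=1.0`, and the names `Entry`, `relative_frequency`, `simple_maths` and `build_list` are all assumptions made here. The ratio follows the usual formulation of the simple maths score: (relative frequency in the newer period + n) / (relative frequency in the older period + n).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    lemma: str
    freq_ref: int    # absolute frequency in the first period (1991-2018)
    freq_focus: int  # absolute frequency in the second period (2019 to 2022-07)


def relative_frequency(abs_freq, corpus_size, per=1_000_000):
    """Frequency normalised per `per` tokens (per million is an assumption)."""
    return abs_freq / corpus_size * per


def simple_maths(rel_focus, rel_ref, smoothing=1.0):
    """Smoothed ratio of relative frequencies; the smoothing value is assumed."""
    return (rel_focus + smoothing) / (rel_ref + smoothing)


def build_list(entries, size_ref, size_focus, smoothing=1.0):
    """Drop entries with total frequency 1, score the rest, sort by total."""
    rows = []
    for e in entries:
        total = e.freq_ref + e.freq_focus
        if total <= 1:
            continue  # hapax legomena are removed, as in the description
        rel_ref = relative_frequency(e.freq_ref, size_ref)
        rel_focus = relative_frequency(e.freq_focus, size_focus)
        score = round(simple_maths(rel_focus, rel_ref, smoothing), 2)
        rows.append((e.lemma, total, score))
    # Sort by total absolute frequency, highest first
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows
```

With toy counts and both corpus sizes set to one million tokens, an entry with 10 occurrences in the first period and 30 in the second scores above 1.00, while an entry with 50 and 5 scores below 1.00, matching the interpretation given in the description.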
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://sled.ijs.si/
dc.subject frequency list
dc.subject words
dc.subject monitor corpus
dc.subject news sources
dc.title Frequency list of words by source from the Trendi corpus 2022-07
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@ijs.si Jožef Stefan Institute
sponsor Ministry of Culture of the Republic of Slovenia JR-infrastruktura-SJ-2021-2022 SLED - Monitor corpus of Slovene and related resources nationalFunds
size.info 2 files
size.info 1993118 entries
files.count 1
files.size 66760030


 Files in this item

This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name trendi_words-by-source_2019-2022.zip
Size 63.67 MB
Format application/zip
Description trendi_words-by-source_2019-2022
MD5 6e1365936b5833b368904c634907c66b
 File Preview
    • trendi_2019-2022_words-by-source_short.tsv (78 MB)
    • trendi_2019-2022_words-by-source_entire.tsv (772 MB)
    • 00README.txt (1 kB)
