dc.contributor.author Bogunović, Irena
dc.contributor.author Kučić, Mario
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Erjavec, Tomaž
dc.date.accessioned 2021-03-16T12:19:02Z
dc.date.available 2021-03-16T12:19:02Z
dc.date.issued 2021-03-14
dc.identifier.uri http://hdl.handle.net/11356/1416
dc.description The corpus consists of texts collected from the most popular news portals in Croatia (selected according to the Reuters Institute Digital News Report for 2018, retrieved from http://www.digitalnewsreport.org in April 2019) in the period from 2014 to 2018: Direktno, Dnevno, Net Hr, Hrt, Index_Hr, Jutarnji, Novilist, Rtl, SlobodnaDalmacija, Večernji, Tportal, Dnevnik. Web browsing and web crawling were used to select and store the texts together with useful HTML information (the publication date of the article, its URL, and its title). The corpus was linguistically processed with the CLASSLA package (https://pypi.org/project/classla/) at the levels of tokenization, sentence splitting, morphosyntactic tagging, lemmatization, dependency parsing and named entity recognition. This corpus is a linguistically processed version of the original corpus published at https://repository.pfri.uniri.hr/islandora/object/pfri%3A2156 and is distributed in the CoNLL-U format (https://universaldependencies.org/format.html).
dc.language.iso hrv
dc.publisher University of Rijeka, Faculty of Maritime Studies
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.laconlab.com/projects/engri
dc.subject news corpus
dc.subject contemporary language
dc.title Corpus of Croatian news portals ENGRI (2014-2018)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Irena Bogunović bogunovic@pfri.hr University of Rijeka, Faculty of Maritime Studies
sponsor Croatian Science Foundation UIP-2019-04-1576 English words in Croatian: Identification, affective-semantic norming and investigation into cognitive processing via behavioural and neuroscientific methods (ENGRI) nationalFunds
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
size.info 694799268 tokens
size.info 1756735 texts
files.count 12
files.size 9105493691
featuredService.kontext Search|https://www.clarin.si/kontext/first_form?corpname=engri
featuredService.noske Search|https://www.clarin.si/ske/#dashboard?corpname=engri
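
Since the record states that the corpus is distributed as gzip-compressed CoNLL-U, the following is a minimal sketch of how one such file could be streamed sentence by sentence without unpacking it to disk. The function names `open_corpus` and `read_conllu` and the field-name list are illustrative assumptions, not part of the corpus or of any library; only the ten tab-separated columns come from the CoNLL-U specification linked in dc.description.

```python
import gzip
from typing import Dict, Iterable, Iterator, List

# The ten columns defined by the CoNLL-U format, in order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Comment lines (starting with '#') are skipped; a blank line ends a sentence.
    """
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            continue
        else:
            columns = line.split("\t")
            # Ignore malformed lines rather than guessing at their content.
            if len(columns) == len(CONLLU_FIELDS):
                sentence.append(dict(zip(CONLLU_FIELDS, columns)))
    if sentence:
        yield sentence


def open_corpus(path: str):
    """Open a .conllu.gz file as a UTF-8 text stream (hypothetical helper)."""
    return gzip.open(path, "rt", encoding="utf-8")
```

Under these assumptions, counting the tokens in one downloaded file could look like `sum(len(s) for s in read_conllu(open_corpus("engri.hrt.hr.conllu.gz")))`. Streaming keeps memory use low, which matters for files of this size.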


 Files in this item

Name                                  Size       Format            Description                  MD5
engri.24sata.hr.conllu.gz             1.02 GB    application/gzip  Portal 24sata.hr             80a664e2c94b137c06695747e0215618
engri.direktno.hr.conllu.gz           686.57 MB  application/gzip  Portal direktno.hr           180a932f19ad3292632e2ea3a35bc7d4
engri.dnevno.hr.conllu.gz             745.9 MB   application/gzip  Portal dnevno.hr             7d3ce12690fe3efe10da99263b324e9b
engri.hrt.hr.conllu.gz                397.95 MB  application/gzip  Portal hrt.hr                84ebbb053057638b550ecd3c3added31
engri.index.hr.conllu.gz              215.34 MB  application/gzip  Portal index.hr              b8df995b906039dfc3902058339654a6
engri.jutarnji.hr.conllu.gz           841.22 MB  application/gzip  Portal jutarnji.hr           33b194fd6e6f8aa859f0a70c1305f195
engri.net.hr.conllu.gz                627.68 MB  application/gzip  Portal net.hr                53061f3fe570ed47a8a6542cd20d08c7
engri.novilist.hr.conllu.gz           950.8 MB   application/gzip  Portal novilist.hr           9fd25ea6dc416bd18c18b0af4fa869bb
engri.rtl.hr.conllu.gz                871.01 MB  application/gzip  Portal rtl.hr                65b0537c98b0efc7b368ae1e15110da5
engri.slobodnadalmacija.hr.conllu.gz  649.56 MB  application/gzip  Portal slobodnadalmacija.hr  459ef6df4171632b7e59d718d536f6c0
engri.telegram.hr.conllu.gz           335.98 MB  application/gzip  Portal telegram.hr           38d375fa41a5ee2cd4348c1c6749c2eb
engri.vecernji.hr.conllu.gz           1.29 GB    application/gzip  Portal vecernji.hr           2db766b402d05271b5fbf3b74d1adc57
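
Each file in this item is listed with an MD5 checksum, so a download can be checked for corruption before use. The sketch below shows one way to do that with Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` are illustrative assumptions, and the file is read in chunks so that archives of a gigabyte or more do not have to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the value listed in the record."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, assuming the hrt.hr archive has been saved in the current directory, `verify_download("engri.hrt.hr.conllu.gz", "84ebbb053057638b550ecd3c3added31")` should return `True` for an intact download. MD5 is adequate here for detecting transfer errors, though it is not a safeguard against deliberate tampering.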