dc.contributor.author | Bogunović, Irena |
dc.contributor.author | Kučić, Mario |
dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Erjavec, Tomaž |
dc.date.accessioned | 2021-03-16T12:19:02Z |
dc.date.available | 2021-03-16T12:19:02Z |
dc.date.issued | 2021-03-14 |
dc.identifier.uri | http://hdl.handle.net/11356/1416 |
dc.description | The corpus consists of texts collected from the most popular news portals in Croatia (based on the Reuters Institute Digital News Report for 2018, retrieved from http://www.digitalnewsreport.org in April 2019) in the period from 2014 to 2018: Direktno, Dnevno, Net Hr, Hrt, Index_Hr, Jutarnji, Novilist, Rtl, SlobodnaDalmacija, Večernji, Tportal, Dnevnik. Web browsing and web crawling were used to select and store the texts together with their useful HTML information (publication date of the article, its URL, and title). The linguistic processing of the corpus was performed with the CLASSLA package (https://pypi.org/project/classla/) on the levels of tokenization, sentence splitting, morphosyntactic tagging, lemmatization, dependency parsing and named entity recognition. This corpus is a linguistically processed version of the original corpus published at https://repository.pfri.uniri.hr/islandora/object/pfri%3A2156 and is distributed in the CoNLL-U format (https://universaldependencies.org/format.html). |
dc.language.iso | hrv |
dc.publisher | University of Rijeka, Faculty of Maritime Studies |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://www.laconlab.com/projects/engri |
dc.subject | news corpus |
dc.subject | contemporary language |
dc.title | Corpus of Croatian news portals ENGRI (2014-2018) |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Irena Bogunović bogunovic@pfri.hr University of Rijeka, Faculty of Maritime Studies |
sponsor | Croatian Science Foundation UIP-2019-04-1576 English words in Croatian: Identification, affective-semantic norming and investigation into cognitive processing via behavioural and neuroscientific methods (ENGRI) nationalFunds |
sponsor | Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
size.info | 694799268 tokens |
size.info | 1756735 texts |
files.count | 12 |
files.size | 9105493691 |
featuredService.kontext | Search|https://www.clarin.si/kontext/first_form?corpname=engri |
featuredService.noske | Search|https://www.clarin.si/ske/#dashboard?corpname=engri |
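The description states that the corpus is distributed in the CoNLL-U format, and the files in this record are gzip-compressed. The following is a minimal sketch of how such a file could be streamed sentence by sentence without unpacking it to disk. The function names `read_conllu` and `iter_corpus_sentences` and the sample sentence are illustrative assumptions: the sample is hand-written, not taken from the corpus, and the column names follow the ten-column layout of the CoNLL-U specification.

```python
import gzip
import io
from typing import Dict, Iterable, Iterator, List

# The ten tab-separated columns defined by the CoNLL-U specification.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Comment lines (starting with '#') are skipped; a blank line ends a
    sentence. Multiword-token ranges and empty nodes, if present, are kept
    as ordinary rows in this sketch.
    """
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            continue
        else:
            cols = line.split("\t")
            if len(cols) == len(FIELDS):
                sentence.append(dict(zip(FIELDS, cols)))
    if sentence:  # file did not end with a blank line
        yield sentence


def iter_corpus_sentences(path: str) -> Iterator[List[Dict[str, str]]]:
    """Stream sentences from a gzip-compressed CoNLL-U file (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from read_conllu(handle)


# Hand-written sample for illustration only; not an excerpt from the corpus.
SAMPLE = (
    "# text = Ovo je test.\n"
    "1\tOvo\tovaj\tDET\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tje\tbiti\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\ttest\ttest\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(io.StringIO(SAMPLE)))
lemmas = [token["lemma"] for token in sentences[0]]
```

For a downloaded file, `iter_corpus_sentences("engri.hrt.hr.conllu.gz")` would yield sentences lazily, which matters given that individual files are hundreds of megabytes even when compressed.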
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)





Name | Size | Format | Description | MD5 |
engri.24sata.hr.conllu.gz | 1.02 GB | application/gzip | Portal 24sata.hr | 80a664e2c94b137c06695747e0215618 |
engri.direktno.hr.conllu.gz | 686.57 MB | application/gzip | Portal direktno.hr | 180a932f19ad3292632e2ea3a35bc7d4 |
engri.dnevno.hr.conllu.gz | 745.9 MB | application/gzip | Portal dnevno.hr | 7d3ce12690fe3efe10da99263b324e9b |
engri.hrt.hr.conllu.gz | 397.95 MB | application/gzip | Portal hrt.hr | 84ebbb053057638b550ecd3c3added31 |
engri.index.hr.conllu.gz | 215.34 MB | application/gzip | Portal index.hr | b8df995b906039dfc3902058339654a6 |
engri.jutarnji.hr.conllu.gz | 841.22 MB | application/gzip | Portal jutarnji.hr | 33b194fd6e6f8aa859f0a70c1305f195 |
engri.net.hr.conllu.gz | 627.68 MB | application/gzip | Portal net.hr | 53061f3fe570ed47a8a6542cd20d08c7 |
engri.novilist.hr.conllu.gz | 950.8 MB | application/gzip | Portal novilist.hr | 9fd25ea6dc416bd18c18b0af4fa869bb |
engri.rtl.hr.conllu.gz | 871.01 MB | application/gzip | Portal rtl.hr | 65b0537c98b0efc7b368ae1e15110da5 |
engri.slobodnadalmacija.hr.conllu.gz | 649.56 MB | application/gzip | Portal slobodnadalmacija.hr | 459ef6df4171632b7e59d718d536f6c0 |
engri.telegram.hr.conllu.gz | 335.98 MB | application/gzip | Portal telegram.hr | 38d375fa41a5ee2cd4348c1c6749c2eb |
engri.vecernji.hr.conllu.gz | 1.29 GB | application/gzip | Portal vecernji.hr | 2db766b402d05271b5fbf3b74d1adc57 |
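
Each file above is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch of such a check using Python's standard `hashlib`. The helper names `md5_of_file` and `verify_download` are assumptions made for illustration, and the `EXPECTED_MD5` dictionary copies only two of the listed values as examples.

```python
import hashlib

# Two checksums copied from the file listing; extend with the others as needed.
EXPECTED_MD5 = {
    "engri.hrt.hr.conllu.gz": "84ebbb053057638b550ecd3c3added31",
    "engri.index.hr.conllu.gz": "b8df995b906039dfc3902058339654a6",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks.

    Chunked reading keeps memory use flat even for the gigabyte-sized files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the published value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A call such as `verify_download("engri.hrt.hr.conllu.gz", EXPECTED_MD5["engri.hrt.hr.conllu.gz"])` would return `True` only if the local file matches the published digest. MD5 is adequate here as an integrity check against transfer errors, not as a safeguard against deliberate tampering.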