dc.contributor.author Kosem, Iztok
dc.contributor.author Čibej, Jaka
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Ponikvar, Primož
dc.contributor.author Šinkec, Mihael
dc.contributor.author Krek, Simon
dc.date.accessioned 2022-06-23T18:24:49Z
dc.date.available 2022-06-23T18:24:49Z
dc.date.issued 2022-06-23
dc.identifier.uri http://hdl.handle.net/11356/1590
dc.description The Trendi corpus is a monitor corpus of Slovene. It contains news from 107 media websites published by 48 publishers. Trendi 2022-05 covers the period from January 2019 to May 2022 and complements the Gigafida 2.0 reference corpus of written Slovene. All contents of the Trendi corpus are currently obtained through the Jožef Stefan Institute Newsfeed service (http://newsfeed.ijs.si/). The texts have been annotated with the classla-stanza pipeline (https://github.com/clarinsi/classla), including syntactic parsing according to the Universal Dependencies scheme (https://universaldependencies.org/sl/) and named entity annotation (https://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf). The corpus is currently not available as a dataset due to copyright restrictions; we hope to make at least part of it available in the near future. The corpus is accessible through the CLARIN.SI concordancers.
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreplacedby http://hdl.handle.net/11356/1681
dc.source.uri https://sled.ijs.si/
dc.subject monitor corpus
dc.subject news corpus
dc.subject universal dependencies
dc.subject temporal trends
dc.title Monitor corpus of Slovene Trendi 2022-05
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden hidden
has.files no
branding CLARIN.SI data & tools
contact.person Iztok Kosem; iztok.kosem@ijs.si; Jožef Stefan Institute
sponsor Ministry of Culture of the Republic of Slovenia; JR-infrastruktura-SJ-2021-2022; SLED - Monitor corpus of Slovene and related resources; nationalFunds
size.info 565308991 tokens
size.info 473161579 words
size.info 25186942 sentences
size.info 1436548 articles
files.count 0
files.size 0

