Show simple item record
dc.contributor.author |
Kosem, Iztok |
dc.contributor.author |
Čibej, Jaka |
dc.contributor.author |
Dobrovoljc, Kaja |
dc.contributor.author |
Erjavec, Tomaž |
dc.contributor.author |
Ljubešić, Nikola |
dc.contributor.author |
Ponikvar, Primož |
dc.contributor.author |
Šinkec, Mihael |
dc.contributor.author |
Krek, Simon |
dc.date.accessioned |
2022-06-23T18:24:49Z |
dc.date.available |
2022-06-23T18:24:49Z |
dc.date.issued |
2022-06-23 |
dc.identifier.uri |
http://hdl.handle.net/11356/1590 |
dc.description |
The Trendi corpus is a monitor corpus of Slovene. It contains news from 107 different media websites, published by 48 different publishers. Trendi 2022-05 covers the period from January 2019 to May 2022, complementing the Gigafida 2.0 reference corpus of written Slovene. All the contents of the Trendi corpus are currently obtained using the Jožef Stefan Institute Newsfeed service (http://newsfeed.ijs.si/).
The texts have been annotated using the classla-stanza pipeline (https://github.com/clarinsi/classla), including syntactic parsing according to Universal Dependencies (https://universaldependencies.org/sl/) and named entity annotation (https://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf).
At the moment, the corpus is not available as a dataset due to copyright restrictions; we hope to make at least some of it available in the near future. The corpus is accessible through CLARIN.SI concordancers. |
dc.language.iso |
slv |
dc.publisher |
Jožef Stefan Institute |
dc.relation.isreplacedby |
http://hdl.handle.net/11356/1681 |
dc.source.uri |
https://sled.ijs.si/ |
dc.subject |
monitor corpus |
dc.subject |
news corpus |
dc.subject |
universal dependencies |
dc.subject |
temporal trends |
dc.title |
Monitor corpus of Slovene Trendi 2022-05 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hidden |
hidden |
has.files |
no |
branding |
CLARIN.SI data & tools |
contact.person |
Iztok Kosem, iztok.kosem@ijs.si, Jožef Stefan Institute
sponsor |
Ministry of Culture of the Republic of Slovenia; JR-infrastruktura-SJ-2021-2022; SLED - Monitor corpus of Slovene and related resources; nationalFunds |
size.info |
565308991 tokens |
size.info |
473161579 words |
size.info |
25186942 sentences |
size.info |
1436548 articles |
files.count |
0 |
files.size |
0 |