dc.contributor.author | Bučar, Jože |
dc.date.accessioned | 2017-04-23T17:46:05Z |
dc.date.available | 2017-04-23T17:46:05Z |
dc.date.issued | 2017-04-23 |
dc.identifier.uri | http://hdl.handle.net/11356/1105 |
dc.description | Five web-crawlers written in the R language for retrieving Slovenian texts from the news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content. |
dc.language.iso | slv |
dc.publisher | Faculty of Information Studies Novo mesto |
dc.relation.isreferencedby | https://doi.org/10.1007/s10579-018-9413-3 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://github.com/19Joey85/Sentiment-annotated-news-corpus-and-sentiment-lexicon-in-Slovene/ |
dc.subject | web crawling |
dc.subject | R |
dc.title | R crawlers for five Slovenian web media 1.0 |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | false |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Jože Bučar joze.bucar@gmail.com Laboratory of Data Technologies, Faculty of Information Studies in Novo mesto, Slovenia |
sponsor | ARRS (Slovenian Research Agency) MR-35498 Young Researcher Programme nationalFunds |
sponsor | Human Resources Development and Scholarship Fund, Ministry of Education, Science and Sport, Slovenia 11012-55/2015 Javni razpis financiranja raziskovalnega sodelovanja doktorskih študentov v tujini v letu 2014 (186. JR) nationalFunds |
sponsor | The European Regional Development Fund Operational Programme for Strengthening Regional Development Potentials for the period 2007-2013 Other |
files.count | 6 |
files.size | 218992 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

- Name
- readme_web_crawlers.txt
- Size
- 1.3 KB
- Format
- Text file
- Description
- README file
- MD5
- 4c42c0b4f2097f31cecde483d937ca29
Author: Jože Bučar, Faculty of Information Studies in Novo mesto (contact: joze.bucar@gmail.com)
Abstract: Five web-crawlers written in the R language for retrieving Slovenian news texts from the portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content.
Keywords: Web-crawling, Slovene
Web resources:
- Slovenian news texts with political, business, economic and financial content published between 1 September 2007 and 31 January 2016 from five Slovenian web media: www.24ur.com, www.dnevnik.si, www.finance.si, www.rtvslo.si, www.zurnal24.si
Type and size:
- .R (web-crawlers); size: 213 KB
Encoding: ANSI
Year: Last update 2016-02-14
Attributes (retrieved news):
URL main - Uniform Resource Locator (URL) of the resource (web medium) [string; www.24ur.com, www.dnevnik.si, www.finance.si, www.rtvslo.si, www.zurnal24.si]
URL - URL of . . .

- Name
- web_crawler_24UR.r
- Size
- 61.36 KB
- Format
- Unknown
- Description
- Web crawler for 24ur
- MD5
- 683b72b1265c9bc64464cbe3e8df278b

- Name
- web_crawler_Dnevnik.r
- Size
- 16.5 KB
- Format
- Unknown
- Description
- Web crawler for Dnevnik
- MD5
- 704bbf20657684968a4260cc0d76fb5e

- Name
- web_crawler_Finance.r
- Size
- 50.96 KB
- Format
- Unknown
- Description
- Web crawler for Finance
- MD5
- 436b1103bb5ddabf4aadd5aaee652584

- Name
- web_crawler_RTVSLO.r
- Size
- 41.08 KB
- Format
- Unknown
- Description
- Web crawler for Rtvslo
- MD5
- 201e4b0ad829c52a06f2c0a8cf027643

- Name
- web_crawler_Zurnal24.r
- Size
- 42.66 KB
- Format
- Unknown
- Description
- Web crawler for Zurnal24
- MD5
- 694b2ba88f6aa905563f58218e7b3b38