| dc.contributor.author | Bučar, Jože |
| dc.date.accessioned | 2017-04-23T17:46:05Z |
| dc.date.available | 2017-04-23T17:46:05Z |
| dc.date.issued | 2017-04-23 |
| dc.identifier.uri | http://hdl.handle.net/11356/1105 |
| dc.description | Five web-crawlers written in the R language for retrieving Slovenian texts from the news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content. |
| dc.language.iso | slv |
| dc.publisher | Faculty of Information Studies Novo mesto |
| dc.relation.isreferencedby | https://doi.org/10.1007/s10579-018-9413-3 |
| dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
| dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://github.com/19Joey85/Sentiment-annotated-news-corpus-and-sentiment-lexicon-in-Slovene/ |
| dc.subject | web crawling |
| dc.subject | R |
| dc.title | R crawlers for five Slovenian web media 1.0 |
| dc.type | toolService |
| metashare.ResourceInfo#ContentInfo.detailedType | tool |
| metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | false |
| has.files | yes |
| branding | CLARIN.SI data & tools |
| contact.person | Jože Bučar joze.bucar@gmail.com Laboratory of Data Technologies, Faculty of Information Studies in Novo mesto, Slovenia |
| sponsor | ARRS (Slovenian Research Agency) MR-35498 Young Researcher Programme nationalFunds |
| sponsor | Human Resources Development and Scholarship Fund, Ministry of Education, Science and Sport, Slovenia 11012-55/2015 Javni razpis financiranja raziskovalnega sodelovanja doktorskih študentov v tujini v letu 2014 (186. JR) nationalFunds |
| sponsor | The European Regional Development Fund Operational Programme for Strengthening Regional Development Potentials for the period 2007-2013 Other |
| files.count | 6 |
| files.size | 218992 |
Files in this item
Total download size: 213.86 KB
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name: readme_web_crawlers.txt
- Size: 1.3 KB
- Format: Text file
- Description: README file
- MD5: 4c42c0b4f2097f31cecde483d937ca29
Author: Jože Bučar, Faculty of Information Studies in Novo mesto (contact: joze.bucar@gmail.com)
Abstract:
Five web-crawlers written in the R language for retrieving Slovenian news texts from the portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content.
Keywords:
Web-crawling, Slovene
Web resources:
- Slovenian news texts with political, business, economic and financial content published between 1 September 2007 and 31 January 2016 from five Slovenian web media: www.24ur.com, www.dnevnik.si, www.finance.si, www.rtvslo.si, www.zurnal24.si
Type and size:
- .R (web-crawlers); size: 213 KB
Encoding: ANSI
Year: Last update 2016-02-14
Attributes (retrieved news):
URL main - Uniform Resource Locator (URL) of the resource (web medium) [string; www.24ur.com, www.dnevnik.si, www.finance.si, www.rtvslo.si, www.zurnal24.si]
URL - URL of . . .
- Name: web_crawler_24UR.r
- Size: 61.36 KB
- Format: Unknown
- Description: Web crawler for 24ur
- MD5: 683b72b1265c9bc64464cbe3e8df278b

- Name: web_crawler_Dnevnik.r
- Size: 16.5 KB
- Format: Unknown
- Description: Web crawler for Dnevnik
- MD5: 704bbf20657684968a4260cc0d76fb5e

- Name: web_crawler_Finance.r
- Size: 50.96 KB
- Format: Unknown
- Description: Web crawler for Finance
- MD5: 436b1103bb5ddabf4aadd5aaee652584

- Name: web_crawler_RTVSLO.r
- Size: 41.08 KB
- Format: Unknown
- Description: Web crawler for Rtvslo
- MD5: 201e4b0ad829c52a06f2c0a8cf027643

- Name: web_crawler_Zurnal24.r
- Size: 42.66 KB
- Format: Unknown
- Description: Web crawler for Zurnal24
- MD5: 694b2ba88f6aa905563f58218e7b3b38
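
The MD5 values listed for the six files can be used to check a download for corruption. The following is a minimal sketch in Python (not R, the language of the crawlers themselves). It assumes the files were saved under their listed names in a single directory. The names `EXPECTED_MD5`, `md5_of`, and `verify` are hypothetical helpers introduced here for illustration and are not part of the deposited resource.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing of this record.
EXPECTED_MD5 = {
    "readme_web_crawlers.txt": "4c42c0b4f2097f31cecde483d937ca29",
    "web_crawler_24UR.r": "683b72b1265c9bc64464cbe3e8df278b",
    "web_crawler_Dnevnik.r": "704bbf20657684968a4260cc0d76fb5e",
    "web_crawler_Finance.r": "436b1103bb5ddabf4aadd5aaee652584",
    "web_crawler_RTVSLO.r": "201e4b0ad829c52a06f2c0a8cf027643",
    "web_crawler_Zurnal24.r": "694b2ba88f6aa905563f58218e7b3b38",
}


def md5_of(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each listed file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the downloaded files sit in the current working directory.
    for name, status in verify().items():
        print(f"{status:9} {name}")
```

A `mismatch` most likely indicates an incomplete download or a file that was modified after retrieval (for example, re-saved in a different encoding), so downloading the file again is a reasonable first step.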