dc.contributor.author Bučar, Jože
dc.date.accessioned 2017-04-23T17:46:05Z
dc.date.available 2017-04-23T17:46:05Z
dc.date.issued 2017-04-23
dc.identifier.uri http://hdl.handle.net/11356/1105
dc.description Five web-crawlers written in the R language for retrieving Slovenian texts from the news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content.
dc.language.iso slv
dc.publisher Faculty of Information Studies Novo mesto
dc.relation.isreferencedby https://doi.org/10.1007/s10579-018-9413-3
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/19Joey85/Sentiment-annotated-news-corpus-and-sentiment-lexicon-in-Slovene/
dc.subject web crawling
dc.subject R
dc.title R crawlers for five Slovenian web media 1.0
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent false
has.files yes
branding CLARIN.SI data & tools
contact.person Jože Bučar joze.bucar@gmail.com Laboratory of Data Technologies, Faculty of Information Studies in Novo mesto, Slovenia
sponsor ARRS (Slovenian Research Agency) MR-35498 Young Researcher Programme nationalFunds
sponsor Human Resources Development and Scholarship Fund, Ministry of Education, Science and Sport, Slovenia 11012-55/2015 Public call for funding the research cooperation of doctoral students abroad in 2014 (186th public call) nationalFunds
sponsor The European Regional Development Fund Operational Programme for Strengthening Regional Development Potentials for the period 2007-2013 Other
files.count 6
files.size 218992


 Files in this item

 Download all files in item (213.86 KB)
This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name: readme_web_crawlers.txt
Size: 1.3 KB
Format: Text file
Description: README file
MD5: 4c42c0b4f2097f31cecde483d937ca29
File Preview
Author: Jože Bučar, Faculty of Information Studies in Novo mesto (contact: joze.bucar@gmail.com)

Abstract:
Five web crawlers written in the R language for retrieving Slovenian news texts from the portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content.

Keywords:
Web-crawling, Slovene

Web resources:
- Slovenian news texts with political, business, economic and financial content, published between 1 September 2007 and 31 January 2016 by five Slovenian web media: www.24ur.com, www.dnevnik.si, www.finance.si, www.rtvslo.si, www.zurnal24.si

Type and size:
- .R (web-crawlers); size: 213 KB

Encoding: ANSI

Year: Last update 2016-02-14

Attributes (retrieved news):
URL main - Uniform Resource Locator (URL) of the resource (web medium) [string; www.24ur.com, www.dnevnik.si, www.finance.si, www.rtvslo.si, www.zurnal24.si]
URL - URL of . . .
                                            
Name: web_crawler_24UR.r
Size: 61.36 KB
Format: Unknown
Description: Web crawler for 24ur
MD5: 683b72b1265c9bc64464cbe3e8df278b

Name: web_crawler_Dnevnik.r
Size: 16.5 KB
Format: Unknown
Description: Web crawler for Dnevnik
MD5: 704bbf20657684968a4260cc0d76fb5e

Name: web_crawler_Finance.r
Size: 50.96 KB
Format: Unknown
Description: Web crawler for Finance
MD5: 436b1103bb5ddabf4aadd5aaee652584

Name: web_crawler_RTVSLO.r
Size: 41.08 KB
Format: Unknown
Description: Web crawler for Rtvslo
MD5: 201e4b0ad829c52a06f2c0a8cf027643

Name: web_crawler_Zurnal24.r
Size: 42.66 KB
Format: Unknown
Description: Web crawler for Zurnal24
MD5: 694b2ba88f6aa905563f58218e7b3b38
