dc.contributor.author | Ljubešić, Nikola
dc.contributor.author | Esplà-Gomis, Miquel
dc.contributor.author | Ortiz Rojas, Sergio
dc.contributor.author | Klubička, Filip
dc.contributor.author | Toral, Antonio
dc.date.accessioned | 2016-03-09T16:47:40Z
dc.date.available | 2016-03-09T16:47:40Z
dc.date.issued | 2016-03-09
dc.identifier.uri | http://hdl.handle.net/11356/1058
dc.description | The hrenWaC corpus version 2.0 consists of parallel Croatian-English texts crawled from the .hr top-level domain for Croatia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together SpiderLing, used for crawling, and Bitextor, used for bitext extraction, by feeding the output of the former into the latter. The accuracy of the extracted bitext is around 80% at the segment level and around 84% at the word level.
dc.language.iso | hrv
dc.language.iso | eng
dc.publisher | Jožef Stefan Institute
dc.relation | info:eu-repo/grantAgreement/EC/FP7/324414
dc.rights | CLARIN.SI User Licence for Internet Corpora
dc.rights.uri | https://www.clarin.si/info/wp-content/uploads/2016/01/CLARIN.SI-WAC-2016-01.pdf
dc.rights.label | ACA
dc.source.uri | http://nlp.ffzg.hr/resources/corpora/hrenwac/
dc.subject | parallel corpus
dc.subject | web corpus
dc.subject | multilingual
dc.title | Croatian-English parallel corpus hrenWaC 2.0
dc.type | corpus
metashare.ResourceInfo#ContentInfo.mediaType | text
has.files | yes
branding | CLARIN.SI data & tools
contact.person | Nikola Ljubešić nljubesi@gmail.com Jožef Stefan Institute
sponsor | European Union FP7-PEOPLE-2012-IAPP PIAP-GA-2012-324414 Abu-MaTran euFunds info:eu-repo/grantAgreement/EC/FP7/324414
size.info | 55083246 words
size.info | 1554912 sentences
files.count | 1
files.size | 195521908
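The files.size value appears to be a byte count. Dividing it by 1024² reproduces the 186.46 figure shown in the file listing, which suggests that the listing's "MB" denotes binary mebibytes (MiB); in decimal megabytes the same archive would be about 195.52 MB.

```latex
\frac{195\,521\,908\ \text{B}}{1024^{2}\ \text{B/MiB}}
  = \frac{195\,521\,908}{1\,048\,576}\ \text{MiB}
  \approx 186.46\ \text{MiB}
```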
Files in this item
- Name: hrenwac_v2.0.tmx.tgz
- Size: 186.46 MB
- Format: Unknown
- MD5: a0e008f53bcfe50beebc08b134b3ed69
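The archive name suggests a gzip-compressed tar file containing TMX (Translation Memory eXchange) data. The following Python sketch assumes exactly that: it checks a downloaded copy against the MD5 checksum listed above and then streams Croatian-English segment pairs out of any `.tmx` member. The helper names `md5_of_file`, `_pick` and `iter_tmx_pairs`, and the assumption that the `xml:lang` values begin with `hr` and `en`, are hypothetical and not taken from the record; verify them against the actual file.

```python
import hashlib
import tarfile
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional, Tuple

# MD5 checksum published for hrenwac_v2.0.tmx.tgz in this record.
EXPECTED_MD5 = "a0e008f53bcfe50beebc08b134b3ed69"

# ElementTree expands the xml:lang attribute to this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _pick(segs: Dict[str, str], prefix: str) -> Optional[str]:
    """Return the first segment whose language code starts with prefix, if any."""
    return next((text for lang, text in segs.items() if lang.startswith(prefix)), None)


def iter_tmx_pairs(archive_path: str, src: str = "hr",
                   tgt: str = "en") -> Iterator[Tuple[str, str]]:
    """Yield (Croatian, English) segment pairs from each .tmx member of a .tgz.

    Hypothetical assumption: language codes in the TMX start with "hr" and "en".
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".tmx")):
                continue
            fh = tar.extractfile(member)
            # iterparse streams the XML instead of building the whole tree first.
            for _event, elem in ET.iterparse(fh, events=("end",)):
                if elem.tag != "tu":
                    continue
                segs: Dict[str, str] = {}
                for tuv in elem.iter("tuv"):
                    lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
                    seg = tuv.find("seg")
                    if seg is not None:
                        # Joins all text inside <seg>, including text of inline markup.
                        segs[lang] = "".join(seg.itertext()).strip()
                hr_text, en_text = _pick(segs, src), _pick(segs, tgt)
                # Translation units missing either language are skipped.
                if hr_text is not None and en_text is not None:
                    yield hr_text, en_text
                elem.clear()  # drop the processed <tu> subtree to limit memory use
```

For an intact download, `md5_of_file("hrenwac_v2.0.tmx.tgz") == EXPECTED_MD5` should be true. Because `iter_tmx_pairs` is a generator built on `iterparse`, pairs are produced one at a time rather than loaded into memory together, which matters for a file of this size.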