Show simple item record
dc.contributor.author | Ljubešić, Nikola
dc.contributor.author | Esplà-Gomis, Miquel
dc.contributor.author | Ortiz Rojas, Sergio
dc.contributor.author | Klubička, Filip
dc.contributor.author | Toral, Antonio
dc.date.accessioned | 2016-03-09T17:05:19Z
dc.date.available | 2016-03-09T17:05:19Z
dc.date.issued | 2016-03-09
dc.identifier.uri | http://hdl.handle.net/11356/1060
dc.description | The fienWaC corpus version 1.0 consists of parallel Finnish-English texts crawled from the .fi top-level domain for Finland. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing used for crawling and Bitextor used for bitext extraction. The accuracy of the extracted bitext, given the evaluation results on other languages, can be estimated at 74% on the segment level and 76% on the word level.
dc.language.iso | fin
dc.language.iso | eng
dc.publisher | Jožef Stefan Institute
dc.relation | info:eu-repo/grantAgreement/EC/FP7/324414
dc.rights | CLARIN.SI User Licence for Internet Corpora
dc.rights.uri | https://www.clarin.si/info/wp-content/uploads/2016/01/CLARIN.SI-WAC-2016-01.pdf
dc.rights.label | ACA
dc.subject | parallel corpus
dc.subject | web corpus
dc.subject | multilingual
dc.title | Finnish-English parallel corpus fienWaC 1.0
dc.type | corpus
metashare.ResourceInfo#ContentInfo.mediaType | text
has.files | yes
branding | CLARIN.SI data & tools
contact.person | Nikola Ljubešić nljubesi@gmail.com Jožef Stefan Institute
sponsor | European Union FP7-PEOPLE-2012-IAPP PIAP-GA-2012-324414 Abu-MaTran euFunds info:eu-repo/grantAgreement/EC/FP7/324414
size.info | 77048083 words
size.info | 2866574 sentences
files.count | 1
files.size | 297448076
Files in this item
- Name: fienwac_v1.0.tmx.tgz
- Size: 283.67 MB
- Format: Unknown
- MD5: 0e702cf28c098fb72a5e9f815170b519
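The byte count and MD5 checksum in this record can be used to check a downloaded copy of the archive. The following is a minimal sketch, assuming the file was saved locally under the name listed above; the function names `md5_of_file` and `verify_download` are illustrative and not part of any CLARIN.SI tooling.

```python
import hashlib
from pathlib import Path

# Values copied from this record (files.size and the MD5 field).
EXPECTED_MD5 = "0e702cf28c098fb72a5e9f815170b519"
EXPECTED_SIZE = 297448076  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True only if both the file size and the MD5 digest match."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the archive was saved.
    archive = Path("fienwac_v1.0.tmx.tgz")
    if archive.exists():
        print("OK" if verify_download(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

The listed size of 283.67 MB corresponds to the byte count divided by 1024 × 1024, so the two size figures in the record agree with each other.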