dc.contributor.author |
Ljubešić, Nikola |
dc.contributor.author |
Esplà-Gomis, Miquel |
dc.contributor.author |
Ortiz Rojas, Sergio |
dc.contributor.author |
Klubička, Filip |
dc.contributor.author |
Toral, Antonio |
dc.date.accessioned |
2016-03-09T16:51:44Z |
dc.date.available |
2016-03-09T16:51:44Z |
dc.date.issued |
2016-03-09 |
dc.identifier.uri |
http://hdl.handle.net/11356/1059 |
dc.description |
The srenWaC corpus consists of sentence-aligned parallel Serbian-English texts crawled from the .rs top-level domain (Serbia). The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that combines SpiderLing, used for crawling, with Bitextor, used for bitext extraction. Based on evaluation results for other languages, the accuracy of the extracted bitext is estimated at 74% on the sentence level and 76% on the word level. |
dc.language.iso |
srp |
dc.language.iso |
eng |
dc.publisher |
Jožef Stefan Institute |
dc.relation |
info:eu-repo/grantAgreement/EC/FP7/324414 |
dc.rights |
CLARIN.SI User Licence for Internet Corpora |
dc.rights.uri |
https://www.clarin.si/info/wp-content/uploads/2016/01/CLARIN.SI-WAC-2016-01.pdf |
dc.rights.label |
ACA |
dc.subject |
parallel corpus |
dc.subject |
web corpus |
dc.subject |
multilingual |
dc.title |
Serbian-English parallel corpus srenWaC 1.0 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
CLARIN.SI data & tools |
contact.person |
Nikola Ljubešić, nljubesi@gmail.com, Jožef Stefan Institute |
sponsor |
European Union FP7-PEOPLE-2012-IAPP PIAP-GA-2012-324414 Abu-MaTran euFunds info:eu-repo/grantAgreement/EC/FP7/324414 |
size.info |
23139804 words |
size.info |
534682 sentences |
files.count |
1 |
files.size |
74389140 |
Files in this item
- Name
- srenwac_v1.0.tmx.tgz
- Size
- 70.94 MB
- Format
- Unknown
- Description
- TMX as gzipped tar
- MD5
- 90e15d9587c7dd892b89edc079d35c9d
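The file listing gives an MD5 checksum and describes the download as TMX inside a gzipped tar. A minimal Python sketch of how one might verify the download and peek at a few aligned segments follows. It rests on assumptions not stated in this record: that the archive contains one or more members ending in `.tmx`, that the TMX uses un-namespaced `tu`/`tuv`/`seg` elements, and that each `tuv` carries its language in `xml:lang` (or the older `lang` attribute). The function names are hypothetical helpers, not part of any CLARIN.SI tooling.

```python
import hashlib
import itertools
import tarfile
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_translation_units(archive_path):
    """Yield one {language: segment text} dict per <tu> in each .tmx member.

    Assumes un-namespaced TMX element names (an assumption about the file).
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".tmx")):
                continue
            stream = archive.extractfile(member)
            # iterparse streams the XML instead of loading it all at once.
            for _event, elem in ET.iterparse(stream, events=("end",)):
                if elem.tag != "tu":
                    continue
                unit = {}
                for tuv in elem.iter("tuv"):
                    lang = tuv.get(XML_LANG) or tuv.get("lang")
                    seg = tuv.find("seg")
                    if lang and seg is not None:
                        unit[lang] = "".join(seg.itertext())
                yield unit
                elem.clear()  # drop the unit's children to limit memory growth


def preview_units(archive_path, expected_md5, limit=3):
    """Check the MD5 digest, then return the first `limit` translation units."""
    actual = md5_of(archive_path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    return list(itertools.islice(iter_translation_units(archive_path), limit))
```

With the file downloaded to the working directory, a call such as `preview_units("srenwac_v1.0.tmx.tgz", "90e15d9587c7dd892b89edc079d35c9d")` would raise on a corrupted download and otherwise return the first three units. The language codes used as dictionary keys depend on what the TMX file itself declares.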