dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Krek, Simon |
dc.date.accessioned | 2015-06-06T22:24:21Z |
dc.date.available | 2015-06-06T22:24:21Z |
dc.date.issued | 2010-03-07 |
dc.identifier.uri | http://hdl.handle.net/11356/1037 |
dc.description | The jos1M corpus contains 1 million words of paragraphs sampled from the FidaPLUS corpus. It is meant to serve as a training corpus for word-level tagging of Slovene. This silver-standard corpus is annotated with morphosyntactic descriptions (fine-grained PoS tags) and lemmas, with about one fourth of the most problematic annotations hand-validated. The corpus is available in source TEI P5 XML and in the simpler and smaller vertical format used by various concordancers. Note that the vertical format does not contain all of the information from the source TEI. |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | http://www.lrec-conf.org/proceedings/lrec2010/summaries/139.html |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1213 |
dc.rights | Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://nl.ijs.si/jos/jos1M-en.html |
dc.subject | tagging |
dc.subject | lemmatisation |
dc.subject | manual annotation |
dc.subject | TEI |
dc.title | Training corpus jos1M 1.1 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
hidden | hidden |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | https://nl.ijs.si/jos/jos1M/jos1Mv1_1_hdr-en.html |
contact.person | Tomaž Erjavec, tomaz.erjavec@ijs.si, Jožef Stefan Institute |
sponsor | ARRS (Slovenian Research Agency); J2-9180; Linguistic annotation of Slovene; nationalFunds |
sponsor | EU FP6 033917; SMART “Statistical Multilingual Analysis for Retrieval and Translation”; Other |
sponsor | Ministry of Higher Education, Science and Technology; European Fund for Regional Development; Mobile reader for blind and sight impaired persons; nationalFunds |
size.info | 1000019 words |
files.count | 3 |
files.size | 26762709 |
Files in this item
This item is publicly available and licensed under: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

- Name: jos1Mv1_1-xml.zip
- Size: 12.81 MB
- Format: application/zip
- Description: TEI encoded texts
- MD5: 8e7cd16c3d0709f455afc04ca46612f0
- jos1M
  - jos1M-10.xml (8 MB)
  - jos1M-08.xml (8 MB)
  - jos1Mv1_1.xml (1 MB)
  - jos1M-07.xml (8 MB)
  - jos1M-06.xml (8 MB)
  - jos1M-05.xml (8 MB)
  - jos1M-04.xml (8 MB)
  - jos1M-03.xml (8 MB)
  - jos1M-02.xml (9 MB)
  - jos1M-01.xml (8 MB)
  - jos1M-09.xml (8 MB)

- Name: jos1Mv1_1-en.zip
- Size: 6.34 MB
- Format: application/zip
- Description: Vertical format, MSDs in English
- MD5: 8b28c38845ace784f46e91629697f804
- jos1M
  - jos1M-04-en.cqp (2 MB)
  - jos1M-09-en.cqp (2 MB)
  - josMSD-canon-en.tbl (366 kB)
  - jos1M-03-en.cqp (2 MB)
  - jos1M-10-en.cqp (2 MB)
  - jos1Mv1_1_hdr-en.html (5 MB)
  - jos1M-08-en.cqp (2 MB)
  - jos1M-02-en.cqp (2 MB)
  - jos1M-07-en.cqp (2 MB)
  - jos1M-01-en.cqp (2 MB)
  - jos1M-06-en.cqp (2 MB)
  - jos1M-05-en.cqp (2 MB)

- Name: jos1Mv1_1-sl.zip
- Size: 6.37 MB
- Format: application/zip
- Description: Vertical format, MSDs in Slovene
- MD5: 43754624fe081bec907036642676fc67
- jos1M
  - jos1M-07-sl.cqp (2 MB)
  - jos1M-01-sl.cqp (2 MB)
  - jos1M-06-sl.cqp (2 MB)
  - jos1M-05-sl.cqp (2 MB)
  - jos1Mv1_1_hdr-sl.html (6 MB)
  - jos1M-04-sl.cqp (2 MB)
  - josMSD-canon-sl.tbl (381 kB)
  - jos1M-10-sl.cqp (2 MB)
  - jos1M-09-sl.cqp (2 MB)
  - jos1M-03-sl.cqp (2 MB)
  - jos1M-08-sl.cqp (2 MB)
  - jos1M-02-sl.cqp (2 MB)
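
The MD5 values listed for the three archives can be used to check that a download arrived intact. The following is a minimal sketch, not part of the record itself: it assumes the archives were saved under their listed names in a single local directory, and the helper names `md5_of_file` and `verify_downloads` are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "jos1Mv1_1-xml.zip": "8e7cd16c3d0709f455afc04ca46612f0",
    "jos1Mv1_1-en.zip": "8b28c38845ace784f46e91629697f804",
    "jos1Mv1_1-sl.zip": "43754624fe081bec907036642676fc67",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True (match), False (mismatch) or None (not found).

    Assumption: the archives sit directly in `directory` under their listed names.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

Run from the download directory, the script prints one status line per archive; a `MISMATCH` suggests a truncated or corrupted download, so the archive should be fetched again.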