dc.contributor.author | Jemec Tomazin, Mateja |
dc.contributor.author | Trojar, Mitja |
dc.contributor.author | Atelšek, Simon |
dc.contributor.author | Fajfar, Tanja |
dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Žagar Karer, Mojca |
dc.date.accessioned | 2021-12-07T16:51:49Z |
dc.date.available | 2021-12-07T16:51:49Z |
dc.date.issued | 2021-12-07 |
dc.identifier.uri | http://hdl.handle.net/11356/1470 |
dc.description | The RSDO5 corpus was compiled to serve as a training set for automatic term identification. It consists of 12 texts with 250,000 words and almost 38,000 manually annotated terms, each marked as either in- or out-domain. The corpus texts were published between 2000 and 2019, are either PhD theses (3), a scientific book based on a PhD thesis (1), graduate-level textbooks (4), or journal articles (4) and belong to the fields of biomechanics (3), linguistics (3), chemistry (3), or veterinary science (3). Apart from the manually annotated terms, the corpus was automatically annotated with Universal Dependencies annotations, i.e. tokenisation, sentence segmentation, lemmatisation, morphological features and dependency syntax. As opposed to the previous version, this one adds in- and out-domain marking on terms in the TEI and vertical files. |
dc.language.iso | slv |
dc.publisher | ZRC SAZU |
dc.relation.replaces | http://hdl.handle.net/11356/1400 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://rsdo.slovenscina.eu/terminoloski-portal |
dc.subject | terminology |
dc.subject | manual annotation |
dc.subject | TEI |
dc.title | Corpus of term-annotated texts RSDO5 1.1 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Mateja Jemec Tomazin mjt@zrc-sazu.si ZRC SAZU |
contact.person | Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
sponsor | Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other |
size.info | 12 texts |
size.info | 37985 terms |
size.info | 257029 words |
size.info | 310588 tokens |
files.count | 4 |
files.size | 16376588 |
featuredService.kontext | search|https://www.clarin.si/kontext/first_form?corpname=rsdo5 |
featuredService.noske | search|https://www.clarin.si/ske/#dashboard?corpname=rsdo5 |
Files in this item

- Name: rsdo5.TEI.zip
- Size: 7.61 MB
- Format: application/zip
- Description: Corpus in source TEI format
- MD5: b03fcfb68ccb30a6a9ce874d8b182732
- Contents:
  - rsdo5.TEI/
    - rsdo5kemucb.xml (911 kB)
    - rsdo5kemcla.xml (361 kB)
    - rsdo5bimucb.xml (2 MB)
    - rsdo5bimdis.xml (8 MB)
    - schema/
      - tei_clarin.rng (662 kB)
      - tei_clarin.sch (504 B)
      - dcr.tmp (1 kB)
      - tei_clarin.dtd (248 kB)
      - tei_clarin_doc.xml (8 MB)
      - tei_clarin_doc.html (8 MB)
      - tei_clarin.rnc (316 kB)
      - tei_clarin_example.xml (31 kB)
      - xml.tmp (2 kB)
      - tei_clarin.xsd (741 kB)
      - tei_clarin_schema.xml (3 kB)
    - rsdo5bimcla.xml (839 kB)
    - rsdo5vetucb.xml (7 MB)
    - rsdo5jezucb.xml (3 MB)
    - rsdo5kemdis.xml (11 MB)
    - rsdo5vetdis.xml (6 MB)
    - rsdo5jezdis.xml (17 MB)
    - rsdo5vetcla.xml (668 kB)
    - rsdo5jezcla.xml (1021 kB)
    - 00README.txt (287 B)
    - rsdo5.xml (18 kB)

- Name: rsdo5.conllu.zip
- Size: 3.46 MB
- Format: application/zip
- Description: Corpus in CoNLL-U format
- MD5: db275613749265e48451d50e4b1984b3
- Contents:
  - rsdo5.conllu/
    - rsdo5vetcla.conllu (250 kB)
    - rsdo5vetucb.conllu (2 MB)
    - rsdo5-meta.tsv (3 kB)
    - rsdo5kemucb.conllu (342 kB)
    - rsdo5jezucb.conllu (1 MB)
    - rsdo5bimdis.conllu (3 MB)
    - rsdo5bimcla.conllu (313 kB)
    - rsdo5bimucb.conllu (1 MB)
    - rsdo5kemdis.conllu (4 MB)
    - 00README.txt (419 B)
    - rsdo5jezdis.conllu (6 MB)
    - rsdo5vetdis.conllu (2 MB)
    - rsdo5kemcla.conllu (135 kB)
    - rsdo5jezcla.conllu (391 kB)
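
The CoNLL-U files can be read with a few lines of code, since CoNLL-U is a plain-text format with ten tab-separated columns per token, comment lines starting with `#`, and blank lines between sentences. The following is a minimal sketch under those general assumptions only: the function names `read_conllu` and `upos_counts` are illustrative, and this record does not state how the term annotations are encoded in the CoNLL-U files, so the sketch reads only the standard columns.

```python
from collections import Counter

# The ten standard CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(lines):
    """Yield sentences as lists of token dicts from an iterable of lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment or metadata
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            continue  # not a regular token line; skipped in this sketch
        token = dict(zip(FIELDS, columns))
        # Keep plain word lines only; multiword ranges ("1-2") and
        # empty nodes ("1.1") are skipped.
        if token["id"].isdigit():
            sentence.append(token)
    if sentence:
        yield sentence


def upos_counts(lines):
    """Count universal part-of-speech tags over all word lines."""
    return Counter(tok["upos"] for sent in read_conllu(lines) for tok in sent)


# Hypothetical usage; the path assumes the archive was unzipped in place:
# with open("rsdo5.conllu/rsdo5kemcla.conllu", encoding="utf-8") as handle:
#     print(upos_counts(handle).most_common(5))
```

Reading line by line keeps memory use low for the larger files in the archive.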

- Name: rsdo5.vert.zip
- Size: 3.97 MB
- Format: application/zip
- Description: Corpus in vertical format
- MD5: 6547f91f4aaabac1b426823335b8d7ca
- Contents:
  - rsdo5.vert/
    - rsdo5jezdis.vert (14 MB)
    - rsdo5bimdis.vert (7 MB)
    - rsdo5kemdis.vert (9 MB)
    - rsdo5jezcla.vert (887 kB)
    - rsdo5vetdis.vert (5 MB)
    - rsdo5kemcla.vert (307 kB)
    - rsdo5jezucb.vert (2 MB)
    - rsdo5bimucb.vert (2 MB)
    - rsdo5vetcla.vert (562 kB)
    - rsdo5kemucb.vert (770 kB)
    - 00README.txt (571 B)
    - rsdo5bimcla.vert (695 kB)
    - rsdo5.regi (2 kB)
    - rsdo5vetucb.vert (6 MB)

- Name: rsdo5.txt.zip
- Size: 597.15 KB
- Format: application/zip
- Description: Corpus in plain text format
- MD5: a045c94d6f4dd9db67ba11ff978b6c4b
- Contents:
  - rsdo5.txt/
    - rsdo5kemcla.txt (11 kB)
    - rsdo5bimucb.txt (87 kB)
    - rsdo5-meta.tsv (3 kB)
    - rsdo5bimdis.txt (269 kB)
    - rsdo5bimcla.txt (25 kB)
    - rsdo5jezucb.txt (111 kB)
    - rsdo5vetucb.txt (252 kB)
    - rsdo5kemdis.txt (383 kB)
    - rsdo5jezdis.txt (557 kB)
    - rsdo5vetdis.txt (215 kB)
    - rsdo5vetcla.txt (21 kB)
    - rsdo5jezcla.txt (34 kB)
    - 00README.txt (536 B)
    - rsdo5kemucb.txt (28 kB)
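
The MD5 values listed for the four archives can be used to check that a download arrived intact. The following is a minimal sketch, assuming the archives were saved under their listed names in one directory; the helper names `md5_of_file` and `check_downloads` are illustrative and not part of any CLARIN.SI tooling.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "rsdo5.TEI.zip": "b03fcfb68ccb30a6a9ce874d8b182732",
    "rsdo5.conllu.zip": "db275613749265e48451d50e4b1984b3",
    "rsdo5.vert.zip": "6547f91f4aaabac1b426823335b8d7ca",
    "rsdo5.txt.zip": "a045c94d6f4dd9db67ba11ff978b6c4b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Map each archive name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in check_downloads().items():
        print(f"{name}: {labels[status]}")
```

A mismatch usually points to an incomplete or corrupted download, so fetching the archive again is the first thing to try.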