dc.contributor.author Jemec Tomazin, Mateja
dc.contributor.author Trojar, Mitja
dc.contributor.author Žagar, Mojca
dc.contributor.author Atelšek, Simon
dc.contributor.author Fajfar, Tatjana
dc.contributor.author Erjavec, Tomaž
dc.date.accessioned 2021-03-15T15:22:18Z
dc.date.available 2021-03-15T15:22:18Z
dc.date.issued 2021-03-15
dc.identifier.uri http://hdl.handle.net/11356/1400
dc.description The RSDO5 corpus was compiled to serve as a training set for automatic term identification. It consists of 12 texts with roughly 250,000 words and 38,000 manually annotated terms. The corpus texts, published between 2000 and 2019, are PhD theses (3), a scientific book based on a PhD thesis (1), graduate-level textbooks (4), or journal articles (4), and belong to the fields of biomechanics (3), linguistics (3), chemistry (3), or veterinary science (3). Apart from the manually annotated terms, the corpus was automatically annotated with Universal Dependencies annotations, i.e. tokenisation, sentence segmentation, lemmatisation, morphological features and dependency syntax.
dc.language.iso slv
dc.publisher ZRC SAZU
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.slovenscina.eu/terminoloski-portal
dc.subject terminology
dc.subject manual annotation
dc.subject TEI
dc.title Corpus of term-annotated texts RSDO5 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Mateja Jemec Tomazin mjt@zrc-sazu.si ZRC SAZU
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
size.info 12 texts
size.info 38043 terms
size.info 257029 words
size.info 310588 tokens
files.count 4
files.size 16103689
featuredService.kontext search|https://www.clarin.si/kontext/first_form?corpname=rsdo5
featuredService.noske search|https://www.clarin.si/noske/run.cgi/corp_info?corpname=rsdo5&struct_attr_stats=1&subcorpora=1


 Files in this item

 Download all files in item (15.36 MB)
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name: rsdo5.TEI.zip
Size: 7.61 MB
Format: application/zip
Description: Corpus in source TEI format
MD5: 2469267cb71e51c26460ca9de4f8e21f
File Preview
  • rsdo5.TEI
    • rsdo5kemucb.xml (911 kB)
    • rsdo5kemcla.xml (361 kB)
    • rsdo5bimucb.xml (2 MB)
    • rsdo5bimdis.xml (8 MB)
    • schema
      • tei_clarin.rng (662 kB)
      • tei_clarin.sch (504 B)
      • dcr.tmp (1 kB)
      • tei_clarin.dtd (248 kB)
      • tei_clarin_doc.xml (8 MB)
      • tei_clarin_doc.html (8 MB)
      • tei_clarin.rnc (316 kB)
      • tei_clarin_example.xml (31 kB)
      • xml.tmp (2 kB)
      • tei_clarin.xsd (741 kB)
      • tei_clarin_schema.xml (3 kB)
    • rsdo5bimcla.xml (839 kB)
    • rsdo5vetucb.xml (7 MB)
    • rsdo5jezucb.xml (3 MB)
    • rsdo5kemdis.xml (11 MB)
    • rsdo5vetdis.xml (6 MB)
    • rsdo5jezdis.xml (17 MB)
    • rsdo5vetcla.xml (668 kB)
    • rsdo5jezcla.xml (1021 kB)
    • 00README.txt (287 B)
    • rsdo5.xml (18 kB)
Name: rsdo5.conllu.zip
Size: 3.46 MB
Format: application/zip
Description: Corpus in CoNLL-U format
MD5: 57fd3cf37c868e6cc2595aac330f81be
File Preview
  • rsdo5.conllu
    • rsdo5vetcla.conllu (250 kB)
    • rsdo5vetucb.conllu (2 MB)
    • rsdo5-meta.tsv (3 kB)
    • rsdo5kemucb.conllu (342 kB)
    • rsdo5jezucb.conllu (1 MB)
    • rsdo5bimdis.conllu (3 MB)
    • rsdo5bimcla.conllu (313 kB)
    • rsdo5bimucb.conllu (1 MB)
    • rsdo5kemdis.conllu (4 MB)
    • 00README.txt (419 B)
    • rsdo5jezdis.conllu (6 MB)
    • rsdo5vetdis.conllu (2 MB)
    • rsdo5kemcla.conllu (135 kB)
    • rsdo5jezcla.conllu (391 kB)
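
The CoNLL-U files can be read with a few lines of standard-library Python. The following is a minimal sketch, assuming the files follow the standard ten-column CoNLL-U layout; the function names `read_conllu` and `summarise` are illustrative and not part of the corpus distribution. It deliberately ignores the manual term annotations, since this record does not say how they are encoded in the CoNLL-U files (the bundled 00README.txt is the place to check), and its counts are not guaranteed to reproduce the size.info figures above.

```python
from collections import Counter


def read_conllu(lines):
    """Yield sentences as lists of ten-field token rows.

    Assumes standard CoNLL-U: tab-separated columns, '#' comment lines,
    and a blank line after each sentence.
    """
    sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment or metadata
        fields = line.split("\t")
        # Keep plain integer IDs only; skip multiword ranges (1-2)
        # and empty nodes (1.1).
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        sentence.append(fields)
    if sentence:
        yield sentence


def summarise(lines):
    """Count sentences, tokens, and tokens whose UPOS is not PUNCT."""
    counts = Counter()
    for sentence in read_conllu(lines):
        counts["sentences"] += 1
        counts["tokens"] += len(sentence)
        counts["non_punct"] += sum(1 for f in sentence if f[3] != "PUNCT")
    return counts
```

With the archive unpacked, one of the smaller files could be summarised like this (the path follows the file listing above):

```python
with open("rsdo5.conllu/rsdo5kemcla.conllu", encoding="utf-8") as handle:
    print(summarise(handle))
```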
Name: rsdo5.vert.zip
Size: 3.71 MB
Format: application/zip
Description: Corpus in vertical format
MD5: 1f24a3c21eb52c6e54bc92d348995f18
File Preview
  • rsdo5.vert
    • rsdo5jezdis.vert (11 MB)
    • rsdo5bimdis.vert (5 MB)
    • rsdo5kemdis.vert (7 MB)
    • rsdo5jezcla.vert (677 kB)
    • rsdo5vetdis.vert (4 MB)
    • rsdo5kemcla.vert (232 kB)
    • rsdo5jezucb.vert (2 MB)
    • rsdo5bimucb.vert (1 MB)
    • rsdo5vetcla.vert (426 kB)
    • rsdo5kemucb.vert (588 kB)
    • 00README.txt (571 B)
    • rsdo5bimcla.vert (522 kB)
    • rsdo5.regi (2 kB)
    • rsdo5vetucb.vert (4 MB)
Name: rsdo5.txt.zip
Size: 597.15 KB
Format: application/zip
Description: Corpus in plain text format
MD5: 0c004ab495039b351cec99779413aaa4
File Preview
  • rsdo5.txt
    • rsdo5kemcla.txt (11 kB)
    • rsdo5bimucb.txt (87 kB)
    • rsdo5-meta.tsv (3 kB)
    • rsdo5bimdis.txt (269 kB)
    • rsdo5bimcla.txt (25 kB)
    • rsdo5jezucb.txt (111 kB)
    • rsdo5vetucb.txt (252 kB)
    • rsdo5kemdis.txt (383 kB)
    • rsdo5jezdis.txt (557 kB)
    • rsdo5vetdis.txt (215 kB)
    • rsdo5vetcla.txt (21 kB)
    • rsdo5jezcla.txt (34 kB)
    • 00README.txt (556 B)
    • rsdo5kemucb.txt (28 kB)
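
The MD5 values listed for the four archives can be used to check that a download arrived intact. A minimal sketch, assuming the archives were saved under their listed names in a single directory; the helper names `md5_of` and `verify` are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file entries in this record.
EXPECTED_MD5 = {
    "rsdo5.TEI.zip": "2469267cb71e51c26460ca9de4f8e21f",
    "rsdo5.conllu.zip": "57fd3cf37c868e6cc2595aac330f81be",
    "rsdo5.vert.zip": "1f24a3c21eb52c6e54bc92d348995f18",
    "rsdo5.txt.zip": "0c004ab495039b351cec99779413aaa4",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each archive name to True (match), False (mismatch),
    or None (file not found in the directory)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, status in verify().items():
        print(f"{name}: {labels[status]}")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.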
