dc.contributor.author Jemec Tomazin, Mateja
dc.contributor.author Trojar, Mitja
dc.contributor.author Žagar, Mojca
dc.contributor.author Atelšek, Simon
dc.contributor.author Fajfar, Tanja
dc.contributor.author Erjavec, Tomaž
dc.date.accessioned 2021-03-15T15:22:18Z
dc.date.available 2021-03-15T15:22:18Z
dc.date.issued 2021-03-15
dc.identifier.uri http://hdl.handle.net/11356/1400
dc.description The RSDO5 corpus was compiled to serve as a training set for automatic term identification. It consists of 12 texts with about 250,000 words and almost 38,000 manually annotated terms. The corpus texts were published between 2000 and 2019. They are PhD theses (3), a scientific book based on a PhD thesis (1), graduate-level textbooks (4), or journal articles (4), and belong to the fields of biomechanics (3), linguistics (3), chemistry (3), or veterinary science (3). In addition to the manually annotated terms, the corpus was automatically annotated with Universal Dependencies annotations, i.e. tokenisation, sentence segmentation, lemmatisation, morphological features and dependency syntax.
dc.language.iso slv
dc.publisher ZRC SAZU
dc.relation.isreplacedby http://hdl.handle.net/11356/1470
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://rsdo.slovenscina.eu/en/terminology-portal
dc.subject terminology
dc.subject manual annotation
dc.subject TEI
dc.title Corpus of term-annotated texts RSDO5 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden hidden
has.files yes
branding CLARIN.SI data & tools
contact.person Mateja Jemec Tomazin mjt@zrc-sazu.si ZRC SAZU
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
size.info 12 texts
size.info 37985 terms
size.info 257029 words
size.info 310588 tokens
files.count 4
files.size 16103689
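
The size fields above can be cross-checked with a little arithmetic. The following is a minimal sketch in Python; the constant and function names are illustrative and not part of the record, and it assumes that `files.size` is given in bytes and that the repository reports sizes in binary megabytes (1 MB = 2^20 bytes).

```python
# Values copied from the size.info and files.size fields of this record.
FILES_SIZE_BYTES = 16_103_689
WORDS = 257_029
TOKENS = 310_588
TERMS = 37_985


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to binary megabytes (assumption: 1 MB = 2**20 bytes)."""
    return n_bytes / 2**20


if __name__ == "__main__":
    print(f"total size: {bytes_to_mib(FILES_SIZE_BYTES):.2f} MB")  # 15.36
    print(f"tokens per word: {TOKENS / WORDS:.3f}")
    print(f"annotated terms per 1,000 words: {1000 * TERMS / WORDS:.1f}")
```

Under these assumptions the byte count agrees with the total download size reported for the item.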


 Files in this item

 Download all files in this item (15.36 MB)
Name: rsdo5.TEI.zip
Size: 7.61 MB
Format: application/zip
Description: Corpus in source TEI format
MD5: 2469267cb71e51c26460ca9de4f8e21f
File preview:
  • rsdo5.TEI
    • rsdo5kemucb.xml (911 kB)
    • rsdo5kemcla.xml (361 kB)
    • rsdo5bimucb.xml (2 MB)
    • rsdo5bimdis.xml (8 MB)
    • schema
      • tei_clarin.rng (662 kB)
      • tei_clarin.sch (504 B)
      • dcr.tmp (1 kB)
      • tei_clarin.dtd (248 kB)
      • tei_clarin_doc.xml (8 MB)
      • tei_clarin_doc.html (8 MB)
      • tei_clarin.rnc (316 kB)
      • tei_clarin_example.xml (31 kB)
      • xml.tmp (2 kB)
      • tei_clarin.xsd (741 kB)
      • tei_clarin_schema.xml (3 kB)
    • rsdo5bimcla.xml (839 kB)
    • rsdo5vetucb.xml (7 MB)
    • rsdo5jezucb.xml (3 MB)
    • rsdo5kemdis.xml (11 MB)
    • rsdo5vetdis.xml (6 MB)
    • rsdo5jezdis.xml (17 MB)
    • rsdo5vetcla.xml (668 kB)
    • rsdo5jezcla.xml (1021 kB)
    • 00README.txt (287 B)
    • rsdo5.xml (18 kB)
Name: rsdo5.conllu.zip
Size: 3.46 MB
Format: application/zip
Description: Corpus in CoNLL-U format
MD5: 57fd3cf37c868e6cc2595aac330f81be
File preview:
  • rsdo5.conllu
    • rsdo5vetcla.conllu (250 kB)
    • rsdo5vetucb.conllu (2 MB)
    • rsdo5-meta.tsv (3 kB)
    • rsdo5kemucb.conllu (342 kB)
    • rsdo5jezucb.conllu (1 MB)
    • rsdo5bimdis.conllu (3 MB)
    • rsdo5bimcla.conllu (313 kB)
    • rsdo5bimucb.conllu (1 MB)
    • rsdo5kemdis.conllu (4 MB)
    • 00README.txt (419 B)
    • rsdo5jezdis.conllu (6 MB)
    • rsdo5vetdis.conllu (2 MB)
    • rsdo5kemcla.conllu (135 kB)
    • rsdo5jezcla.conllu (391 kB)
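
The CoNLL-U files listed above carry the Universal Dependencies annotations mentioned in the description. As a minimal sketch of how such a file could be read, the Python reader below splits each token line into the ten standard CoNLL-U columns. The sample sentence is invented for illustration and is not taken from the corpus, and the names `FIELDS`, `SAMPLE` and `read_conllu` are hypothetical. This record does not say how the manually annotated terms are encoded in the CoNLL-U files, so the sketch does not try to interpret them; the 00README.txt in the archive is the place to check.

```python
# The ten columns defined by the CoNLL-U format, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

# Invented sample (NOT from the corpus), used only to exercise the reader.
SAMPLE = "\n".join([
    "# sent_id = example.1",
    "# text = Korpus vsebuje termine.",
    "1\tKorpus\tkorpus\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tvsebuje\tvsebovati\tVERB\t_\t_\t0\troot\t_\t_",
    "3\ttermine\ttermin\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])


def read_conllu(lines):
    """Yield sentences as lists of token dicts, skipping comment lines."""
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as sent_id or text
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            continue  # ignore lines that do not have exactly ten columns
        sentence.append(dict(zip(FIELDS, columns)))
    if sentence:
        yield sentence


if __name__ == "__main__":
    for sent in read_conllu(SAMPLE.splitlines()):
        print([(tok["form"], tok["lemma"], tok["upos"]) for tok in sent])
```

For a real file, an open file handle (for example one opened on rsdo5jezcla.conllu with UTF-8 encoding) can be passed in place of `SAMPLE.splitlines()`.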
Name: rsdo5.vert.zip
Size: 3.71 MB
Format: application/zip
Description: Corpus in vertical format
MD5: 1f24a3c21eb52c6e54bc92d348995f18
File preview:
  • rsdo5.vert
    • rsdo5jezdis.vert (11 MB)
    • rsdo5bimdis.vert (5 MB)
    • rsdo5kemdis.vert (7 MB)
    • rsdo5jezcla.vert (677 kB)
    • rsdo5vetdis.vert (4 MB)
    • rsdo5kemcla.vert (232 kB)
    • rsdo5jezucb.vert (2 MB)
    • rsdo5bimucb.vert (1 MB)
    • rsdo5vetcla.vert (426 kB)
    • rsdo5kemucb.vert (588 kB)
    • 00README.txt (571 B)
    • rsdo5bimcla.vert (522 kB)
    • rsdo5.regi (2 kB)
    • rsdo5vetucb.vert (4 MB)
Name: rsdo5.txt.zip
Size: 597.15 KB
Format: application/zip
Description: Corpus in plain text format
MD5: 0c004ab495039b351cec99779413aaa4
File preview:
  • rsdo5.txt
    • rsdo5kemcla.txt (11 kB)
    • rsdo5bimucb.txt (87 kB)
    • rsdo5-meta.tsv (3 kB)
    • rsdo5bimdis.txt (269 kB)
    • rsdo5bimcla.txt (25 kB)
    • rsdo5jezucb.txt (111 kB)
    • rsdo5vetucb.txt (252 kB)
    • rsdo5kemdis.txt (383 kB)
    • rsdo5jezdis.txt (557 kB)
    • rsdo5vetdis.txt (215 kB)
    • rsdo5vetcla.txt (21 kB)
    • rsdo5jezcla.txt (34 kB)
    • 00README.txt (556 B)
    • rsdo5kemucb.txt (28 kB)
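
Each archive in this item is listed with an MD5 checksum, which can be used to confirm that a download is complete and unmodified. The following is a minimal sketch in Python using the standard `hashlib` module. The checksums are copied from this record; the function names `md5_of` and `verify`, and the assumption that the archives sit in a single local directory under their original names, are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the four archives in this record.
EXPECTED_MD5 = {
    "rsdo5.TEI.zip": "2469267cb71e51c26460ca9de4f8e21f",
    "rsdo5.conllu.zip": "57fd3cf37c868e6cc2595aac330f81be",
    "rsdo5.vert.zip": "1f24a3c21eb52c6e54bc92d348995f18",
    "rsdo5.txt.zip": "0c004ab495039b351cec99779413aaa4",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each archive name to True if it exists and its MD5 matches the record."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify(".").items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.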
