Show simple item record

 
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Fišer, Darja
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Bitenc, Maja
dc.date.accessioned 2018-08-18T16:17:39Z
dc.date.available 2018-08-18T16:17:39Z
dc.date.issued 2018-08-18
dc.identifier.uri http://hdl.handle.net/11356/1199
dc.description The KAS-biterm bilingual term extraction dataset contains complete sentences selected from PhD theses in the KAS corpus of Slovene academic writing. Only sentences with a high chance of containing both a term in the original language and its Slovene translation were chosen, using three CQL patterns in noSketch Engine. These sentences are manually annotated for (1) terms, (2) partial terms and (3) abbreviations in (a) Slovene, (b) English, or (c) another language. Links between the Slovene terms and their equivalents in the other languages, as well as their abbreviations, are also encoded. The resource can serve as a training set for supervised learning of bilingual term extraction tools and for benchmarking them.
dc.language.iso slv
dc.language.iso eng
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://nl.ijs.si/kas/
dc.subject terminology
dc.subject manual annotation
dc.title Bilingual terminology extraction dataset KAS-biterm 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://github.com/clarinsi/kas-biterm
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J6-7094 Slovene scientific texts: resources and description nationalFunds
size.info 1952 sentences
size.info 78491 tokens
size.info 3732 terms
files.count 2
files.size 1914238


 Files in this item

 Download all files in this item (1.83 MB)
Name: KAS-biterm.TEI.zip
Size: 1.01 MB
Format: application/zip
Description: Corpus in TEI format
MD5: ea3f035e5f0c2ac980524622a384f55a
File preview:
  • KAS-biterm.TEI
    • msd-fslib-sl.xml (465 kB)
    • kas-biterm.xml (10 kB)
    • kas-biterm.body.xml (4 MB)
    • schema
      • tei_clarin.zip (47 kB)
      • tei_clarin.rnc (206 kB)
      • tei_clarin.dtd (167 kB)
      • tei_clarin_doc.html (2 MB)
      • tei_clarin.rng (424 kB)
    • 00README.txt (181 B)
Name: KAS-biterm-smernice-v1.0.pdf
Size: 830.81 KB
Format: PDF
Description: Annotation guidelines (in Slovenian)
MD5: 529638f68d81ee34133c4bea2f4915fd
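
The MD5 values listed for the two files can be used to check that a download arrived intact. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` are illustrative and not part of the dataset or the repository, and the sketch assumes the files are saved locally under the names shown in this record.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "KAS-biterm.TEI.zip": "ea3f035e5f0c2ac980524622a384f55a",
    "KAS-biterm-smernice-v1.0.pdf": "529638f68d81ee34133c4bea2f4915fd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a local file (hypothetical helper) with the listed checksum."""
    path = Path(path)
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"no checksum listed for {path.name}")
    return md5_of_file(path) == expected
```

Calling `verify_download("KAS-biterm.TEI.zip")` returns `True` only when the local file's digest matches the value listed above; a `False` result suggests an incomplete or corrupted download.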
