
 
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Fišer, Darja
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Bitenc, Maja
dc.date.accessioned 2018-08-18T16:17:39Z
dc.date.available 2018-08-18T16:17:39Z
dc.date.issued 2018-08-18
dc.identifier.uri http://hdl.handle.net/11356/1199
dc.description The KAS-biterm bilingual term extraction dataset contains complete sentences selected from PhD theses in the KAS corpus of Slovene academic writing. Only sentences with a high chance of containing a term in the original language together with its translation into Slovene were chosen, using three CQL patterns in noSketch Engine. These sentences are manually annotated for (1) terms, (2) partial terms and (3) abbreviations in (a) Slovene, (b) English or (c) another language. Links between the Slovene terms and their equivalents in the other languages, as well as their abbreviations, are also encoded. The resource can serve as a training set for supervised bilingual term extraction tools and as a benchmark for evaluating them.
dc.language.iso slv
dc.language.iso eng
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://nl.ijs.si/kas/
dc.subject terminology
dc.subject manual annotation
dc.title Bilingual terminology extraction dataset KAS-biterm 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://github.com/clarinsi/kas-biterm
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J6-7094 Slovene scientific texts: resources and description nationalFunds
size.info 1952 sentences
size.info 78491 tokens
size.info 3732 terms
files.count 2
files.size 1914238


Files in this item (1.83 MB in total)

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

Name: KAS-biterm.TEI.zip
Size: 1.01 MB
Format: application/zip
Description: Corpus in TEI format
MD5: ea3f035e5f0c2ac980524622a384f55a
Archive contents:
  • KAS-biterm.TEI
    • msd-fslib-sl.xml (465 kB)
    • kas-biterm.xml (10 kB)
    • kas-biterm.body.xml (4 MB)
    • schema
      • tei_clarin.zip (47 kB)
      • tei_clarin.rnc (206 kB)
      • tei_clarin.dtd (167 kB)
      • tei_clarin_doc.html (2 MB)
      • tei_clarin.rng (424 kB)
    • 00README.txt (181 B)

Name: KAS-biterm-smernice-v1.0.pdf
Size: 830.81 KB
Format: PDF
Description: Annotation guidelines (in Slovenian)
MD5: 529638f68d81ee34133c4bea2f4915fd
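
The record publishes an MD5 checksum for each file. Below is a minimal Python sketch for checking a downloaded copy against these values, assuming the files keep their published names; the helper names `md5_of_file` and `verify_download` are hypothetical and are not part of the distribution.

```python
import hashlib
import sys
from pathlib import Path

# Expected MD5 checksums as listed in this item record.
EXPECTED_MD5 = {
    "KAS-biterm.TEI.zip": "ea3f035e5f0c2ac980524622a384f55a",
    "KAS-biterm-smernice-v1.0.pdf": "529638f68d81ee34133c4bea2f4915fd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=None):
    """Compare a file's MD5 with the expected value.

    If no expected value is given, it is looked up by file name
    in EXPECTED_MD5 (hypothetical helper, not shipped with the data).
    """
    path = Path(path)
    if expected is None:
        expected = EXPECTED_MD5[path.name]
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Check every file name given on the command line.
    for name in sys.argv[1:]:
        status = "OK" if verify_download(name) else "MISMATCH"
        print(f"{name}: {status}")
```

For example, running the script (saved under any name, e.g. `check_md5.py`) as `python check_md5.py KAS-biterm.TEI.zip KAS-biterm-smernice-v1.0.pdf` in the download directory prints `OK` or `MISMATCH` for each file.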
