dc.contributor.author Erjavec, Tomaž
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Fišer, Darja
dc.date.accessioned 2020-05-05T10:32:52Z
dc.date.available 2020-05-05T10:32:52Z
dc.date.issued 2020-05-05
dc.identifier.uri http://hdl.handle.net/11356/1263
dc.description KAS-biterm is an automatically generated glossary of English terms with their Slovene translations. The term pairs, together with their English and Slovene acronyms where present, were extracted from the Corpus of Academic Slovene KAS 1.0 (http://hdl.handle.net/11356/1244), where they were annotated with the kas-biterm tool (https://github.com/clarinsi/kas-biterm) trained on the Bilingual terminology extraction dataset KAS-biterm 1.0 (http://hdl.handle.net/11356/1199). Note that only Query 1 was used for pre-selecting the sentences and for training the tool, and that the bilingual terms from the KAS corpus have been filtered to remove noise. The glossary is encoded in TEI-Lex0 (https://github.com/DARIAH-ERIC/lexicalresources) and gives, for each entry, up to three examples of use together with their bibliographic information. Various parts of the lexical entries also link to the corresponding queries in the CLARIN.SI noSketch Engine concordancer. The TEI-encoded glossary is also available in a much smaller variant that omits the examples of use and the links.
dc.language.iso slv
dc.language.iso eng
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Ljubesic-et-al_KAS-term-and-KAS-biterm-Datasets-and-baselines-for-monolingual-and-bilingual-terminology-extraction-from-academic-writing.pdf
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri http://nl.ijs.si/kas/
dc.subject terminology
dc.subject PhD theses
dc.subject MSc/MA theses
dc.subject BSc/BA theses
dc.subject academic writing
dc.subject TEI
dc.title English-Slovene term candidates KAS-biterm 1.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType terminologicalResource
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J6-7094 Slovene scientific texts: resources and description nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info 133710 entries
files.count 1
files.size 53206123


 Files in this item

This item is publicly available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)

Name: KAS-biterm.TEI.zip
Size: 50.74 MB
Format: application/zip
Description: Glossary in TEI Lex0 format
MD5: b61a346c790ffada98232670cc8481db
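
The size and MD5 values listed for the archive can be used to check a downloaded copy. The following is a minimal sketch, not part of the distribution: the function names `md5_of_file` and `verify_download` are hypothetical, and the byte count is taken from the `files.size` field of this record (53206123 bytes, which corresponds to the 50.74 MB shown above).

```python
import hashlib
from pathlib import Path

# Values copied from this record; the constant names themselves are illustrative.
EXPECTED_MD5 = "b61a346c790ffada98232670cc8481db"
EXPECTED_SIZE = 53206123  # bytes, from the files.size field


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file has both the expected size and MD5 digest."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

Calling `verify_download("KAS-biterm.TEI.zip")` on the downloaded archive should return `True` if the copy is intact. MD5 is suitable here only as a transfer-integrity check, not as a security guarantee.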
 File Preview
  • KAS-biterm.TEI
    • kas-biterm.xml (460 MB)
    • kas-biterm.nocit.xml (79 MB)
    • Schema
      • TEILex0.rng (272 kB)
      • TEILex0-ODD.xml (159 kB)
      • TEILex0.rnc (119 kB)
      • TEILex0.dtd (97 kB)
      • TEILex0.sch (482 B)
    • 00README.txt (535 B)
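
Because the unpacked XML files are large, a streaming parser is one way to inspect them without loading the whole document into memory. The sketch below is an illustration under stated assumptions, not documentation of the file layout: it assumes that dictionary entries are encoded as `entry` elements in the standard TEI namespace, as is usual for TEI-Lex0, and the function name `count_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; that the glossary uses it for <entry> is an assumption.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def count_entries(xml_source):
    """Count TEI <entry> elements in a file path or binary file object.

    The document is parsed incrementally, and each entry is cleared once
    it has been counted so that its children do not accumulate in memory.
    """
    entry_tag = f"{{{TEI_NS}}}entry"
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == entry_tag:
            count += 1
            elem.clear()
    return count
```

Running `count_entries("kas-biterm.nocit.xml")` on the unpacked file would give a figure that can be compared with the `size.info` value in this record. If entries are nested or encoded differently, the count will differ.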