| dc.contributor.author | Erjavec, Tomaž | 
| dc.contributor.author | Fišer, Darja | 
| dc.contributor.author | Ljubešić, Nikola | 
| dc.contributor.author | Bitenc, Maja | 
| dc.date.accessioned | 2018-08-18T16:17:39Z | 
| dc.date.available | 2018-08-18T16:17:39Z | 
| dc.date.issued | 2018-08-18 | 
| dc.identifier.uri | http://hdl.handle.net/11356/1199 | 
| dc.description | The KAS-biterm bilingual term extraction dataset contains complete sentences selected from PhD theses in the KAS corpus of Slovene academic writing. Only sentences likely to contain a term in the original language together with its Slovene translation were selected, using three CQL patterns in noSketch Engine. The sentences are manually annotated for (1) terms, (2) partial terms and (3) abbreviations in (a) Slovene, (b) English or (c) another language. Links between the Slovene terms and their equivalents in the other languages, as well as their abbreviations, are also encoded. The resource can serve as a training set for supervised learning of bilingual term extraction tools and for benchmarking them. | 
| dc.language.iso | slv | 
| dc.language.iso | eng | 
| dc.publisher | Jožef Stefan Institute | 
| dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) | 
| dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ | 
| dc.rights.label | PUB | 
| dc.source.uri | http://nl.ijs.si/kas/ | 
| dc.subject | terminology | 
| dc.subject | manual annotation | 
| dc.title | Bilingual terminology extraction dataset KAS-biterm 1.0 | 
| dc.type | corpus | 
| metashare.ResourceInfo#ContentInfo.mediaType | text | 
| has.files | yes | 
| branding | CLARIN.SI data & tools | 
| demo.uri | https://github.com/clarinsi/kas-biterm | 
| contact.person | Tomaž Erjavec (tomaz.erjavec@ijs.si), Jožef Stefan Institute | 
| sponsor | ARRS (Slovenian Research Agency), project J6-7094 "Slovene scientific texts: resources and description", national funds | 
| size.info | 1952 sentences | 
| size.info | 78491 tokens | 
| size.info | 3732 terms | 
| files.count | 2 | 
| files.size | 1914238 bytes | 
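The description states that each sentence carries span annotations (terms, partial terms and abbreviations, each marked for language) plus links between Slovene terms and their equivalents, and that the resource is intended as training data for supervised term extraction. The Python sketch below shows one way such annotations could be turned into BIO tags for a sequence labeller and into linked term pairs. It is a minimal sketch under stated assumptions: the `Span` and `Sentence` classes, the kind labels (`"term"`, `"partial"`, `"abbrev"`) and the language codes (`"sl"`, `"en"`, `"other"`) are hypothetical and do not mirror the actual TEI markup, which should be read from the files themselves. The example sentence is invented, not taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical in-memory model of one annotated sentence.
# It does NOT reproduce the KAS-biterm TEI markup.
@dataclass
class Span:
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    kind: str   # hypothetical labels: "term", "partial", "abbrev"
    lang: str   # hypothetical codes: "sl", "en", "other"

@dataclass
class Sentence:
    tokens: List[str]
    spans: List[Span] = field(default_factory=list)
    # Each link pairs two indices into `spans`, e.g. (Slovene term, English equivalent).
    links: List[Tuple[int, int]] = field(default_factory=list)

def bio_tags(sent: Sentence, kind: str = "term") -> List[str]:
    """Label tokens B-<lang>/I-<lang> inside spans of `kind`, O elsewhere."""
    tags = ["O"] * len(sent.tokens)
    for span in sent.spans:
        if span.kind != kind:
            continue
        tags[span.start] = f"B-{span.lang}"
        for i in range(span.start + 1, span.end):
            tags[i] = f"I-{span.lang}"
    return tags

def linked_pairs(sent: Sentence) -> List[Tuple[str, str]]:
    """Return the surface text of both spans in every link."""
    def text(span: Span) -> str:
        return " ".join(sent.tokens[span.start:span.end])
    return [(text(sent.spans[a]), text(sent.spans[b])) for a, b in sent.links]

# Invented example sentence, not taken from the dataset.
example = Sentence(
    tokens=["Uporabili", "smo", "strojno", "učenje", "(", "machine", "learning", ")", "."],
    spans=[Span(2, 4, "term", "sl"), Span(5, 7, "term", "en")],
    links=[(0, 1)],
)
print(bio_tags(example))
print(linked_pairs(example))
```

Because plain BIO tagging assigns one label per token, overlapping spans of the same kind would overwrite one another in this sketch; nested terms would need a separate tag layer per nesting level or a span-based model.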
Files in this item (1.83 MB in total)

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

- Name: KAS-biterm.TEI.zip
- Size: 1.01 MB
- Format: application/zip
- Description: Corpus in TEI format
- MD5: ea3f035e5f0c2ac980524622a384f55a
- Contents:
  - KAS-biterm.TEI/
    - msd-fslib-sl.xml (465 kB)
    - kas-biterm.xml (10 kB)
    - kas-biterm.body.xml (4 MB)
    - schema/
      - tei_clarin.zip (47 kB)
      - tei_clarin.rnc (206 kB)
      - tei_clarin.dtd (167 kB)
      - tei_clarin_doc.html (2 MB)
      - tei_clarin.rng (424 kB)
    - 00README.txt (181 B)
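To check the exact paths and sizes inside the TEI archive after downloading it, the ZIP can be listed with Python's standard `zipfile` module. This is a minimal sketch; the function names `list_archive` and `print_archive` are illustrative, and the local file name is assumed to match the name shown on the page.

```python
import zipfile
from typing import List, Tuple

def list_archive(path: str) -> List[Tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) for every entry in a ZIP file."""
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]

def print_archive(path: str) -> None:
    """Print a right-aligned size column followed by the member path, one entry per line."""
    for name, size in list_archive(path):
        print(f"{size:>10}  {name}")
```

For example, `print_archive("KAS-biterm.TEI.zip")` prints every member with its size in bytes, which shows directly whether the files sit under a top-level folder and where the schema files are placed.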
 
 
- Name: KAS-biterm-smernice-v1.0.pdf
- Size: 830.81 KB
- Description: Annotation guidelines (in Slovene)
- MD5: 529638f68d81ee34133c4bea2f4915fd
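Since MD5 checksums are published for both files, a downloaded copy can be checked against them. The sketch below uses only Python's standard `hashlib`; the helper names are illustrative, and the local file names are assumed to be the ones listed on this page.

```python
import hashlib

# MD5 checksums as published on the repository page.
EXPECTED_MD5 = {
    "KAS-biterm.TEI.zip": "ea3f035e5f0c2ac980524622a384f55a",
    "KAS-biterm-smernice-v1.0.pdf": "529638f68d81ee34133c4bea2f4915fd",
}

def md5_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: str, expected: str) -> bool:
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of(path) == expected.lower()
```

For example, `verify("KAS-biterm.TEI.zip", EXPECTED_MD5["KAS-biterm.TEI.zip"])` returns `True` for an intact download. MD5 is adequate for detecting transfer corruption but not for guarding against deliberate tampering.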
