dc.contributor.author |
Erjavec, Tomaž |
dc.contributor.author |
Fišer, Darja |
dc.contributor.author |
Ljubešić, Nikola |
dc.contributor.author |
Ferme, Marko |
dc.contributor.author |
Borovič, Mladen |
dc.contributor.author |
Boškovič, Borko |
dc.contributor.author |
Ojsteršek, Milan |
dc.contributor.author |
Hrovat, Goran |
dc.date.accessioned |
2019-12-24T14:42:48Z |
dc.date.available |
2019-12-24T14:42:48Z |
dc.date.issued |
2019-11-28 |
dc.identifier.uri |
http://hdl.handle.net/11356/1265 |
dc.description |
The KAS-dr corpus of Slovene PhD theses consists of almost 1,600 texts (266 thousand pages, or about 101 million tokens) written between 2000 and 2018 and gathered from the digital libraries of Slovene higher education institutions via the Slovene Open Science portal (http://openscience.si).
The theses come with substantial metadata, and each thesis in the corpus contains only its textual body, i.e. without the front and back matter. The body is divided into pages, pages into paragraphs, and paragraphs into sentences. The tokens in each sentence are morphosyntactically annotated, words are lemmatised, and English-Slovene pairs of term candidates are marked up and linked. Slovene monolingual term candidates are also marked up.
The corpus is distributed in the canonical TEI encoding, in the so-called vertical format used by the (no)Sketch Engine and CWB concordancers, and as plain text files. Each distribution format also contains a file with the thesis metadata.
This repository entry contains the corpus of PhD theses only; separate entries are available that contain MSc/MA theses (KAS-mag: http://hdl.handle.net/11356/1266), BSc/BA theses (KAS-dipl: http://hdl.handle.net/11356/1267) and the complete KAS corpus with all three (KAS: http://hdl.handle.net/11356/1244). |
dc.language.iso |
slv |
dc.publisher |
Jožef Stefan Institute |
dc.publisher |
Faculty of Electrical Engineering and Computer Science, University of Maribor |
dc.relation.isreferencedby |
https://rdcu.be/b7GrB |
dc.rights |
CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0 |
dc.rights.uri |
https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0 |
dc.rights.label |
ACA |
dc.source.uri |
http://nl.ijs.si/kas/ |
dc.subject |
PhD theses |
dc.subject |
academic writing |
dc.subject |
terminology |
dc.subject |
TEI |
dc.subject |
scientific texts |
dc.title |
Corpus of Academic Slovene (PhD theses) KAS-dr 1.0 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
CLARIN.SI data & tools |
contact.person |
Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
sponsor |
ARRS (Slovenian Research Agency) J6-7094 Slovene scientific texts: resources and description nationalFunds |
size.info |
1569 texts |
size.info |
266423 pages |
size.info |
101473395 tokens |
files.count |
3 |
files.size |
2706506388 |
featuredService.kontext |
search|https://www.clarin.si/kontext/first_form?corpname=kas_dr |
featuredService.noske |
search|https://www.clarin.si/ske/#dashboard?corpname=kas_dr |
Files in this item
- Name: kasDr.tei.tar.gz
- Size: 1.09 GB
- Format: application/gzip
- Description: Corpus in TEI format
- MD5: f8a47a10d144fd40bcf8b35bc72c8dc9
- Name: kasDr.vert.tar.gz
- Size: 1.08 GB
- Format: application/gzip
- Description: Corpus in derived vertical format
- MD5: 328f7b9ff7c7c7184fe1ba074c550785
- Name: kasDr.txt.tar.gz
- Size: 351.73 MB
- Format: application/gzip
- Description: Corpus in plain text format
- MD5: 4f37b0d4c1436bd841c7ace62e74517a
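The MD5 values listed for the three archives can be used to check that a download completed intact. The following is a minimal sketch, assuming the archives were saved under their original names in a single directory; the function names `md5_of_file` and `verify_downloads` are illustrative and not part of any CLARIN.SI tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums exactly as listed in this repository record.
EXPECTED_MD5 = {
    "kasDr.tei.tar.gz": "f8a47a10d144fd40bcf8b35bc72c8dc9",
    "kasDr.vert.tar.gz": "328f7b9ff7c7c7184fe1ba074c550785",
    "kasDr.txt.tar.gz": "4f37b0d4c1436bd841c7ace62e74517a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True if it exists and its MD5 matches the record."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # Assumption: the archives sit in the current working directory.
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks matters here because two of the archives are over 1 GB each.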
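The description in this record mentions a vertical format for the (no)Sketch Engine and CWB concordancers. Such files generally hold one token per line with tab-separated attributes, and XML-like tags on their own lines for structures. The sketch below groups token lines into sentences. It assumes that sentences are delimited by `<s>` and `</s>` lines and that the columns are word form, lemma and morphosyntactic tag; neither assumption is confirmed by this record, and the sample text is invented for illustration, so check the files in kasDr.vert.tar.gz before relying on it.

```python
def iter_sentences(lines):
    """Yield each sentence as a list of token rows (lists of column values).

    Assumption: sentences are wrapped in <s> ... </s> lines; all other
    tag-only lines (e.g. <p>, <text ...>, <g/>) are skipped.
    """
    sentence = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "<s>" or line.startswith("<s "):
            sentence = []  # a new sentence starts
        elif line.startswith("</s>"):
            if sentence is not None:
                yield sentence
            sentence = None
        elif line.startswith("<") and line.endswith(">"):
            continue  # some other structural tag
        elif sentence is not None and line:
            sentence.append(line.split("\t"))


# Invented sample, NOT taken from the corpus; the column order is assumed.
SAMPLE = """<text id="example">
<p>
<s>
To\tta\tPd-nsn
je\tbiti\tVa-r3s-n
primer\tprimer\tNcmsn
<g/>
.\t.\tZ
</s>
</p>
</text>
"""

if __name__ == "__main__":
    for tokens in iter_sentences(SAMPLE.splitlines()):
        print(" ".join(row[0] for row in tokens))
```

Because `iter_sentences` accepts any iterable of lines, it can also be fed an open file object, so a large vertical file does not have to be loaded into memory at once.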