Show simple item record
dc.contributor.author |
Erjavec, Tomaž |
dc.contributor.author |
Fišer, Darja |
dc.contributor.author |
Ljubešić, Nikola |
dc.contributor.author |
Ferme, Marko |
dc.contributor.author |
Borovič, Mladen |
dc.contributor.author |
Boškovič, Borko |
dc.contributor.author |
Ojsteršek, Milan |
dc.contributor.author |
Hrovat, Goran |
dc.date.accessioned |
2019-12-24T14:41:38Z |
dc.date.available |
2019-12-24T14:41:38Z |
dc.date.issued |
2019-11-28 |
dc.identifier.uri |
http://hdl.handle.net/11356/1267 |
dc.description |
The KAS-dipl corpus of Slovene BSc/BA theses consists of almost 65,000 texts (3.4 million pages or 1.1 billion tokens) written between 2000 and 2018 and gathered from the digital libraries of Slovene higher education institutions via the Slovene Open Science portal (http://openscience.si).
Each thesis has substantial associated metadata, and the corpus contains only its textual body, i.e. without the front and back matter. The body is divided into pages, pages into paragraphs, and paragraphs into sentences. The sentence tokens are morphosyntactically annotated, words are lemmatised, and English-Slovene pairs of term candidates are marked up and linked.
The corpus is distributed in the canonical TEI encoding, in the so-called vertical format used by the (no)Sketch Engine and CWB concordancers, and as plain text files. Each format distribution also contains a file with thesis metadata.
This repository entry contains the corpus of BSc/BA theses only; separate entries contain PhD theses (KAS-dr: http://hdl.handle.net/11356/1265), MSc/MA theses (KAS-mag: http://hdl.handle.net/11356/1266) and the complete KAS corpus with all three (KAS: http://hdl.handle.net/11356/1244). |
dc.language.iso |
slv |
dc.publisher |
Jožef Stefan Institute |
dc.publisher |
Faculty of Electrical Engineering and Computer Science, University of Maribor |
dc.relation.isreferencedby |
https://rdcu.be/b7GrB |
dc.rights |
CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0 |
dc.rights.uri |
https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0 |
dc.rights.label |
ACA |
dc.source.uri |
http://nl.ijs.si/kas/ |
dc.subject |
BSc/BA theses |
dc.subject |
academic writing |
dc.subject |
terminology |
dc.subject |
TEI |
dc.title |
Corpus of Academic Slovene (BSc/BA theses) KAS-dipl 1.0 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
CLARIN.SI data & tools |
contact.person |
Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
sponsor |
ARRS (Slovenian Research Agency) J6-7094 Slovene scientific texts: resources and description nationalFunds |
size.info |
64743 texts |
size.info |
3420522 pages |
size.info |
1101796659 tokens |
files.count |
5 |
files.size |
29663567994 |
featuredService.kontext |
search|https://www.clarin.si/kontext/first_form?corpname=kas_dipl |
featuredService.noske |
search|https://www.clarin.si/ske/#dashboard?corpname=kas_dipl&struct_attr_stats=1&subcorpora=1 |
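The description above mentions the vertical format used by the (no)Sketch Engine and CWB concordancers. In that format each token sits on its own line with tab-separated attributes, and structures such as sentences appear as XML-like tag lines. The following is a minimal sketch of reading sentences from such a file. The column layout (word, lemma, tag), the `s` sentence tag, the sample text and its placeholder tags are assumptions made for illustration, not taken from the corpus; the actual attributes should be checked against the corpus documentation.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, ...]


def iter_sentences(lines: Iterable[str], sentence_tag: str = "s") -> Iterator[List[Token]]:
    """Yield each sentence as a list of token tuples (one tuple per token line)."""
    opening = "<" + sentence_tag
    closing = "</" + sentence_tag + ">"
    sentence: List[Token] = []
    inside = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line == closing:
            if inside:
                yield sentence
            sentence, inside = [], False
        elif line.startswith(opening + ">") or line.startswith(opening + " "):
            sentence, inside = [], True
        elif line.startswith("<") and line.endswith(">") and "\t" not in line:
            # Heuristic: any other tag line (text, page, paragraph, glue) is skipped.
            continue
        elif inside and line:
            sentence.append(tuple(line.split("\t")))


# Invented sample; columns assumed to be word, lemma, tag (placeholder tags).
SAMPLE = """<text id="demo">
<p>
<s>
To\tta\tP
je\tbiti\tV
primer\tprimer\tN
<g/>
.\t.\tZ
</s>
</p>
</text>
"""

if __name__ == "__main__":
    for sentence in iter_sentences(SAMPLE.splitlines()):
        print(" ".join(token[0] for token in sentence))
```

Because the function accepts any iterable of lines, it can be fed an open file handle directly, which avoids loading a multi-gigabyte vertical file into memory.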
Files in this item
- Name: kasDipl.tei.tar.0.gz
- Size: 7.29 GB
- Format: application/gzip
- Description: Corpus in TEI format, slice 0
- MD5: c64d9710aebd8f3e0b48e3bf85d42d70

- Name: kasDipl.tei.tar.1.gz
- Size: 4.86 GB
- Format: application/gzip
- Description: Corpus in TEI format, slice 1
- MD5: de4a326ab752a0c6871c50ac11ae23b4

- Name: kasDipl.vert.tar.0.gz
- Size: 9.18 GB
- Format: application/gzip
- Description: Corpus in derived vertical format, slice 0
- MD5: 14e14bf98a6b203686ea822547d3958d

- Name: kasDipl.vert.tar.1.gz
- Size: 2.29 GB
- Format: application/gzip
- Description: Corpus in derived vertical format, slice 1
- MD5: a1f7b0e67983372f0452c35651354cfe

- Name: kasDipl.txt.tar.gz
- Size: 4.01 GB
- Format: application/gzip
- Description: Corpus in plain text format
- MD5: 82d1e99e0501a655ed6d2ed3b72a06ba
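Since each file in this entry is several gigabytes and comes with an MD5 checksum, it may be worth verifying downloads before unpacking them. The sketch below uses the file names and checksums listed in this entry; the function names and the assumption that the files sit in one local directory are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file list of this repository entry.
EXPECTED_MD5 = {
    "kasDipl.tei.tar.0.gz": "c64d9710aebd8f3e0b48e3bf85d42d70",
    "kasDipl.tei.tar.1.gz": "de4a326ab752a0c6871c50ac11ae23b4",
    "kasDipl.vert.tar.0.gz": "14e14bf98a6b203686ea822547d3958d",
    "kasDipl.vert.tar.1.gz": "a1f7b0e67983372f0452c35651354cfe",
    "kasDipl.txt.tar.gz": "82d1e99e0501a655ed6d2ed3b72a06ba",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Return {file name: True, False or None}; None means the file was not found."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

Reading in chunks keeps memory use constant regardless of file size, which matters for archives of this size.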