
 
dc.contributor.author Kosem, Iztok
dc.contributor.author Pori, Eva
dc.contributor.author Arhar Holdt, Špela
dc.date.accessioned 2023-03-01T12:10:03Z
dc.date.available 2023-03-01T12:10:03Z
dc.date.issued 2023-02-28
dc.identifier.uri http://hdl.handle.net/11356/1719
dc.description The dataset contains a list of 11906 words (lemmas with part of speech information) and their frequency of occurrence in a corpus of Slovenian textbooks, covering elementary school (Grade 1 to 9) and secondary school (Year 1 to 4). The corpus contains 4,302,857 words (5,373,268 tokens) and consists of 127 textbooks from 16 different subjects. The distribution per school level is as follows:
- Grade 1: 17949 tokens
- Grade 2: 46317 tokens
- Grade 3: 84222 tokens
- Grade 4: 305454 tokens
- Grade 5: 357400 tokens
- Grade 6: 351463 tokens
- Grade 7: 537359 tokens
- Grade 8: 592068 tokens
- Grade 9: 765574 tokens
- Year 1: 665093 tokens
- Year 2: 200267 tokens
- Year 3: 149442 tokens
- Year 4: 23406 tokens
- Year 1-4: 206843 tokens (textbooks that are used in all years of secondary school and were not divided by year)
The purpose of the dataset is to facilitate research into vocabulary use at different levels of education, and to enable comparative studies of student language reception and production in Slovene.
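As a quick consistency check on the per-level figures in the description, the following Python sketch sums them and finds the largest level. The variable names (`counts_per_level`, `shares`, `largest`) are illustrative assumptions and not part of the dataset; the figures are copied from the description. The total comes to 4,302,857, which matches the word count given in the description.

```python
# Per-level counts copied from the dataset description.
# The variable names here are illustrative, not part of the dataset.
counts_per_level = {
    "Grade 1": 17949,
    "Grade 2": 46317,
    "Grade 3": 84222,
    "Grade 4": 305454,
    "Grade 5": 357400,
    "Grade 6": 351463,
    "Grade 7": 537359,
    "Grade 8": 592068,
    "Grade 9": 765574,
    "Year 1": 665093,
    "Year 2": 200267,
    "Year 3": 149442,
    "Year 4": 23406,
    "Year 1-4": 206843,
}

# Sum of all per-level counts.
total = sum(counts_per_level.values())
print(total)  # 4302857

# Share of each level in the total, as a percentage.
shares = {level: 100 * n / total for level, n in counts_per_level.items()}

# Level with the largest count.
largest = max(counts_per_level, key=counts_per_level.get)
print(largest)  # Grade 9
```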
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.cjvt.si/prop/en/
dc.subject textbook corpus
dc.subject vocabulary
dc.subject diachronic
dc.subject school
dc.subject language didactics
dc.title Frequency list of textbook vocabulary by level of education in elementary and secondary schools
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Iztok Kosem iztok.kosem@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor ARRS J7-3159 Empirical foundations for digitally-supported development of writing skills nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info 11906 words
files.count 2
files.size 1447177


 Files in this item

Name
frequency-list-from-textbook-corpus-diachronic.txt
Size
1.38 MB
Format
Text file
Description
Frequency list in text format
MD5
9fac168c226aee97d6e1e0251f528c5a
 File preview
Lema	Lema (male črke)	Besedna vrsta	Skupna absolutna pogostost leme	Skupna relativna pogostost (na milijon pojavitev)	Absolutna pogostost (1. razred)	Relativna pogostost (1. razred)	Absolutna pogostost (2. razred)	Relativna pogostost (2. razred)	Absolutna pogostost (3. razred)	Relativna pogostost (3. razred)	Absolutna pogostost (4. razred)	Relativna pogostost (4. razred)	Absolutna pogostost (5. razred)	Relativna pogostost (5. razred)	Absolutna pogostost (6. razred)	Relativna pogostost (6. razred)	Absolutna pogostost (7. razred)	Relativna pogostost (7. razred)	Absolutna pogostost (8. razred)	Relativna pogostost (8. razred)	Absolutna pogostost (9. razred)	Relativna pogostost (9. razred)	Absolutna pogostost (1. letnik)	Relativna pogostost (1. letnik)	Absolutna pogostost (2. letnik)	Relativna pogostost (2. letnik)	Absolutna pogostost (3. letnik)	Relativna pogostost (3. letnik)	Absolutna pogostost (4. letnik)	Relativna pogostost (4. letnik)	Absolutna pogostost (1.-4. letnik)	Relativna pogos . . .
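The preview shows the header row of a tab-separated file. A minimal sketch of reading such a file with Python's standard `csv` module might look as follows. The column names are taken from the visible part of the header; the function name `load_frequency_list`, the sample rows, and their values are made up for illustration, and the number format of the real file is an assumption (plain integers for absolute frequencies).

```python
import csv
import io


def load_frequency_list(handle):
    """Read a tab-separated frequency list into a list of dicts keyed by header."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Hypothetical sample using four of the columns visible in the preview header.
# The rows and their values are invented for illustration only.
sample = (
    "Lema\tLema (male črke)\tBesedna vrsta\tSkupna absolutna pogostost leme\n"
    "Beseda\tbeseda\tsamostalnik\t120\n"
    "biti\tbiti\tglagol\t4500\n"
)

rows = load_frequency_list(io.StringIO(sample))

# Sort by total absolute frequency, highest first (assumes integer values).
rows.sort(key=lambda r: int(r["Skupna absolutna pogostost leme"]), reverse=True)
top_lemma = rows[0]["Lema (male črke)"]
print(top_lemma)
```

For the downloaded file, the same function could be called on `open("frequency-list-from-textbook-corpus-diachronic.txt", encoding="utf-8", newline="")`, assuming the file is UTF-8 encoded.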
                                            
Name
README.txt
Size
1.84 KB
Format
Text file
Description
README file
MD5
8df9248e571f9ce83993f73ca8175d94
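The MD5 values listed for the two files can be used to check a download for corruption. A minimal sketch using Python's standard `hashlib` module follows. The digests are copied from this record; the names `EXPECTED_MD5`, `md5_of_file`, and `verify`, as well as the local file paths, are assumptions for illustration.

```python
import hashlib

# MD5 digests as listed in this record, keyed by file name.
EXPECTED_MD5 = {
    "frequency-list-from-textbook-corpus-diachronic.txt": "9fac168c226aee97d6e1e0251f528c5a",
    "README.txt": "8df9248e571f9ce83993f73ca8175d94",
}


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Return True if the file at `path` matches the digest listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

Calling `verify("README.txt", "README.txt")` on a downloaded copy would return `True` only if the local file is byte-identical to the one the digest was computed from.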
 File preview
***************

SLO: Podatkovni niz vsebuje seznam 11.906 besed (s podatkom o besedni vrsti) in njihove pogostosti v učbeniškem korpusu, ki vsebuje učbenike iz osnovne šole (od 1. do 9. razreda) in srednje šole (od 1. do 4. letnika).
ENG: The dataset contains a list of 11906 words (lemmas with part of speech information) and their frequency of occurrence in a corpus of Slovenian textbooks, covering elementary school (Grade 1 to 9) and secondary school (Year 1 to 4).

Kosem, Iztok; Pori, Eva; Arhar Holdt, Špela, 2023, Frequency list of textbook vocabulary by level of education in elementary and secondary schools, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1719.

***************

"Lema": SLO: Lema besede iz učbeniškega korpusa. ENG: Lemma of the word from the textbook corpus.

"Lema (male črke)": SLO: Lema besede z malimi črkami. ENG: Lemma of the word in lower case.

"Besedna vrsta": SLO: Podatek o besedni vrsti besede (po . . .
                                            
