Files in this item

All files in this item: 3.8 MB in total
Name: README.txt
Size: 1.57 KB
Format: Text file
Description: Unknown
MD5: 44e2ee5cc966591d100d2a750bc0fd60
File preview:
The Vejica corpus is stored as a tab-delimited text file with two
columns: an ID and a single sentence.

The sentences are UTF-8 plain text, with "÷" (U+00F7) in place of a
superfluous comma and "¤" (U+00A4) marking a missing comma.
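A minimal Python sketch of reading the file and resolving the two markers. It assumes there is no header row, that the first tab separates the ID from the sentence, and that each marker sits exactly where the comma it stands for is (or would be), with no extra spaces. The helper names and the example sentence are hypothetical and not part of the corpus distribution.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple

SUPERFLUOUS = "\u00f7"  # "÷": a comma the writer used but should not have
MISSING = "\u00a4"      # "¤": a place where a comma is required but absent


class Record(NamedTuple):
    id: str
    sentence: str


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (ID, sentence) pairs from tab-delimited lines, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Assumption: only the first tab is a column separator.
        rec_id, sentence = line.split("\t", 1)
        yield Record(rec_id, sentence)


def original_text(sentence: str) -> str:
    """Sentence as written: superfluous commas restored, missing ones left out."""
    return sentence.replace(SUPERFLUOUS, ",").replace(MISSING, "")


def corrected_text(sentence: str) -> str:
    """Sentence with the corrections applied: superfluous commas dropped, missing ones added."""
    return sentence.replace(SUPERFLUOUS, "").replace(MISSING, ",")


def count_errors(sentence: str) -> Tuple[int, int]:
    """Return (number of superfluous commas, number of missing commas)."""
    return sentence.count(SUPERFLUOUS), sentence.count(MISSING)
```

For the invented sentence "Ana÷ in Marko sta rekla¤ da prideta.", `corrected_text` gives "Ana in Marko sta rekla, da prideta." To process the whole corpus, open vejica13.txt with `encoding="utf-8"` and pass the file object to `read_records`.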

The IDs encode the source of the sampled sentence and start as follows:
KUST.de.   - corpus KUST, first language German 
KUST.en.   - corpus KUST, first language English 
KUST.es.   - corpus KUST, first language Spanish 
KUST.it.   - corpus KUST, first language Italian 
KUST.sh.   - corpus KUST, first language Croatian, Serbian or Bosnian
Solar.G1.  - corpus Šolar, grammar school, 1st grade 
Solar.G2.  - corpus Šolar, grammar school, 2nd grade 
Solar.G3.  - corpus Šolar, grammar school, 3rd grade 
Solar.G4.  - corpus Šolar, grammar school, 4th grade 
Solar.OS6. - corpus Šolar, primary school, 6th grade 
Solar.OS7. - corpus Šolar, primary school, 7th grade 
Solar.OS8. - corpus Šolar, primary school, 8th grade 
Solar.OS9. - corpus Šolar, primary school, 9th grade 
Sola . . .
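The prefix list lends itself to a lookup table. Below is a Python sketch; the table covers only the prefixes visible in this preview, which is cut off, so the hypothetical helper `source_of` returns None for any other prefix. The ID suffixes used in the examples are invented.

```python
from typing import Dict, Optional, Tuple

# Transcribed from the README preview; further prefixes may exist in the full file.
ID_PREFIXES: Dict[str, Tuple[str, str]] = {
    "KUST.de.": ("KUST", "first language German"),
    "KUST.en.": ("KUST", "first language English"),
    "KUST.es.": ("KUST", "first language Spanish"),
    "KUST.it.": ("KUST", "first language Italian"),
    "KUST.sh.": ("KUST", "first language Croatian, Serbian or Bosnian"),
    "Solar.G1.": ("Šolar", "grammar school, 1st grade"),
    "Solar.G2.": ("Šolar", "grammar school, 2nd grade"),
    "Solar.G3.": ("Šolar", "grammar school, 3rd grade"),
    "Solar.G4.": ("Šolar", "grammar school, 4th grade"),
    "Solar.OS6.": ("Šolar", "primary school, 6th grade"),
    "Solar.OS7.": ("Šolar", "primary school, 7th grade"),
    "Solar.OS8.": ("Šolar", "primary school, 8th grade"),
    "Solar.OS9.": ("Šolar", "primary school, 9th grade"),
}


def source_of(record_id: str) -> Optional[Tuple[str, str]]:
    """Return (corpus, subgroup) for a sentence ID, or None if the prefix is not listed."""
    for prefix, source in ID_PREFIXES.items():
        # Prefixes end with ".", so e.g. "Solar.G1." cannot match "Solar.G10...".
        if record_id.startswith(prefix):
            return source
    return None
```

Combined with `collections.Counter`, this gives a quick per-subgroup sentence count.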
                                            
Name: vejica13.zip
Size: 3.8 MB
Format: application/zip
Description: Unknown
MD5: ddb65e98a7435718f80bb23591e2999d
File preview:
    • vejica13.txt (11 MB)
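Downloads can be checked against the MD5 sums listed above. The following is a short Python sketch built on the standard `hashlib` module. It assumes the files were saved under their listed names; `md5_of` and `matches_listed` are hypothetical helper names.

```python
import hashlib

# MD5 sums as listed for this item.
LISTED_MD5 = {
    "README.txt": "44e2ee5cc966591d100d2a750bc0fd60",
    "vejica13.zip": "ddb65e98a7435718f80bb23591e2999d",
}


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need little memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_listed(path: str, name: str) -> bool:
    """True if the file at `path` has the MD5 listed for `name`."""
    return md5_of(path) == LISTED_MD5[name]
```

For example, `matches_listed("vejica13.zip", "vejica13.zip")` should be True for an intact download.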