Files in this item

Name: README.txt
Size: 1.53 KB
Format: Text file
Description: Description of the format.
MD5: feaf50df36f2595ced352c0d45469ba8

The Vejica corpus is stored as a tab-delimited text file with two
columns: an ID and a single sentence.
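A minimal Python sketch of reading this format, assuming the first tab on each line separates the ID from the sentence (the README does not say whether a sentence can itself contain a tab, so any later tabs are kept in the sentence). `parse_vejica` is a hypothetical helper name. It splits with `str.partition` rather than the `csv` module so that quotation marks inside sentences are not treated as CSV quoting.

```python
from typing import Iterable, Iterator, Tuple


def parse_vejica(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sentence) pairs from tab-delimited Vejica lines.

    Assumption: the first tab separates the ID from the sentence; any
    later tabs are kept as part of the sentence.
    """
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines carry no data
        sent_id, sep, sentence = line.partition("\t")
        if not sep:
            raise ValueError(f"line {lineno}: no tab between ID and sentence")
        yield sent_id, sentence
```

To read the extracted file, open it as UTF-8 so the marker characters decode correctly, e.g. `with open("vejica10.txt", encoding="utf-8") as f: pairs = list(parse_vejica(f))`.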

The sentences are UTF-8 plain text, with "÷" (U+00F7) in place of a
superfluous comma and "¤" (U+00A4) marking a missing comma.
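Because each marker stands where a comma was wrongly written or wrongly left out, an annotated sentence can be turned back into either the writer's original text or a corrected version. This sketch assumes each marker occupies exactly the position of the comma, with no extra spacing around it; the function names are hypothetical.

```python
from typing import Dict

SUPERFLUOUS = "\u00f7"  # "÷": a comma was written here but should not be
MISSING = "\u00a4"      # "¤": a comma belongs here but was left out


def as_written(sentence: str) -> str:
    """Reconstruct the sentence as its author wrote it."""
    return sentence.replace(SUPERFLUOUS, ",").replace(MISSING, "")


def corrected(sentence: str) -> str:
    """Return the sentence with the annotated comma errors fixed."""
    return sentence.replace(SUPERFLUOUS, "").replace(MISSING, ",")


def count_markers(sentence: str) -> Dict[str, int]:
    """Count the annotated comma errors of each kind in one sentence."""
    return {
        "superfluous": sentence.count(SUPERFLUOUS),
        "missing": sentence.count(MISSING),
    }
```

For an invented sentence such as `Janez÷ in Metka vesta¤ da pride.`, `as_written` gives `Janez, in Metka vesta da pride.` and `corrected` gives `Janez in Metka vesta, da pride.`.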

The IDs encode the source of the sampled sentence and begin as follows:
KUST.de.   - corpus KUST, first language German 
KUST.en.   - corpus KUST, first language English 
KUST.es.   - corpus KUST, first language Spanish 
KUST.it.   - corpus KUST, first language Italian 
KUST.sh.   - corpus KUST, first language Croatian, Serbian or Bosnian 
Solar.G1.  - corpus Šolar, grammar school, 1st grade 
Solar.G2.  - corpus Šolar, grammar school, 2nd grade 
Solar.G3.  - corpus Šolar, grammar school, 3rd grade 
Solar.G4.  - corpus Šolar, grammar school, 4th grade 
Solar.OS6. - corpus Šolar, primary school, 6th grade 
Solar.OS7. - corpus Šolar, primary school, 7th grade 
Solar.OS8. - corpus Šolar, primary school, 8th grade 
Solar.OS9. - corpus Šolar, primary school, 9th grade 
Sola . . .
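The prefixes can be mapped back to their source with a lookup table. This sketch covers only the prefixes visible above, since the list is cut off in this preview; `SOURCES` and `source_of` are hypothetical names, and an ID with any other prefix yields `None`. Every prefix ends in a dot and none is a prefix of another, so the order of the checks does not matter.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup built only from the prefixes shown in the README
# preview; the original list continues past "Solar.OS9.".
SOURCES: Dict[str, Tuple[str, str]] = {
    "KUST.de.": ("KUST", "first language German"),
    "KUST.en.": ("KUST", "first language English"),
    "KUST.es.": ("KUST", "first language Spanish"),
    "KUST.it.": ("KUST", "first language Italian"),
    "KUST.sh.": ("KUST", "first language Croatian, Serbian or Bosnian"),
    "Solar.G1.": ("Šolar", "grammar school, 1st grade"),
    "Solar.G2.": ("Šolar", "grammar school, 2nd grade"),
    "Solar.G3.": ("Šolar", "grammar school, 3rd grade"),
    "Solar.G4.": ("Šolar", "grammar school, 4th grade"),
    "Solar.OS6.": ("Šolar", "primary school, 6th grade"),
    "Solar.OS7.": ("Šolar", "primary school, 7th grade"),
    "Solar.OS8.": ("Šolar", "primary school, 8th grade"),
    "Solar.OS9.": ("Šolar", "primary school, 9th grade"),
}


def source_of(sent_id: str) -> Optional[Tuple[str, str]]:
    """Return (corpus, subset) for a sentence ID, or None if the prefix is unknown."""
    for prefix, source in SOURCES.items():
        if sent_id.startswith(prefix):
            return source
    return None
```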
                                            
Name: vejica10.zip
Size: 4.01 MB
Format: application/zip
Description: Tab-delimited text file with two columns: ID and sentence.
MD5: db6b1a854660fe1f142f80e56e10e250
    • vejica10.txt (11 MB)
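A sketch for checking the download against the MD5 value listed for vejica10.zip and then reading the sentences straight from the archive without extracting it. It assumes the member is stored at the archive root as `vejica10.txt`, as the listing suggests; the helper names are hypothetical.

```python
import hashlib
import io
import zipfile
from typing import BinaryIO, Iterator, Tuple

# MD5 listed for vejica10.zip in the file listing.
EXPECTED_ZIP_MD5 = "db6b1a854660fe1f142f80e56e10e250"


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream chunk by chunk so the file is never fully in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def iter_zip_sentences(source, member: str = "vejica10.txt") -> Iterator[Tuple[str, str]]:
    """Yield (id, sentence) pairs from the text file inside the archive.

    `source` may be a path or a seekable binary file object. Assumes the
    member sits at the archive root under the name shown in the listing.
    """
    with zipfile.ZipFile(source) as zf:
        with zf.open(member) as raw:
            with io.TextIOWrapper(raw, encoding="utf-8") as text:
                for line in text:
                    line = line.rstrip("\r\n")
                    if line:  # skip blank lines
                        sent_id, _, sentence = line.partition("\t")
                        yield sent_id, sentence
```

For example, `with open("vejica10.zip", "rb") as f: ok = md5_of_stream(f) == EXPECTED_ZIP_MD5` checks the archive, after which `iter_zip_sentences("vejica10.zip")` yields the same (ID, sentence) pairs as reading the extracted text file. The README's own checksum can be checked the same way against its listed MD5.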