Files in this item

Name: README.txt
Size: 1.57 KB
Format: Text file
Description: Unknown
MD5: 44e2ee5cc966591d100d2a750bc0fd60
File Preview:
The Vejica corpus is stored as a tab-delimited text file with two columns: an ID and one sentence. The sentences are UTF-8 plain text, with "÷" (U+00F7) in place of a superfluous comma and "¤" (U+00A4) in place of a missing comma. The IDs encode the source of the sampled sentence and start as follows:

KUST.de. - corpus KUST, first language German
KUST.en. - corpus KUST, first language English
KUST.es. - corpus KUST, first language Spanish
KUST.it. - corpus KUST, first language Italian
KUST.sh. - corpus KUST, first language Croatian, Serbian or Bosnian
Solar.G1. - corpus Šolar, grammar school, 1st grade
Solar.G2. - corpus Šolar, grammar school, 2nd grade
Solar.G3. - corpus Šolar, grammar school, 3rd grade
Solar.G4. - corpus Šolar, grammar school, 4th grade
Solar.OS6. - corpus Šolar, primary school, 6th grade
Solar.OS7. - corpus Šolar, primary school, 7th grade
Solar.OS8. - corpus Šolar, primary school, 8th grade
Solar.OS9. - corpus Šolar, primary school, 9th grade
Sola . . .
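The format described in the README preview could be read along the following lines. This is a minimal sketch, not part of the distributed files: the function names are hypothetical, the sample ID "KUST.de.1" and the sample sentence are invented for illustration, and it assumes each marker sits exactly where the comma would be, with no extra spacing added around it.

```python
SUPERFLUOUS = "\u00f7"  # "÷": a comma was written here but should not be
MISSING = "\u00a4"      # "¤": a comma should be here but was not written


def parse_line(line):
    """Split one tab-delimited corpus line into (ID, annotated sentence)."""
    sent_id, sentence = line.rstrip("\n").split("\t", 1)
    return sent_id, sentence


def as_written(sentence):
    """Reconstruct the text as the author wrote it (assumed marker placement)."""
    return sentence.replace(SUPERFLUOUS, ",").replace(MISSING, "")


def as_corrected(sentence):
    """Reconstruct the text with comma errors fixed (assumed marker placement)."""
    return sentence.replace(SUPERFLUOUS, "").replace(MISSING, ",")


def source_prefix(sent_id):
    """Return the first two dot-separated fields, e.g. "KUST.de" or "Solar.G1"."""
    return ".".join(sent_id.split(".")[:2])


# Invented sample line, not taken from the corpus.
sample = "KUST.de.1\tRekel je\u00a4 da pride\u00f7 jutri.\n"
sent_id, annotated = parse_line(sample)
written = as_written(annotated)
corrected = as_corrected(annotated)
prefix = source_prefix(sent_id)
```

To process the real file, one would open vejica13.txt with encoding="utf-8" and pass each line to parse_line; whether the file has a header row is not stated in the preview, so that should be checked first.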
Name: vejica13.zip
Size: 3.8 MB
Format: application/zip
Description: Unknown
MD5: ddb65e98a7435718f80bb23591e2999d
File Preview:
    • vejica13.txt (11 MB)