Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)





- Name: README.txt
- Size: 1.57 KB
- Format: Text file
- Description: Unknown
- MD5: 44e2ee5cc966591d100d2a750bc0fd60
The Vejica corpus is stored as a tab-delimited text file with two columns: an ID and one sentence. The sentences are UTF-8 plain text with "÷" (U+00F7) in place of each superfluous comma and "¤" (U+00A4) in place of each missing comma. The IDs encode the source of the sampled sentence and start as follows:
- KUST.de. - corpus KUST, first language German
- KUST.en. - corpus KUST, first language English
- KUST.es. - corpus KUST, first language Spanish
- KUST.it. - corpus KUST, first language Italian
- KUST.sh. - corpus KUST, first language Croatian, Serbian or Bosnian
- Solar.G1. - corpus Šolar, grammar school, 1st grade
- Solar.G2. - corpus Šolar, grammar school, 2nd grade
- Solar.G3. - corpus Šolar, grammar school, 3rd grade
- Solar.G4. - corpus Šolar, grammar school, 4th grade
- Solar.OS6. - corpus Šolar, primary school, 6th grade
- Solar.OS7. - corpus Šolar, primary school, 7th grade
- Solar.OS8. - corpus Šolar, primary school, 8th grade
- Solar.OS9. - corpus Šolar, primary school, 9th grade
Sola . . .
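
The format described in the README can be read with a few lines of Python. The following is a minimal sketch, not the corpus authors' tooling: the function names, the sample ID and the sample sentence are invented for illustration, and the README preview does not say how whitespace around the markers is handled, so the sketch leaves whitespace untouched.

```python
SUPERFLUOUS = "\u00f7"  # "÷" stands where a superfluous comma was written
MISSING = "\u00a4"      # "¤" stands where a comma is missing


def parse_line(line):
    """Split one tab-delimited corpus line into (ID, sentence)."""
    sent_id, sentence = line.rstrip("\n").split("\t", 1)
    return sent_id, sentence


def source_prefix(sent_id):
    """Return the first two dot-separated parts of an ID, e.g. 'KUST.de'."""
    return ".".join(sent_id.split(".")[:2])


def as_written(sentence):
    """Approximate the text as sampled: keep superfluous commas, omit missing ones."""
    return sentence.replace(SUPERFLUOUS, ",").replace(MISSING, "")


def as_corrected(sentence):
    """Approximate the corrected text: drop superfluous commas, add missing ones."""
    return sentence.replace(SUPERFLUOUS, "").replace(MISSING, ",")


def count_markers(sentence):
    """Return (number of superfluous commas, number of missing commas)."""
    return sentence.count(SUPERFLUOUS), sentence.count(MISSING)


# Hypothetical example line; the ID suffix and the sentence are made up.
example = "KUST.de.0001\tA\u00f7 b\u00a4 c\n"
example_id, example_sentence = parse_line(example)
```

When reading the real file, open it with `encoding="utf-8"` and pass each line to `parse_line`; grouping sentences by `source_prefix` then gives per-source counts of the two marker types.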

- Name: vejica13.zip
- Size: 3.8 MB
- Format: application/zip
- Description: Unknown
- MD5: ddb65e98a7435718f80bb23591e2999d
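
The MD5 values listed for the two files can be used to check that a download arrived intact. The sketch below assumes the files were saved locally under the names shown in the listing; `file_md5` and `verify` are helper names chosen here for illustration, not part of any repository tooling.

```python
import hashlib

# MD5 checksums copied from the file listing above.
EXPECTED_MD5 = {
    "README.txt": "44e2ee5cc966591d100d2a750bc0fd60",
    "vejica13.zip": "ddb65e98a7435718f80bb23591e2999d",
}


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, listed_name):
    """Return True if the file at `path` matches the checksum listed for `listed_name`."""
    return file_md5(path) == EXPECTED_MD5[listed_name]
```

For example, `verify("vejica13.zip", "vejica13.zip")` should return `True` for an undamaged copy of the archive saved in the current directory.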