dc.contributor.author | Holozan, Peter |
dc.date.accessioned | 2016-03-03T17:32:46Z |
dc.date.available | 2016-03-03T17:32:46Z |
dc.date.issued | 2016-03-03 |
dc.identifier.uri | http://hdl.handle.net/11356/1055 |
dc.description | A collection of sentences demonstrating and correcting comma usage. The sentences come from four sources: KUST, a Slovene learner corpus (https://nl.ijs.si/isjt06/proc/26_Stritar.pdf); Šolar, a corpus of student writing (http://www.slovenscina.eu/korpusi/solar); Lektor, a corpus of proof-reading corrections (http://www.slovenscina.eu/korpusi/lektor); and Wikipedija (https://sl.wikipedia.org/wiki/Glavna_stran). For Lektor, the proof-readers' comma corrections were used. For the other texts, the comma errors were manually marked by Peter Holozan. |
dc.language.iso | slv |
dc.publisher | Amebis, d. o. o., Kamnik |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1185 |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://peter.amebis.si/vejica.html |
dc.subject | comma placement |
dc.subject | error annotation |
dc.subject | manual annotation |
dc.title | Corpus of comma placement Vejica 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
hidden | hidden |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Peter Holozan peter.holozan@amebis.si Amebis, d. o. o., Kamnik |
size.info | 113309 sentences |
files.count | 2 |
files.size | 4206796 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)





- Name
- README.txt
- Size
- 1.53 KB
- Format
- Text file
- Description
- Description of the format.
- MD5
- feaf50df36f2595ced352c0d45469ba8
The Vejica corpus is stored as a tab-delimited text file with two columns: an ID and one sentence. The sentences are UTF-8 plain text, with "÷" (U+00F7) in place of each superfluous comma and "¤" (U+00A4) in place of each missing comma. The IDs encode the source of the sampled sentence and start as follows:
KUST.de. - corpus KUST, first language German
KUST.en. - corpus KUST, first language English
KUST.es. - corpus KUST, first language Spanish
KUST.it. - corpus KUST, first language Italian
KUST.sh. - corpus KUST, first language Croatian, Serbian or Bosnian
Solar.G1. - corpus Šolar, grammar school, 1st grade
Solar.G2. - corpus Šolar, grammar school, 2nd grade
Solar.G3. - corpus Šolar, grammar school, 3rd grade
Solar.G4. - corpus Šolar, grammar school, 4th grade
Solar.OS6. - corpus Šolar, primary school, 6th grade
Solar.OS7. - corpus Šolar, primary school, 7th grade
Solar.OS8. - corpus Šolar, primary school, 8th grade
Solar.OS9. - corpus Šolar, primary school, 9th grade
Sola . . .
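The format described in the README preview can be read with a few lines of Python. The following is a minimal sketch, not part of the distributed corpus: the function names, the sample ID `KUST.de.0001`, and the sample sentence are hypothetical, and it assumes each marker stands exactly where the comma was written or should have been written.

```python
SUPERFLUOUS = "\u00f7"  # "÷": a comma was written here but should not be
MISSING = "\u00a4"      # "¤": a comma is required here but was not written


def parse_line(line):
    """Split one tab-delimited corpus line into (ID, annotated sentence)."""
    sent_id, sentence = line.rstrip("\r\n").split("\t", 1)
    return sent_id, sentence


def as_written(sentence):
    """Reconstruct the sentence as originally written, errors included."""
    return sentence.replace(SUPERFLUOUS, ",").replace(MISSING, "")


def corrected(sentence):
    """Reconstruct the sentence with the comma corrections applied."""
    return sentence.replace(SUPERFLUOUS, "").replace(MISSING, ",")


def source_of(sent_id):
    """Return the first two dot-separated ID fields, e.g. ("KUST", "de")."""
    parts = sent_id.split(".")
    return parts[0], parts[1] if len(parts) > 1 else ""


# Hypothetical line, invented for illustration only.
sample = "KUST.de.0001\tMislim\u00a4 da je to\u00f7 dobro.\n"
sent_id, annotated = parse_line(sample)
print(source_of(sent_id))
print(as_written(annotated))
print(corrected(annotated))
```

Keeping the two reconstructions as separate functions makes it easy to build (erroneous, corrected) sentence pairs for training or evaluating a comma checker.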

- Name
- vejica10.zip
- Size
- 4.01 MB
- Format
- application/zip
- Description
- Tab-delimited text file with two columns: ID and sentence.
- MD5
- db6b1a854660fe1f142f80e56e10e250
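The MD5 values listed for the two files can be used to check a download. The sketch below assumes the files were saved locally under the names shown in the listing; the helper names `md5_of_file` and `verify` are illustrative, not part of any repository tooling.

```python
import hashlib

# Checksums as published in the file listing above.
EXPECTED_MD5 = {
    "README.txt": "feaf50df36f2595ced352c0d45469ba8",
    "vejica10.zip": "db6b1a854660fe1f142f80e56e10e250",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Return True if the file at `path` matches the published MD5 for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("vejica10.zip", "vejica10.zip")` should return `True` for an intact download saved in the current directory. MD5 is adequate here for detecting transfer corruption, not for guarding against deliberate tampering.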