
 
dc.contributor.author Holozan, Peter
dc.date.accessioned 2016-03-03T17:32:46Z
dc.date.available 2016-03-03T17:32:46Z
dc.date.issued 2016-03-03
dc.identifier.uri http://hdl.handle.net/11356/1055
dc.description A collection of sentences demonstrating and correcting comma usage. The sentences come from four sources:
- KUST: a Slovene learner corpus, https://nl.ijs.si/isjt06/proc/26_Stritar.pdf
- Šolar: a corpus of student writing, http://www.slovenscina.eu/korpusi/solar
- Lektor: a corpus of proof-reading corrections, http://www.slovenscina.eu/korpusi/lektor
- Wikipedija: https://sl.wikipedia.org/wiki/Glavna_stran
For Lektor, the comma corrections of proof-readers were used. For other texts, the comma errors were manually marked by Peter Holozan.
dc.language.iso slv
dc.publisher Amebis, d. o. o., Kamnik
dc.relation.isreplacedby http://hdl.handle.net/11356/1185
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri http://peter.amebis.si/vejica.html
dc.subject comma placement
dc.subject error annotation
dc.subject manual annotation
dc.title Corpus of comma placement Vejica 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden hidden
has.files yes
branding CLARIN.SI data & tools
contact.person Peter Holozan peter.holozan@amebis.si Amebis, d. o. o., Kamnik
size.info 113309 sentences
files.count 2
files.size 4206796


 Files in this item

 Download all files in item (4.01 MB)
Name: README.txt
Size: 1.53 KB
Format: Text file
Description: Description of the format.
MD5: feaf50df36f2595ced352c0d45469ba8
File preview:
The Vejica corpus is stored as a tab delimited text file with two
columns: ID and one sentence.

The sentences are UTF-8 plain text, with "÷" (U+00F7) in place of each
superfluous comma and "¤" (U+00A4) marking each missing comma.
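
As an illustration of the format described above, the following minimal Python sketch splits one line into its ID and sentence and then derives two readings of the sentence. It assumes that each marker replaces exactly one comma character; the function names, the ID `EXAMPLE.1`, and the English sample sentence are invented for this sketch and do not come from the corpus.

```python
SUPERFLUOUS = "\u00f7"  # "÷": a comma was written here but should not be
MISSING = "\u00a4"      # "¤": a comma should be here but was not written


def parse_line(line):
    """Split one tab-delimited corpus line into (ID, sentence)."""
    sentence_id, sentence = line.rstrip("\n").split("\t", 1)
    return sentence_id, sentence


def as_written(sentence):
    """Reconstruct the text with the original comma errors (assumed mapping)."""
    return sentence.replace(SUPERFLUOUS, ",").replace(MISSING, "")


def as_corrected(sentence):
    """Reconstruct the text with the comma errors fixed (assumed mapping)."""
    return sentence.replace(SUPERFLUOUS, "").replace(MISSING, ",")


# Invented line, not taken from the corpus.
line = "EXAMPLE.1\tI think\u00f7 that he left\u00a4 but nobody saw him.\n"
sentence_id, sentence = parse_line(line)
print(sentence_id)
print(as_written(sentence))
print(as_corrected(sentence))
```

Reading the real file the same way would mean opening `vejica10.txt` with `encoding="utf-8"` and calling `parse_line` on each line.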

The IDs encode the source of the sampled sentence and start as follows:
KUST.de.   - corpus KUST, first language German 
KUST.en.   - corpus KUST, first language English 
KUST.es.   - corpus KUST, first language Spanish 
KUST.it.   - corpus KUST, first language Italian 
KUST.sh.   - corpus KUST, first language Croatian, Serbian or Bosnian 
Solar.G1.  - corpus Šolar, grammar school, 1st grade 
Solar.G2.  - corpus Šolar, grammar school, 2nd grade 
Solar.G3.  - corpus Šolar, grammar school, 3rd grade 
Solar.G4.  - corpus Šolar, grammar school, 4th grade 
Solar.OS6. - corpus Šolar, primary school, 6th grade 
Solar.OS7. - corpus Šolar, primary school, 7th grade 
Solar.OS8. - corpus Šolar, primary school, 8th grade 
Solar.OS9. - corpus Šolar, primary school, 9th grade 
Sola . . .
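
The ID prefixes listed above could be used to group sentences by source. The sketch below is a minimal Python example under stated assumptions: the lookup table contains only the prefixes visible in this truncated preview, the helper `source_prefix` is hypothetical, and the sample IDs are invented (only their leading prefix follows the README).

```python
from collections import Counter

# Prefixes exactly as shown in the README preview. The preview is truncated,
# so the corpus contains further prefixes that this table does not cover.
KNOWN_PREFIXES = {
    "KUST.de.": "corpus KUST, first language German",
    "KUST.en.": "corpus KUST, first language English",
    "KUST.es.": "corpus KUST, first language Spanish",
    "KUST.it.": "corpus KUST, first language Italian",
    "KUST.sh.": "corpus KUST, first language Croatian, Serbian or Bosnian",
    "Solar.G1.": "corpus Šolar, grammar school, 1st grade",
    "Solar.G2.": "corpus Šolar, grammar school, 2nd grade",
    "Solar.G3.": "corpus Šolar, grammar school, 3rd grade",
    "Solar.G4.": "corpus Šolar, grammar school, 4th grade",
    "Solar.OS6.": "corpus Šolar, primary school, 6th grade",
    "Solar.OS7.": "corpus Šolar, primary school, 7th grade",
    "Solar.OS8.": "corpus Šolar, primary school, 8th grade",
    "Solar.OS9.": "corpus Šolar, primary school, 9th grade",
}


def source_prefix(sentence_id):
    """Return the listed prefix that the ID starts with, or None if unlisted."""
    for prefix in KNOWN_PREFIXES:
        if sentence_id.startswith(prefix):
            return prefix
    return None


# Invented IDs: the part after the prefix is a placeholder.
sample_ids = ["KUST.de.a1", "KUST.de.a2", "Solar.OS6.b7", "Unlisted.x"]
counts = Counter(source_prefix(i) for i in sample_ids)
for prefix, n in counts.items():
    print(prefix, KNOWN_PREFIXES.get(prefix, "not in the preview"), n)
```

IDs that match none of the listed prefixes are counted under `None` rather than guessed at, since the rest of the prefix list is not visible here.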
                                            
Name: vejica10.zip
Size: 4.01 MB
Format: application/zip
Description: Tab delimited text file with two columns: ID and sentence.
MD5: db6b1a854660fe1f142f80e56e10e250
File preview:
    • vejica10.txt (11 MB)
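
The MD5 value listed for vejica10.zip can be used to check a download. The following is a minimal Python sketch, assuming the archive was saved as `vejica10.zip` in the current directory; the helper name `md5_of` is made up for this example.

```python
import hashlib
from pathlib import Path

# MD5 of vejica10.zip as listed in this record.
EXPECTED_MD5 = "db6b1a854660fe1f142f80e56e10e250"


def md5_of(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("vejica10.zip")  # assumed download location
if archive.exists():
    if md5_of(archive) == EXPECTED_MD5:
        print("checksum OK")
    else:
        print("checksum mismatch")
```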
