dc.contributor.author Holozan, Peter
dc.date.accessioned 2016-03-03T17:32:46Z
dc.date.available 2016-03-03T17:32:46Z
dc.date.issued 2016-03-03
dc.identifier.uri http://hdl.handle.net/11356/1055
dc.description A collection of sentences demonstrating and correcting comma usage. The sentences come from four sources:
- KUST: a Slovene learner corpus, https://nl.ijs.si/isjt06/proc/26_Stritar.pdf
- Šolar: a corpus of student writing, http://www.slovenscina.eu/korpusi/solar
- Lektor: a corpus of proof-reading corrections, http://www.slovenscina.eu/korpusi/lektor
- Wikipedija: https://sl.wikipedia.org/wiki/Glavna_stran
For Lektor, the comma corrections of proof-readers were used. For other texts, the comma errors were manually marked by Peter Holozan.
dc.language.iso slv
dc.publisher Amebis, d. o. o., Kamnik
dc.relation.isreplacedby http://hdl.handle.net/11356/1185
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri http://peter.amebis.si/vejica.html
dc.subject comma placement
dc.subject error annotation
dc.subject manual annotation
dc.title Corpus of comma placement Vejica 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden hidden
has.files yes
branding CLARIN.SI data & tools
contact.person Peter Holozan peter.holozan@amebis.si Amebis, d. o. o., Kamnik
size.info 113309 sentences
files.count 2
files.size 4206796


 Files in this item

Name: README.txt
Size: 1.53 KB
Format: Text file
Description: Description of the format.
MD5: feaf50df36f2595ced352c0d45469ba8
File preview:
The Vejica corpus is stored as a tab-delimited text file with two
columns: an ID and one sentence.

The sentences are UTF-8 plain text with "÷" (U+00F7) in place of a
superfluous comma and "¤" (U+00A4) in place of a missing comma.

The IDs encode the source of the sampled sentence and start as follows:
KUST.de.   - corpus KUST, first language German 
KUST.en.   - corpus KUST, first language English 
KUST.es.   - corpus KUST, first language Spanish 
KUST.it.   - corpus KUST, first language Italian 
KUST.sh.   - corpus KUST, first language Croatian, Serbian or Bosnian
Solar.G1.  - corpus Šolar, grammar school, 1st grade 
Solar.G2.  - corpus Šolar, grammar school, 2nd grade 
Solar.G3.  - corpus Šolar, grammar school, 3rd grade 
Solar.G4.  - corpus Šolar, grammar school, 4th grade 
Solar.OS6. - corpus Šolar, primary school, 6th grade 
Solar.OS7. - corpus Šolar, primary school, 7th grade 
Solar.OS8. - corpus Šolar, primary school, 8th grade 
Solar.OS9. - corpus Šolar, primary school, 9th grade 
Sola . . .
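
The format described in the README can be decoded with a few string operations. The following is a minimal sketch, not code shipped with the corpus. It assumes that each marker stands exactly where the comma is or should be, with no extra whitespace added, and that the source name is the part of the ID before the first dot. The function names and the sample line (including the ID suffix "1") are hypothetical and chosen only for illustration.

```python
SUPERFLUOUS = "\u00f7"  # "÷" marks a comma that should not be there
MISSING = "\u00a4"      # "¤" marks a place where a comma is missing


def parse_line(line):
    """Split one corpus line into (sentence_id, annotated_sentence)."""
    sentence_id, sentence = line.rstrip("\n").split("\t", 1)
    return sentence_id, sentence


def as_written(annotated):
    """Text as originally written: superfluous commas kept, missing ones absent."""
    return annotated.replace(SUPERFLUOUS, ",").replace(MISSING, "")


def as_corrected(annotated):
    """Corrected text: superfluous commas dropped, missing ones inserted."""
    return annotated.replace(SUPERFLUOUS, "").replace(MISSING, ",")


def source_of(sentence_id):
    """Source corpus name, assumed to be the ID text before the first dot."""
    return sentence_id.split(".", 1)[0]


# Hypothetical sample line, not taken from the corpus.
sample = "KUST.de.1\tMislim\u00a4 da je to\u00f7 dobro.\n"
sentence_id, annotated = parse_line(sample)
print(source_of(sentence_id))
print(as_written(annotated))
print(as_corrected(annotated))
```

Keeping the two views as separate functions makes it straightforward to build (erroneous, corrected) sentence pairs, for example for evaluating a comma checker.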
                                            
Name: vejica10.zip
Size: 4.01 MB
Format: application/zip
Description: Tab-delimited text file with two columns: ID and sentence.
MD5: db6b1a854660fe1f142f80e56e10e250
File preview:
    • vejica10.txt (11 MB)
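
Since the corpus is a single tab-delimited UTF-8 text file inside the archive, it can be read without unpacking it first. The sketch below uses only the Python standard library. The function name iter_sentences is hypothetical, and the code assumes that the file has no header row and that every non-empty line contains exactly one tab between the ID and the sentence. The "utf-8-sig" codec is used only as a precaution in case the file starts with a byte-order mark.

```python
import io
import zipfile


def iter_sentences(zip_path, member="vejica10.txt"):
    """Yield (sentence_id, sentence) pairs from the corpus file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # Decode on the fly; utf-8-sig tolerates an optional leading BOM.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            for line in text:
                line = line.rstrip("\r\n")
                if not line:
                    continue  # skip blank lines, if any
                sentence_id, sentence = line.split("\t", 1)
                yield sentence_id, sentence
```

A typical use would be to loop over iter_sentences("vejica10.zip") and count or filter sentences by ID prefix. Streaming the file line by line avoids holding the whole 11 MB text in memory at once.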
