dc.contributor.author | Wasserscheidt, Philipp |
dc.contributor.author | Bulić, Halid |
dc.contributor.author | Durmišević, Elma |
dc.contributor.author | Hodžić-Čavkić, Azra |
dc.contributor.author | Bajraktarević, Enisa |
dc.contributor.author | Ahmetspahić-Peljto, Azra |
dc.contributor.author | Šabić, Belmin |
dc.date.accessioned | 2024-04-18T09:52:51Z |
dc.date.available | 2024-04-18T09:52:51Z |
dc.date.issued | 2024-04-17 |
dc.identifier.uri | http://hdl.handle.net/11356/1913 |
dc.description | This corpus is specialized, static (i.e., no future growth is planned) and diachronic, covering the period from 2002 to 2022. The SMS messages included in the corpus were obtained from voluntary donors (informants). Both senders and recipients of the messages are Bosnian speakers who differ in age, education, occupation, place of origin and countries of long-term residence. The Sarajevo Corpus of SMS Messages in Bosnian was originally published by the University of Sarajevo – Faculty of Philosophy as an electronic book. The second phase of the work involved compiling the SMS messages into a corpus and annotating them linguistically. Annotation was done with the CLASSLA package (https://github.com/clarinsi/classla), version 2.1, with language = Serbian and type = nonstandard, for tokenization, lemmatization and morpho-syntactic tagging (both MULTEXT-East and Universal Dependencies). |
dc.language.iso | bos |
dc.publisher | University of Sarajevo – Faculty of Philosophy |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1956 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://www.ff.unsa.ba/index.php/bs/projekti-centra-za-b-h-s-jezik/18335-sarajevski-korpus-sms-poruka-na-bosanskom-jeziku |
dc.subject | SMS |
dc.subject | specialised corpus |
dc.title | The Sarajevo Corpus of SMS Messages in Bosnian 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
hidden | hidden |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Philipp Wasserscheidt philipp.wasserscheidt@hu-berlin.de Humboldt-Universität zu Berlin |
contact.person | Halid Bulić halid.bulic@ff.unsa.ba University of Sarajevo |
size.info | 10000 texts |
size.info | 15330 sentences |
size.info | 122843 tokens |
files.count | 1 |
files.size | 1770084 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)



- Name: SCSMS.zip
- Size: 1.69 MB
- Format: application/zip
- Description: Corpus in CoNLL-U format
- MD5: ef2e8f9b358161817e238aa4bc927876
- SCSMS
- 096.conllu (146 kB)
- 049.conllu (98 kB)
- 076.conllu (121 kB)
- 029.conllu (135 kB)
- 056.conllu (122 kB)
- 058.conllu (151 kB)
- 083.conllu (125 kB)
- 085.conllu (121 kB)
- 038.conllu (132 kB)
- 065.conllu (100 kB)
- 092.conllu (110 kB)
- 018.conllu (135 kB)
- 045.conllu (120 kB)
- 047.conllu (115 kB)
- 072.conllu (104 kB)
- 074.conllu (96 kB)
- 027.conllu (120 kB)
- 054.conllu (150 kB)
- 081.conllu (143 kB)
- 007.conllu (119 kB)
- 009.conllu (155 kB)
- 034.conllu (123 kB)
- 036.conllu (124 kB)
- 061.conllu (117 kB)
- 063.conllu (100 kB)
- 090.conllu (96 kB)
- 016.conllu (111 kB)
- 043.conllu (114 kB)
- 070.conllu (95 kB)
- 023.conllu (125 kB)
- 025.conllu (126 kB)
- 050.conllu (93 kB)
- 052.conllu (117 kB)
- 005.conllu (128 kB)
- 030.conllu (132 kB)
- 032.conllu (116 kB)
- 012.conllu (117 kB)
- 014.conllu (121 kB)
- 099.conllu (156 kB)
- 041.conllu (118 kB)
- 021.conllu (141 kB)
- 001.conllu (150 kB)
- 003.conllu (133 kB)
- 088.conllu (134 kB)
- 010.conllu (106 kB)
- 095.conllu (142 kB)
- 097.conllu (141 kB)
- 100.conllu (161 kB)
- 077.conllu (150 kB)
- 079.conllu (136 kB)
- 057.conllu (123 kB)
- 059.conllu (119 kB)
- 084.conllu (130 kB)
- 086.conllu (121 kB)
- 039.conllu (126 kB)
- 066.conllu (91 kB)
- 068.conllu (105 kB)
- 093.conllu (110 kB)
- 019.conllu (130 kB)
- 046.conllu (122 kB)
- 048.conllu (110 kB)
- 073.conllu (111 kB)
- 075.conllu (102 kB)
- 028.conllu (126 kB)
- 055.conllu (161 kB)
- 082.conllu (131 kB)
- 008.conllu (121 kB)
- 035.conllu (103 kB)
- 037.conllu (119 kB)
- 062.conllu (101 kB)
- 064.conllu (96 kB)
- 091.conllu (149 kB)
- 017.conllu (129 kB)
- 044.conllu (134 kB)
- 071.conllu (109 kB)
- 024.conllu (133 kB)
- 026.conllu (115 kB)
- 051.conllu (118 kB)
- 053.conllu (119 kB)
- 080.conllu (125 kB)
- 006.conllu (103 kB)
- 033.conllu (138 kB)
- 060.conllu (114 kB)
- 013.conllu (126 kB)
- 015.conllu (128 kB)
- 040.conllu (137 kB)
- 042.conllu (107 kB)
- 022.conllu (128 kB)
- 002.conllu (129 kB)
- 004.conllu (122 kB)
- 089.conllu (121 kB)
- 031.conllu (140 kB)
- 011.conllu (117 kB)
- 098.conllu (156 kB)
- 078.conllu (146 kB)
- 020.conllu (125 kB)
- 087.conllu (120 kB)
- 067.conllu (91 kB)
- 069.conllu (93 kB)
- 094.conllu (147 kB)
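
A downloaded copy of the archive can be checked against the MD5 value listed in this record, and its CoNLL-U files can be summarized, with a short script. The following is a minimal sketch in Python using only the standard library, not part of the corpus distribution. The function names (`md5_hex`, `count_conllu`, `summarize_archive`) are hypothetical, and the counting convention (one token per basic word line, skipping multiword ranges such as `1-2` and empty nodes such as `1.1`) is an assumption, so the totals may differ from the `size.info` figures if those were computed differently.

```python
import hashlib
import io
import zipfile


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def count_conllu(text: str) -> tuple[int, int]:
    """Return (sentences, tokens) for a string in CoNLL-U format.

    Assumption: a sentence is a run of token lines separated by blank
    lines; comment lines start with '#'; multiword ranges (ID like 1-2)
    and empty nodes (ID like 1.1) are not counted as tokens.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            in_sentence = False  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment or metadata
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword range or empty node
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
    return sentences, tokens


def summarize_archive(zip_bytes: bytes) -> dict:
    """Count files, sentences and tokens over all .conllu members of a ZIP."""
    totals = {"files": 0, "sentences": 0, "tokens": 0}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".conllu"):
                continue
            text = archive.read(name).decode("utf-8")
            sentences, tokens = count_conllu(text)
            totals["files"] += 1
            totals["sentences"] += sentences
            totals["tokens"] += tokens
    return totals
```

To use it, read the downloaded file into memory (for example `data = open("SCSMS.zip", "rb").read()`, where the local path is an assumption), compare `md5_hex(data)` with `ef2e8f9b358161817e238aa4bc927876`, and pass `data` to `summarize_archive`.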