dc.contributor.author | Žitnik, Slavko |
dc.date.accessioned | 2018-03-23T18:56:26Z |
dc.date.available | 2018-03-23T18:56:26Z |
dc.date.issued | 2018-03-19 |
dc.identifier.uri | http://hdl.handle.net/11356/1182 |
dc.description | This corpus contains a subset of the ssj500k v1.4 corpus, http://hdl.handle.net/11356/1052. Each of its 149 documents is a paragraph from ssj500k containing at least 100 words and at least 6 named entities. The data is in TCF format, exported from the WebAnno tool, https://webanno.github.io/webanno/. The annotated entities are of type person, organization or location. Mentions are annotated as coreference chains, without further classification into coreference types. The annotations also include implicit mentions, which are specific to Slovene; in such cases the verb is tagged. The corpus consists of 1277 entities, 2329 mentions, 831 singleton entities, 40 appositions and 215 overlapping mentions. Overlapping mentions of the same entity are also annotated: for example, in the text [strokovnega direktorja KC [Zorana Arneža]] ("of the professional director of KC, Zoran Arnež") we annotate two overlapping mentions that refer to the same entity. There are 97 such mentions in the corpus. The class TEIP5Importer in the public source code repository https://bitbucket.org/szitnik/nutie-core contains an additional function that reads the dataset and merges it with the ssj500k dataset. |
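As an illustration of how the coreference chains and the counts above could be recovered from a TCF file, here is a minimal Python sketch using only the standard library. It assumes, without having checked the actual files, that the WebAnno export stores chains in the TCF references layer (a references element containing entity elements, each holding reference elements with a space-separated tokenIDs attribute) and that tokens carry an ID attribute. XML namespaces are ignored rather than hard-coded. The helper names, the made-up sample fragment (including the extra KC entity) and the definition of an overlapping mention as "shares at least one token with another mention" are our own assumptions; TEIP5Importer in nutie-core remains the reference reader.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def local(tag):
    """Drop the XML namespace: '{http://...}token' -> 'token'."""
    return tag.rsplit("}", 1)[-1]


def read_chains(tcf_xml):
    """Return one list per entity; each mention is a tuple of token IDs."""
    root = ET.fromstring(tcf_xml)
    chains = []
    for layer in root.iter():
        # Only the references layer; namedEntities also uses <entity> elements.
        if local(layer.tag) != "references":
            continue
        for entity in layer:
            if local(entity.tag) == "entity":
                chains.append([tuple(ref.get("tokenIDs", "").split())
                               for ref in entity if local(ref.tag) == "reference"])
    return chains


def token_texts(tcf_xml):
    """Map token IDs to their surface forms."""
    root = ET.fromstring(tcf_xml)
    return {t.get("ID"): t.text for t in root.iter() if local(t.tag) == "token"}


def stats(chains):
    """Corpus-style counts; 'overlapping' = sharing a token (assumed definition)."""
    mentions = [(i, set(m)) for i, chain in enumerate(chains) for m in chain]
    overlapping, same_entity = set(), set()
    for a, b in combinations(range(len(mentions)), 2):
        (ea, ta), (eb, tb) = mentions[a], mentions[b]
        if ta & tb:
            overlapping.update((a, b))
            if ea == eb:
                same_entity.update((a, b))
    return {
        "entities": len(chains),
        "mentions": len(mentions),
        "singletons": sum(1 for chain in chains if len(chain) == 1),
        "overlapping_mentions": len(overlapping),
        "same_entity_overlapping_mentions": len(same_entity),
    }


# Hypothetical TCF fragment modelled on the example in the description.
SAMPLE = """
<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="sl">
    <tokens>
      <token ID="t_0">strokovnega</token>
      <token ID="t_1">direktorja</token>
      <token ID="t_2">KC</token>
      <token ID="t_3">Zorana</token>
      <token ID="t_4">Arneža</token>
    </tokens>
    <references>
      <entity>
        <reference ID="rc_0" tokenIDs="t_0 t_1 t_2 t_3 t_4"/>
        <reference ID="rc_1" tokenIDs="t_3 t_4"/>
      </entity>
      <entity>
        <reference ID="rc_2" tokenIDs="t_2"/>
      </entity>
    </references>
  </TextCorpus>
</D-Spin>
""".strip()
```

On the sample, read_chains returns two chains: the nested pair of mentions for Zoran Arnež and a singleton for KC. stats then reports 2 entities, 3 mentions, 1 singleton, 3 overlapping mentions and 2 same-entity overlapping mentions. Summing stats over all 149 files would give numbers comparable to the totals above only if our overlap definition matches the one used by the authors.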
dc.language.iso | slv |
dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.rights.label | PUB |
dc.subject | coreference resolution |
dc.title | Slovene coreference resolution corpus coref149 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | https://bitbucket.org/szitnik/nutie-web |
contact.person | Slavko Žitnik slavko.zitnik@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana |
size.info | 1060 sentences |
size.info | 26960 tokens |
size.info | 149 files |
files.count | 1 |
files.size | 463706 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

- Name: coref149_v1.0.zip
- Size: 452.84 KB
- Format: application/zip
- Description: Data in TCF format. The script for parsing the data and merging it with ssj500k v1.4 is in the class TEIP5Importer in the repository https://bitbucket.org/szitnik/nutie-core.
- MD5: bfea4b3dda6965dd97069df0c2517ee9
- __MACOSX (149 macOS "._" metadata files, one for each .tcf file; not part of the corpus)
- ssj156.1067.tcf
- ssj28.170.tcf
- ssj5.30.tcf
- ssj178.1206.tcf
- ssj78.513.tcf
- ssj166.1138.tcf
- ssj226.1489.tcf
- ssj237.1564.tcf
- ssj234.1552.tcf
- ssj214.1415.tcf
- ssj46.286.tcf
- ssj177.1193.tcf
- ssj59.361.tcf
- ssj228.1513.tcf
- ssj18.81.tcf
- ssj156.1059.tcf
- ssj80.518.tcf
- ssj134.877.tcf
- ssj242.1584.tcf
- ssj198.1324.tcf
- ssj151.1028.tcf
- ssj234.1555.tcf
- ssj161.1110.tcf
- ssj214.1418.tcf
- ssj79.517.tcf
- ssj246.1596.tcf
- ssj68.423.tcf
- ssj213.1414.tcf
- ssj172.1176.tcf
- ssj111.698.tcf
- ssj114.714.tcf
- ssj233.1540.tcf
- ssj132.845.tcf
- ssj144.1003.tcf
- ssj133.850.tcf
- ssj133.868.tcf
- ssj233.1547.tcf
- ssj130.821.tcf
- ssj51.325.tcf
- ssj237.1563.tcf
- ssj28.165.tcf
- ssj206.1384.tcf
- ssj124.787.tcf
- ssj76.490.tcf
- ssj226.1477.tcf
- ssj69.431.tcf
- ssj162.1117.tcf
- ssj52.326.tcf
- ssj13.60.tcf
- ssj53.331.tcf
- ssj35.207.tcf
- ssj101.650.tcf
- ssj132.841.tcf
- ssj54.350.tcf
- ssj226.1500.tcf
- ssj178.1201.tcf
- ssj24.120.tcf
- ssj201.1356.tcf
- ssj63.393.tcf
- ssj130.824.tcf
- ssj166.1140.tcf
- ssj242.1583.tcf
- ssj153.1046.tcf
- ssj212.1402.tcf
- ssj93.617.tcf
- ssj194.1307.tcf
- ssj76.493.tcf
- ssj69.434.tcf
- ssj42.247.tcf
- ssj63.389.tcf
- ssj12.47.tcf
- ssj187.1237.tcf
- ssj144.998.tcf
- ssj79.516.tcf
- ssj136.882.tcf
- ssj44.264.tcf
- ssj217.1440.tcf
- ssj68.422.tcf
- ssj226.1491.tcf
- ssj213.1413.tcf
- ssj82.537.tcf
- ssj138.884.tcf
- ssj152.1038.tcf
- ssj121.779.tcf
- ssj226.1503.tcf
- ssj220.1452.tcf
- ssj206.1383.tcf
- ssj142.915.tcf
- ssj81.532.tcf
- ssj47.316.tcf
- ssj171.1167.tcf
- ssj118.765.tcf
- ssj223.1464.tcf
- ssj53.330.tcf
- ssj161.1101.tcf
- ssj44.267.tcf
- ssj204.1367.tcf
- ssj18.86.tcf
- ssj132.847.tcf
- ssj42.250.tcf
- ssj18.75.tcf
- ssj211.1399.tcf
- ssj63.392.tcf
- ssj12.50.tcf
- ssj73.459.tcf
- ssj230.1530.tcf
- ssj139.888.tcf
- ssj24.115.tcf
- ssj206.1386.tcf
- ssj102.657.tcf
- ssj124.789.tcf
- ssj144.990.tcf
- ssj247.1598.tcf
- ssj75.476.tcf
- ssj98.629.tcf
- ssj40.240.tcf
- ssj212.1408.tcf
- ssj4.15.tcf
- ssj193.1302.tcf
- ssj100.640.tcf
- ssj14.63.tcf
- ssj119.773.tcf
- ssj85.558.tcf
- ssj15.64.tcf
- ssj44.263.tcf
- ssj165.1131.tcf
- ssj156.1064.tcf
- ssj186.1232.tcf
- ssj38.227.tcf
- ssj106.676.tcf
- ssj45.275.tcf
- ssj144.1001.tcf
- ssj133.866.tcf
- ssj192.1256.tcf
- ssj220.1451.tcf
- ssj101.648.tcf
- ssj166.1142.tcf
- ssj226.1475.tcf
- ssj201.1347.tcf
- ssj15.71.tcf
- ssj86.570.tcf
- ssj206.1389.tcf
- ssj42.249.tcf
- ssj111.701.tcf
- ssj172.1181.tcf
- ssj238.1572.tcf
- ssj81.527.tcf
- ssj53.336.tcf
- ssj217.1442.tcf
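A possible way to check the download against the published MD5 checksum and to load the TCF files while skipping the macOS metadata entries is sketched below in Python, using only the standard library. It is a sketch under stated assumptions, not the authors' tooling: the helper names are hypothetical, and we assume the file names follow the pattern ssj<document>.<paragraph>.tcf and mirror ssj500k paragraph IDs, which the merging code in TEIP5Importer would confirm or refute. Files are returned as raw bytes so that no text encoding has to be assumed.

```python
import hashlib
import re
import zipfile
from pathlib import PurePosixPath

# MD5 checksum published on this page for coref149_v1.0.zip.
EXPECTED_MD5 = "bfea4b3dda6965dd97069df0c2517ee9"
# Assumed naming scheme: ssj<document>.<paragraph>.tcf (hypothetical mapping to ssj500k IDs).
NAME_RE = re.compile(r"ssj(\d+)\.(\d+)\.tcf")


def md5_matches(path, expected=EXPECTED_MD5, chunk_size=1 << 16):
    """Hash the file in chunks and compare with the expected hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.strip().lower()


def is_corpus_file(name):
    """Keep .tcf files; skip __MACOSX/ entries and AppleDouble '._' files."""
    path = PurePosixPath(name)
    return ("__MACOSX" not in path.parts
            and not path.name.startswith("._")
            and path.suffix == ".tcf")


def parse_name(name):
    """'ssj172.1176.tcf' -> (172, 1176)."""
    match = NAME_RE.fullmatch(PurePosixPath(name).name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def load_documents(source):
    """Map 'ssj172.1176'-style keys to raw TCF bytes, in numeric ID order.

    `source` may be a path or a binary file object; zipfile.ZipFile accepts both.
    """
    docs = {}
    with zipfile.ZipFile(source) as zf:
        names = sorted((n for n in zf.namelist() if is_corpus_file(n)), key=parse_name)
        for name in names:
            doc, par = parse_name(name)
            docs[f"ssj{doc}.{par}"] = zf.read(name)
    return docs
```

Assuming the archive was saved as coref149_v1.0.zip in the working directory, md5_matches("coref149_v1.0.zip") should return True for an intact download, and load_documents("coref149_v1.0.zip") should yield one entry per .tcf file in the listing above. Sorting by the numeric pair rather than the string keeps ssj5.30 ahead of ssj12.47.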