Natural Language 2 Semantic Hypergraph Dataset NL2SH 1.0

Name: Natural Language 2 Semantic Hypergraph Dataset NL2SH 1.0
License: https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0

Žitko, Branko; Gašpar, Angelina; Bročić, Lucija; Vasić, Daniel

dc.contributor.author	Žitko, Branko
dc.contributor.author	Gašpar, Angelina
dc.contributor.author	Bročić, Lucija
dc.contributor.author	Vasić, Daniel
dc.date.accessioned	2023-04-24T12:38:31Z
dc.date.available	2023-04-24T12:38:31Z
dc.date.issued	2023-04-24
dc.identifier.uri	http://hdl.handle.net/11356/1822
dc.description	The NL2SH (Natural Language to Semantic Hypergraph) dataset can be used to build and evaluate methods for knowledge extraction and representation based on a semantic hypergraph. Each sentence has natural language annotations and a dedicated semantic hyperedge.
The majority of the sentences used in this dataset are taken from the following sources:
* John Eastwood, Oxford Guide to English Grammar, Oxford University Press, 2002.
* Andrew Radford, An Introduction to English Sentence Structure, Cambridge University Press, 2009.
* Philip Gucker, Essential English Grammar, Dover Publications, Inc., New York, 1966.
Natural language annotations are:
* sent_i - id of the sentence
* tok_i - id of the token in the sentence
* word - token text
* space - whether a space follows the token
* lemma - lemma of the token
* pos - Universal POS tags (https://universaldependencies.org/u/pos/)
* tag - Penn Treebank tags (https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html)
* dep - ClearNLP dependency labels (https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md)
* head - id of the token which is the dependency head of the current token
* ner - named entities (https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf)
* roleset - roleset of a verb frame (https://propbank.github.io/v3.4.0/frames/)
* srl - semantic role labels with IOB annotation (https://verbs.colorado.edu/propbank/EPB-Annotation-Guidelines.pdf)
* coref - coreference labels with IOB annotation
* synset - WordNet synsets (https://wordnet.princeton.edu)
The annotations for semantic hypergraph elements primarily adhere to the annotation guidelines of the Graphbrain project (https://graphbrain.net/manual/notation.html). However, atom annotations are modified and, in the end, contain:
* label,
* type and optional subtype,
* type-specific atom roles,
* type-specific additional information,
* named entity
dc.language.iso	eng
dc.publisher	Faculty of Science University of Split
dc.rights	CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
dc.rights.uri	https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0
dc.rights.label	ACA
dc.source.uri	https://www.acnltutor.net
dc.subject	semantic hypergraph
dc.subject	natural language processing
dc.title	Natural Language 2 Semantic Hypergraph Dataset NL2SH 1.0
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN.SI data & tools
demo.uri	https://github.com/bzitko/nl2sh_repo
contact.person	Branko Žitko bzitko@pmfst.hr Faculty of Science University of Split
sponsor	Office of Naval Research N00014-20-1-2066 Enhancing Adaptive Courseware based on Natural Language Processing nationalFunds
size.info	664 sentences
size.info	6851 tokens
size.info	5968 words
files.count	1
files.size	1096051
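
The token-level annotations listed in dc.description can be pictured as one record per token. The following is a minimal sketch, not the dataset's actual loader: the record holds only a subset of the listed fields, the Python types are assumptions, the example sentence is hand-written rather than taken from the dataset, and the on-disk layout of nl2sh_dataset.txt is not assumed at all. It mainly illustrates how the `space` field allows the original sentence text to be rebuilt from the `word` values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    # Field names follow the annotation list in dc.description;
    # the types are assumptions made for this sketch.
    sent_i: int   # id of the sentence
    tok_i: int    # id of the token in the sentence
    word: str     # token text
    space: bool   # whether a space follows the token
    lemma: str    # lemma of the token
    pos: str      # Universal POS tag
    tag: str      # Penn Treebank tag
    dep: str      # dependency label
    head: int     # tok_i of the dependency head


def detokenize(tokens: List[Token]) -> str:
    """Rebuild the sentence text from word + space information."""
    return "".join(t.word + (" " if t.space else "") for t in tokens)


# Hand-written illustrative tokens (hypothetical, not from the dataset).
tokens = [
    Token(0, 0, "Tom", True, "Tom", "PROPN", "NNP", "nsubj", 1),
    Token(0, 1, "is", True, "be", "AUX", "VBZ", "ROOT", 1),
    Token(0, 2, "here", False, "here", "ADV", "RB", "advmod", 1),
    Token(0, 3, ".", False, ".", "PUNCT", ".", "punct", 1),
]

sentence = detokenize(tokens)
print(sentence)
```

Keeping `space` as a boolean per token means the text can be reconstructed without guessing where punctuation attaches.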

Files in this item

This item is

Academic Use

and licensed under:
CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0

Name: nl2sh_dataset.txt
Size: 1.05 MB
Format: Text file
Description: Dataset in TXT format
MD5: 8ea669eef7103a307496997db3ae4600
