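dc.contributor.author Žitko, Branko
dc.contributor.author Gašpar, Angelina
dc.contributor.author Bročić, Lucija
dc.contributor.author Vasić, Daniel
dc.date.accessioned 2023-04-24T12:38:31Z
dc.date.available 2023-04-24T12:38:31Z
dc.date.issued 2023-04-24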
dc.description NL2SH (Natural Language to Semantic Hypergraph) is a dataset for building and evaluating methods for knowledge extraction and representation based on semantic hypergraphs. Each sentence has natural language annotations and a dedicated semantic hyperedge. Most of the sentences in the dataset are taken from the following sources:
* John Eastwood, Oxford Guide to English Grammar, Oxford University Press, 2002.
* Andrew Radford, An Introduction to English Sentence Structure, Cambridge University Press, 2009.
* Philip Gucker, Essential English Grammar, Dover Publications, New York, 1966.
The natural language annotations are:
* sent_i - ID of the sentence
* tok_i - ID of the token within the sentence
* word - token text
* space - whether a space follows the token
* lemma - lemma of the token
* pos - Universal POS tag
* tag - Penn Treebank tag
* dep - ClearNLP dependency label
* head - ID of the token that is the dependency head of the current token
* ner - named entity label
* roleset - roleset of a verb frame
* srl - semantic role labels with IOB annotation
* coref - coreference labels with IOB annotation
* synset - WordNet synset
The annotations of semantic hypergraph elements primarily follow the annotation guidelines of the Graphbrain project. However, the atom annotations are modified so that each one ends with:
* a label,
* a type and an optional subtype,
* type-specific atom roles,
* type-specific additional information,
* a named entity.
dc.language.iso eng
dc.publisher Faculty of Science, University of Split
dc.rights CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
dc.rights.label ACA
dc.subject semantic hypergraph
dc.subject natural language processing
dc.title Natural Language 2 Semantic Hypergraph Dataset NL2SH 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Branko Žitko, Faculty of Science, University of Split
sponsor Office of Naval Research, N00014-20-1-2066, Enhancing Adaptive Courseware based on Natural Language Processing, nationalFunds
size.info 664 sentences, 6851 tokens, 5968 words
files.count 1
files.size 1096051
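The token-level columns listed in dc.description can be consumed with a few lines of Python. The sketch below is hypothetical: it assumes each token has already been read into a record, that `space` is a boolean, and that `srl` tags use `B-`/`I-` prefixes with `O` outside any span. The actual layout of the TXT file is not documented in this record, and `Token`, `sentence_text`, and `iob_spans` are illustrative names, not part of the dataset. It shows how the `word` and `space` columns rebuild the sentence, and how IOB tags decode into labelled token spans.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    # Hypothetical record holding a few of the columns listed in dc.description.
    # Assumed types: the real TXT layout is not specified in this record.
    tok_i: int
    word: str
    space: bool  # assumed: True when a space follows the token
    srl: str     # assumed IOB tag, e.g. "B-ARG0", "I-ARG0" or "O"


def sentence_text(tokens: List[Token]) -> str:
    """Rebuild the surface sentence from the word and space columns."""
    return "".join(t.word + (" " if t.space else "") for t in tokens).rstrip()


def iob_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode IOB tags into (label, start, end) token spans, end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the tag extends the currently open span
        if label is not None:  # any other tag closes the open span
            spans.append((label, start, i))
            label = None
        if tag.startswith(("B-", "I-")):
            # B- opens a span; a stray I- is leniently treated like B-.
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


if __name__ == "__main__":
    # Illustrative tokens and tags, not taken from NL2SH.
    tokens = [
        Token(0, "The", True, "B-ARG1"),
        Token(1, "sky", True, "I-ARG1"),
        Token(2, "is", True, "B-V"),
        Token(3, "blue", False, "B-ARG2"),
        Token(4, ".", False, "O"),
    ]
    print(sentence_text(tokens))
    for label, start, end in iob_spans([t.srl for t in tokens]):
        words = " ".join(t.word for t in tokens[start:end])
        print(label, start, end, words)
```

Under the same IOB assumption, the decoder applies unchanged to the `coref` column. If a token can carry several roles (for example, one per predicate), each role column would need to be decoded separately.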

 Files in this item

This item is licensed for Academic Use under: Inform Before Use, Attribution Required, Noncommercial.
Dataset in TXT format (text file, 1.05 MB)
