Show simple item record

 
dc.contributor.author Žitko, Branko
dc.contributor.author Gašpar, Angelina
dc.contributor.author Bročić, Lucija
dc.contributor.author Vasić, Daniel
dc.date.accessioned 2023-04-24T12:38:31Z
dc.date.available 2023-04-24T12:38:31Z
dc.date.issued 2023-04-24
dc.identifier.uri http://hdl.handle.net/11356/1822
dc.description The NL2SH (Natural Language to Semantic Hypergraph) dataset can be used to build and evaluate methods for knowledge extraction and representation based on a semantic hypergraph. Each sentence has natural language annotations and a dedicated semantic hyperedge.

The majority of the sentences used in this dataset are taken from the following sources:
* John Eastwood, Oxford Guide to English Grammar, Oxford University Press, 2002.
* Andrew Radford, An Introduction to English Sentence Structure, Cambridge University Press, 2009.
* Philip Gucker, Essential English Grammar, Dover Publications, Inc., New York, 1966.

Natural language annotations are:
* sent_i - id of the sentence
* tok_i - id of the token in the sentence
* word - token text
* space - whether a space follows the token
* lemma - lemma of the token
* pos - Universal POS tags (https://universaldependencies.org/u/pos/)
* tag - Penn Treebank tags (https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html)
* dep - ClearNLP dependency labels (https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md)
* head - id of the token which is the dependency head of the current token
* ner - named entities (https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf)
* roleset - roleset of a verb frame (https://propbank.github.io/v3.4.0/frames/)
* srl - semantic role labels with IOB annotation (https://verbs.colorado.edu/propbank/EPB-Annotation-Guidelines.pdf)
* coref - coreference labels with IOB annotation
* synset - WordNet synsets (https://wordnet.princeton.edu)

The annotations for semantic hypergraph elements primarily adhere to the annotation guidelines of the Graphbrain project (https://graphbrain.net/manual/notation.html). However, atom annotations are modified and at the end contain:
* label,
* type and optional subtype,
* type-specific atom roles,
* type-specific additional information,
* named entity
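As a rough illustration of how the word, space, and srl fields described above might be consumed, the following Python sketch rebuilds sentence text and groups IOB labels into spans. The record does not specify the on-disk layout of nl2sh_dataset.txt, so the `Token` class, the function names, and the label shapes ("B-ARG0", "I-ARG0", "O") are assumptions made for this example, and the sample sentence is invented rather than taken from the dataset.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Token:
    """Hypothetical in-memory form of three of the per-token fields."""
    word: str    # token text
    space: bool  # whether a space follows the token
    srl: str     # IOB label, assumed to look like "B-ARG0", "I-ARG0" or "O"


def detokenize(tokens: Iterable[Token]) -> str:
    """Rebuild the sentence text from the word and space fields."""
    return "".join(t.word + (" " if t.space else "") for t in tokens).rstrip()


def iob_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group IOB labels into (label, start, end) spans; end is exclusive."""
    spans = []
    start, current = None, None
    for i, label in enumerate(labels):
        # Split only on the first hyphen so names like "ARGM-TMP" stay whole.
        prefix, _, name = label.partition("-")
        inside = prefix == "I" and name == current
        if not inside:
            # Close the open span, if any, before starting a new one.
            if current is not None:
                spans.append((current, start, i))
            if prefix in ("B", "I") and name:
                start, current = i, name
            else:
                start, current = None, None
    if current is not None:
        spans.append((current, start, len(labels)))
    return spans


# Invented example sentence, not a row from the dataset.
sentence = [
    Token("The", True, "B-ARG0"),
    Token("cat", True, "I-ARG0"),
    Token("sleeps", False, "B-V"),
    Token(".", False, "O"),
]
print(detokenize(sentence))
print(iob_spans([t.srl for t in sentence]))
```

The same span grouping would apply to the coref field if it uses the same IOB prefixes, which is likewise an assumption here.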
dc.language.iso eng
dc.publisher Faculty of Science University of Split
dc.rights CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
dc.rights.uri https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0
dc.rights.label ACA
dc.source.uri https://www.acnltutor.net
dc.subject semantic hypergraph
dc.subject natural language processing
dc.title Natural Language 2 Semantic Hypergraph Dataset NL2SH 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://github.com/bzitko/nl2sh_repo
contact.person Branko Žitko bzitko@pmfst.hr Faculty of Science University of Split
sponsor Office of Naval Research N00014-20-1-2066 Enhancing Adaptive Courseware based on Natural Language Processing nationalFunds
size.info 664 sentences
size.info 6851 tokens
size.info 5968 words
files.count 1
files.size 1096051


 Files in this item

This item is for Academic Use and licensed under:
CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
Inform Before Use, Attribution Required, Noncommercial
Name nl2sh_dataset.txt
Size 1.05 MB
Format Text file
Description Dataset in TXT format
MD5 8ea669eef7103a307496997db3ae4600
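The MD5 value listed for nl2sh_dataset.txt can be used to check that a downloaded copy is intact. A minimal Python sketch follows; the function names and the local file path are assumptions for illustration, and only the checksum itself comes from this record.

```python
import hashlib

# Checksum as listed in this record for nl2sh_dataset.txt.
EXPECTED_MD5 = "8ea669eef7103a307496997db3ae4600"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a local file's MD5 with the expected value, ignoring case."""
    return md5_of_file(path).lower() == expected.lower()
```

Calling `matches_record("nl2sh_dataset.txt")` on a local copy (path assumed) should return `True` when the file matches the listed checksum.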
