dc.contributor.author Krsnik, Luka
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2024-07-28T12:22:42Z
dc.date.available 2024-07-28T12:22:42Z
dc.date.issued 2024-07-26
dc.identifier.uri http://hdl.handle.net/11356/1958
dc.description STARK is a highly customizable tool for extracting syntactic structures (trees) from parsed corpora (treebanks), intended for corpus-driven linguistic investigation of syntactic and lexical phenomena. It takes a treebank in the CoNLL-U format as input and returns a list of all relevant dependency trees with frequency information and other useful statistics, such as the strength of association between the nodes of a tree or the tree's significance in comparison to another treebank. For installation, execution, and a description of the user-defined parameter settings, see the official project page at https://github.com/clarinsi/STARK. An online demo of the tool is available at https://orodja.cjvt.si/stark/. Compared to v2, this version introduces several new features and improvements, such as the ability to extract very long trees, ignore irrelevant relations, process multi-root treebanks, and handle special operators when querying.
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Faculty of Arts, University of Ljubljana
dc.publisher CLARIN.SI
dc.relation.isreferencedby https://unidive.lisn.upsaclay.fr/lib/exe/fetch.php?media=meetings:2023-saclay:abstracts:62_dobrovoljc_et_al_stark_a_tool_for_dependency_tree.pdf
dc.relation.replaces http://hdl.handle.net/11356/1899
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/licenses/Apache-2.0
dc.rights.label PUB
dc.source.uri https://github.com/clarinsi/STARK
dc.subject corpus linguistics
dc.subject text processing
dc.subject dependency trees
dc.subject extraction
dc.subject n-grams
dc.subject universal dependencies
dc.subject syntax
dc.subject multiword expressions
dc.subject syntactic structures
dc.title Dependency tree extraction tool STARK 3.0
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent false
has.files yes
branding CLARIN.SI data & tools
demo.uri https://orodja.cjvt.si/stark/
contact.person Kaja Dobrovoljc kaja.dobrovoljc@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) Z6-4617 Treebank-Driven Approach to the Study of Spoken Slovenian nationalFunds
files.count 1
files.size 3388265
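
The description above says the tool reads a CoNLL-U treebank and returns dependency trees with their frequencies. As a rough illustration of that idea only, the sketch below counts the smallest possible trees (a head, a relation, and one dependent) in a tiny inline sample. It is not STARK's implementation or output format; the function names `read_sentences` and `count_two_node_trees`, the sample sentences, and the printed notation are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical two-sentence sample in the 10-column CoNLL-U layout
# (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
SAMPLE = """\
# text = The dog barks.
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_

# text = A cat sleeps.
1\tA\ta\tDET\t_\t_\t2\tdet\t_\t_
2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of CoNLL-U token rows (lists of columns)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if rows:
                yield rows
                rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep plain tokens; skip multiword ranges (1-2) and empty nodes (1.1).
            if cols[0].isdigit():
                rows.append(cols)
    if rows:
        yield rows


def count_two_node_trees(text):
    """Count (head UPOS, relation, dependent UPOS) triples across all sentences."""
    counts = Counter()
    for rows in read_sentences(text):
        upos_by_id = {row[0]: row[3] for row in rows}
        for row in rows:
            head, deprel = row[6], row[7]
            if head != "0":  # the root token has no parent
                counts[(upos_by_id[head], deprel, row[3])] += 1
    return counts


counts = count_two_node_trees(SAMPLE)
for (head_pos, rel, dep_pos), n in counts.most_common():
    print(f"{head_pos} -[{rel}]-> {dep_pos}\t{n}")
```

To try this on one of the sample treebanks bundled in the archive, the file contents could be passed in place of `SAMPLE`. Larger trees, queries, association measures, and treebank comparison are what the tool itself provides and are not covered by this sketch.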


 Files in this item

This item is publicly available and licensed under: Apache License 2.0

Name: STARK-3.0.zip
Size: 3.23 MB
Format: application/zip
Description: GitHub source code
MD5: ff6094902108c1fa5c5b13737b722ec5

File preview:
  • STARK-3.0
    • run.bat (49 B)
    • setup.py (661 B)
    • stark.py (429 B)
    • README.md (8 kB)
    • scripts
      • grew_corpus_names.txt (5 kB)
      • create_codes_mapper.py (907 B)
      • codes_and_flags.yaml (14 kB)
    • .gitignore (195 B)
    • install.bat (42 B)
    • tests
      • test_data
        • input
          • sl_ssj-ud-dev.conllu (1 MB)
          • en_ewt-ud-dev.conllu (1 MB)
          • dir_input
            • sl_ssj-ud-dev.conllu (1 MB)
            • en_ewt-ud-dev.conllu (1 MB)
        • configs
          • config_internal_storage.ini (1 kB)
          • config_query.ini (1 kB)
          • config_greedy_complete.ini (451 B)
          • config_compare.ini (1 kB)
          • config_output_settings.ini (1 kB)
          • config_base.ini (1 kB)
        • correct_output
          • out_base.tsv (26 kB)
          • sentence_count_file.tsv (961 kB)
          • out_output_settings.tsv (95 kB)
          • out_fixed.tsv (22 kB)
          • out_compare.tsv (20 kB)
          • detailed_results_file_query.tsv (845 kB)
          • out_greedy_complete.tsv (35 kB)
          • sentence_count_file_greedy.tsv (961 kB)
          • detailed_results_file_greedy.tsv (845 kB)
          • out_query.tsv (21 kB)
          • out_dir.tsv (55 kB)
          • out_internal_storage2.tsv (26 kB)
      • tests.py (9 kB)
      • __init__.py (144 B)
    • settings.md (14 kB)
    • stark
      • utils.py (2 kB)
      • __init__.py (55 B)
      • processing
        • writers.py (14 kB)
        • processor.py (2 kB)
        • document_processor.py (3 kB)
        • counters.py (9 kB)
        • cache.py (4 kB)
        • filters.py (9 kB)
        • __init__.py (0 B)
        • query_trees.py (10 kB)
      • data
        • representation
          • greedy_tree.py (3 kB)
          • __init__.py (0 B)
          • tree.py (9 kB)
          • node.py (1 kB)
          • query_tree.py (1 kB)
        • document.py (1 kB)
        • summary.py (2 kB)
        • __init__.py (0 B)
        • processing
          • greedy_tree.py (5 kB)
          • __init__.py (0 B)
          • tree.py (3 kB)
          • query_tree.py (14 kB)
      • _version.py (78 B)
      • stark.py (12 kB)
      • resources
        • codes_mapper.json (11 kB)
        • constants.py (247 B)
        • __init__.py (0 B)
    • logos
      • ARRS.png (13 kB)
      • FF.png (43 kB)
      • CJVT.png (76 kB)
      • CLARIN.png (28 kB)
      • ARRS.svg (139 kB)
      • FF.svg (313 kB)
      • FRI.png (128 kB)
    • advanced.md (6 kB)
    • sample
      • output.tsv (1 MB)
      • sl_ssj-ud-dev.conllu (1 MB)
      • en_ewt-ud-dev.conllu (1 MB)
      • fr_gsd-ud-dev.conllu (2 B)
    • config.ini (1 kB)
    • requirements.txt (41 B)
    • MANIFEST.in (57 B)
    • LICENSE.txt (11 kB)
    • run.sh (80 B)
