STARK is a highly customizable tool designed for extracting different types of syntactic structures (trees) from parsed corpora (treebanks), aimed at corpus-driven linguistic investigations of syntactic and lexical phenomena of various kinds.
It takes a treebank in the CoNLL-U format as input and returns a list of all relevant dependency trees with frequency information and other useful statistics, such as the strength of association between the nodes of a tree, or its statistical significance in comparison to another treebank.
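To make the input and output concrete, here is a minimal Python sketch of the general idea: it reads CoNLL-U text and counts the simplest possible trees, namely two-node head-dependent pairs described by part-of-speech tags and the dependency relation. It is not STARK's implementation and does not use its interface. The function names (`read_sentences`, `count_two_node_trees`) and the two-sentence sample are hypothetical and serve illustration only.

```python
from collections import Counter

# Hypothetical two-sentence sample in CoNLL-U (ten tab-separated columns per token).
SAMPLE = """\
# text = The dog barks.
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_

# text = A cat sleeps.
1\tA\ta\tDET\t_\t_\t2\tdet\t_\t_
2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of token rows (lists of columns)."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep plain tokens only; skip multiword ranges (1-2) and empty nodes (1.1).
            if cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def count_two_node_trees(text):
    """Count (head UPOS, relation, dependent UPOS) triples across the treebank."""
    counts = Counter()
    for sentence in read_sentences(text):
        upos_by_id = {cols[0]: cols[3] for cols in sentence}
        for cols in sentence:
            head, deprel = cols[6], cols[7]
            if head != "0":  # the root token has no head inside the sentence
                counts[(upos_by_id[head], deprel, cols[3])] += 1
    return counts


if __name__ == "__main__":
    for tree, frequency in count_two_node_trees(SAMPLE).most_common():
        print(frequency, tree)
```

The tool itself goes well beyond this sketch: it handles larger trees, user-defined queries, and the association and comparison statistics mentioned above.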
For installation, execution and the description of various user-defined parameter settings, see the official project page at: https://github.com/clarinsi/STARK. An online demo version of the tool is available at: https://orodja.cjvt.si/stark/.
In comparison to v2, this version introduces several new features and improvements, such as the ability to extract very long trees, ignore irrelevant relations, process multi-root treebanks, or handle special operators when querying.
Jožef Stefan Institute
CLARIN: "CLARIN.SI"
ARRS (Slovenian Research Agency), P6-0411: "Language Resources and Technologies for Slovene"
ARRS (Slovenian Research Agency), Z6-4617: "Treebank-Driven Approach to the Study of Spoken Slovenian"