Attribute descriptions

document_id - ID of the document (PhD thesis) the term candidate is extracted from
area - One of the three scientific areas the PhD thesis covers (Kemija: Chemistry, Politologija: Political Science, Računalništvo: Computer Science)
annotation_round - The annotation round in which the term candidate was annotated
lemma_sequence - Sequence of lemmas of the term candidate
most_frequent_sequence - Most frequent token sequence (surface form) of the term candidate (not necessarily the canonical form)
pattern - Morphosyntactic pattern that the term candidate matches
length - Length of the term candidate
annotator_1 - Response of annotator 1 (annotator numbers are pseudo-identifiers of human annotators within one area; different annotators were used for each area)
annotator_2 - Response of annotator 2 (possible responses, shared by all annotator columns: t_termin: term, x_izvenpodročni: out-of-domain term, z_znanstveno: scientific term, n_nerelevantno: not a term)
annotator_3 - Response of annotator 3
annotator_4 - Response of annotator 4
frequency - Term candidate frequency
tfidf - Term candidate tf-idf statistic (as calculated via CollTerm)
chisq - Chi-square statistic (as calculated via CollTerm)
dice - Dice statistic (as calculated via CollTerm)
ll - Log-likelihood statistic (as calculated via CollTerm)
mi - Mutual information statistic (as calculated via CollTerm)
tscore - T-score statistic (as calculated via CollTerm)
cvalue - C-value statistic (calculated separately, not with CollTerm)
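For a two-word candidate, the chisq, dice, ll, mi and tscore columns are association measures that are usually derived from a 2x2 contingency table of corpus counts. The sketch below uses common textbook definitions of these measures. It is an assumption, not CollTerm's implementation, which may differ in details such as the logarithm base, the counts it starts from, or how candidates longer than two tokens are handled. The function name association_scores and its arguments are hypothetical.

```python
import math


def association_scores(f_xy: int, f_x: int, f_y: int, n: int) -> dict[str, float]:
    """Textbook two-word association measures (hypothetical helper, not CollTerm's code).

    f_xy: co-occurrence frequency of the pair (x, y)
    f_x:  frequency of x in the first position
    f_y:  frequency of y in the second position
    n:    total number of word pairs; assumes 0 < f_x < n and 0 < f_y < n
    """
    # Observed 2x2 contingency table, read row by row
    o11 = f_xy
    o12 = f_x - f_xy
    o21 = f_y - f_xy
    o22 = n - f_x - f_y + f_xy
    observed = [o11, o12, o21, o22]

    # Expected counts under independence: row total * column total / n
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]

    dice = 2 * f_xy / (f_x + f_y)
    mi = math.log2(o11 / expected[0])  # pointwise mutual information, base 2
    tscore = (o11 - expected[0]) / math.sqrt(o11)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Log-likelihood ratio (G^2); empty cells contribute nothing
    ll = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return {"dice": dice, "mi": mi, "tscore": tscore, "chisq": chisq, "ll": ll}


scores = association_scores(f_xy=30, f_x=50, f_y=40, n=10_000)
# dice = 2 * 30 / (50 + 40), roughly 0.667
print(scores)
```

Because the measures reward different things (mi favours rare pairs, tscore and ll favour frequent ones), rankings by these columns can differ noticeably for the same candidates.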
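The cvalue column can be illustrated with the standard C-value formula: a candidate a of length |a| and frequency f(a) scores log2|a| * f(a) if it is not nested in any longer candidate, and log2|a| * (f(a) - (1/|T_a|) * sum of f(b) over b in T_a) otherwise, where T_a is the set of longer candidates containing a. The minimal sketch below assumes candidates are given as lemma tuples with their frequencies. Which C-value variant and which counts (per document or per area) were used for this dataset is not stated, so treat it as illustrative only; c_value and the example candidates are hypothetical.

```python
import math


def c_value(freqs: dict[tuple[str, ...], int]) -> dict[tuple[str, ...], float]:
    """Standard C-value for every candidate (hypothetical helper)."""

    def is_nested(short: tuple[str, ...], long: tuple[str, ...]) -> bool:
        # True if short is a strictly shorter, contiguous sub-sequence of long
        k = len(short)
        return k < len(long) and any(
            long[i:i + k] == short for i in range(len(long) - k + 1)
        )

    scores = {}
    for a, f_a in freqs.items():
        containers = [b for b in freqs if is_nested(a, b)]
        weight = math.log2(len(a))  # single-token candidates get weight 0
        if containers:
            # Discount the mean frequency of the longer candidates containing a
            mean_nested = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = weight * (f_a - mean_nested)
        else:
            scores[a] = weight * f_a
    return scores


freqs = {
    ("floating", "point"): 10,
    ("floating", "point", "arithmetic"): 4,
    ("floating", "point", "number"): 2,
}
scores = c_value(freqs)
# ("floating", "point") is nested in two longer candidates:
# log2(2) * (10 - (4 + 2) / 2) = 7.0
print(scores)
```

Under this formulation single-token candidates always score 0; some C-value variants adjust the length weight to avoid that.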
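Each candidate carries up to four independent annotator responses (annotator_1 to annotator_4), so a common first step is to collapse them into one label. The sketch below uses a strict majority vote as one illustrative choice, not the dataset authors' aggregation method. It assumes that missing responses appear as None or empty strings; majority_label is a hypothetical name.

```python
from collections import Counter
from typing import Optional

# Response codes used in the annotator_* columns, with English glosses
LABELS = {
    "t_termin": "term",
    "x_izvenpodročni": "out-of-domain term",
    "z_znanstveno": "scientific term",
    "n_nerelevantno": "not a term",
}


def majority_label(responses: list[Optional[str]]) -> Optional[str]:
    """Return the label chosen by a strict majority of the given responses.

    Missing responses (None or "") are skipped (an assumption about how they
    are encoded). Returns None if nothing was given or no label has a strict
    majority, e.g. on a 2-2 split.
    """
    given = [r for r in responses if r]
    unknown = [r for r in given if r not in LABELS]
    if unknown:
        raise ValueError(f"unexpected response codes: {unknown}")
    if not given:
        return None
    label, count = Counter(given).most_common(1)[0]
    return label if count > len(given) / 2 else None


row = ["t_termin", "t_termin", "n_nerelevantno", None]
print(majority_label(row), LABELS.get(majority_label(row) or "", "no majority"))
```

A stricter or looser rule (for example, counting x_izvenpodročni as a term) is a design choice that depends on the intended use of the data.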