Attribute descriptions

document_id - ID of the document (PhD thesis) from which the term candidate was extracted
area - The scientific area of the PhD thesis, one of three (Kemija: Chemistry, Politologija: Political Science, Računalništvo: Computer Science)
annotation_round - The annotation round in which the term candidate was annotated
lemma_sequence - Sequence of lemmas of the term candidate
most_frequent_sequence - Most frequent token sequence (surface form) of the term candidate; this is not necessarily the canonical form
pattern - Morphosyntactic pattern that the term candidate matches
length - Length of the term candidate
annotator_1 - Response of annotator 1 (the annotator number is a pseudo-identifier of a human annotator that stays consistent within one area; different annotators were used for each area)
annotator_2 - Response of annotator 2 (t_termin: term, x_izvenpodročni: out-of-domain term, z_znanstveno: scientific term, n_nerelevantno: not a term)
annotator_3 - Response of annotator 3
annotator_4 - Response of annotator 4
frequency - Frequency of the term candidate
tfidf - tf-idf statistic of the term candidate (as calculated via CollTerm)
chisq - Chi-square statistic (as calculated via CollTerm)
dice - Dice statistic (as calculated via CollTerm)
ll - Log-likelihood statistic (as calculated via CollTerm)
mi - Mutual information statistic (as calculated via CollTerm)
tscore - T-score statistic (as calculated via CollTerm)
cvalue - C-value (calculated separately, not via CollTerm)
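To illustrate what the cvalue column measures, below is a minimal Python sketch of the commonly cited C-value definition (Frantzi, Ananiadou and Mima): a candidate a of length |a| and frequency f(a) scores log2|a| * f(a) if it never occurs nested inside a longer candidate; otherwise it scores log2|a| * (f(a) - (1/|T_a|) * sum of f(b) over the longer candidates b in T_a that contain it). The exact variant used for this dataset is not stated. For example, under this definition single-word candidates always score 0, and some implementations use log2(|a| + 1) instead. The function names `c_value` and `is_nested` are hypothetical, and the sketch assumes each candidate is a tuple of lemmas with a known corpus frequency.

```python
import math


def is_nested(shorter, longer):
    """Return True if `shorter` is a strictly shorter contiguous subsequence of `longer`."""
    n, m = len(shorter), len(longer)
    if n >= m:
        return False
    return any(longer[i:i + n] == shorter for i in range(m - n + 1))


def c_value(freqs):
    """Hypothetical sketch: compute a C-value score for every candidate.

    freqs: dict mapping a candidate (tuple of lemmas) to its corpus frequency.
    Returns a dict mapping each candidate to its C-value.
    """
    scores = {}
    for a, f_a in freqs.items():
        # T_a: all longer candidates in which `a` occurs nested.
        containers = [b for b in freqs if is_nested(a, b)]
        # Length weight log2|a|; this is 0 for single-word candidates in this variant.
        weight = math.log2(len(a))
        if not containers:
            scores[a] = weight * f_a
        else:
            # Discount the average frequency of the longer candidates containing `a`.
            nested_freq = sum(freqs[b] for b in containers)
            scores[a] = weight * (f_a - nested_freq / len(containers))
    return scores
```

With the frequencies `{("a", "b"): 10, ("a", "b", "c"): 4, ("x", "a", "b"): 2}`, the candidate `("a", "b")` is nested in the two longer candidates, so its score is log2(2) * (10 - 6/2) = 7.0, while the two three-word candidates are not nested and score log2(3) times their frequency.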