dc.contributor.author |
Krsnik, Luka |
dc.contributor.author |
Arhar Holdt, Špela |
dc.contributor.author |
Čibej, Jaka |
dc.contributor.author |
Dobrovoljc, Kaja |
dc.contributor.author |
Ključevšek, Aleksander |
dc.contributor.author |
Krek, Simon |
dc.contributor.author |
Robnik-Šikonja, Marko |
dc.date.accessioned |
2019-11-19T08:00:17Z |
dc.date.available |
2019-11-19T08:00:17Z |
dc.date.issued |
2019-11-18 |
dc.identifier.uri |
http://hdl.handle.net/11356/1276 |
dc.description |
The LIST corpus extraction tool is a Java program for extracting lists from text corpora at the level of characters, word parts, words, and word sets. It supports the VERT and TEI P5 XML formats and outputs CSV files that can be imported into Microsoft Excel or similar statistical processing software.
Version 1.2 adds support for Gigafida 2.0 in XML format and fixes a bug that prevented the extraction of character-level n-grams from normalized forms in the GOS 1.0 corpus. |
dc.language.iso |
slv |
dc.language.iso |
eng |
dc.publisher |
Centre for Language Resources and Technologies, University of Ljubljana |
dc.publisher |
Faculty of Computer and Information Science, University of Ljubljana |
dc.publisher |
Jožef Stefan Institute |
dc.relation.isreferencedby |
http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Kljucevsek-et-al_Ucinkovit-izracun-frekvencnih-statistik-za-slovenske-jezikovne-korpuse.pdf |
dc.relation.isreferencedby |
https://gitea.cjvt.si/lkrsnik/list |
dc.relation.isreferencedby |
http://slovnica.ijs.si/wp-content/uploads/2019/11/LIST_prirocnik_1.0.pdf |
dc.relation.replaces |
http://hdl.handle.net/11356/1227 |
dc.rights |
Apache License 2.0 |
dc.rights.uri |
https://opensource.org/licenses/Apache-2.0 |
dc.rights.label |
PUB |
dc.source.uri |
http://slovnica.ijs.si/ |
dc.subject |
corpus linguistics |
dc.subject |
text processing |
dc.subject |
extraction |
dc.subject |
characters |
dc.subject |
word parts |
dc.subject |
words |
dc.subject |
word sets |
dc.subject |
n-grams |
dc.subject |
morphology |
dc.title |
Corpus extraction tool LIST 1.2 |
dc.type |
toolService |
metashare.ResourceInfo#ContentInfo.detailedType |
tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent |
false |
has.files |
yes |
branding |
CLARIN.SI data & tools |
contact.person |
Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana |
sponsor |
ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds |
sponsor |
ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor |
Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
files.count |
1 |
files.size |
17055037 |