Corpus extraction tool LIST 1.2

Name: Corpus extraction tool LIST 1.2
License: https://opensource.org/licenses/Apache-2.0

Krsnik, Luka; Arhar Holdt, Špela; Čibej, Jaka; Dobrovoljc, Kaja; Ključevšek, Aleksander; Krek, Simon; Robnik-Šikonja, Marko

dc.contributor.author	Krsnik, Luka
dc.contributor.author	Arhar Holdt, Špela
dc.contributor.author	Čibej, Jaka
dc.contributor.author	Dobrovoljc, Kaja
dc.contributor.author	Ključevšek, Aleksander
dc.contributor.author	Krek, Simon
dc.contributor.author	Robnik-Šikonja, Marko
dc.date.accessioned	2019-11-19T08:00:17Z
dc.date.available	2019-11-19T08:00:17Z
dc.date.issued	2019-11-18
dc.identifier.uri	http://hdl.handle.net/11356/1276
dc.description	The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that can be imported into Microsoft Excel or similar statistical processing software. Version 1.2 adds support for Gigafida 2.0 in XML format and fixes a bug which disabled the extraction of character-level n-grams from normalized forms in the GOS 1.0 corpus.
dc.language.iso	slv
dc.language.iso	eng
dc.publisher	Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher	Faculty of Computer and Information Science, University of Ljubljana
dc.publisher	Jožef Stefan Institute
dc.relation.isreferencedby	http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Kljucevsek-et-al_Ucinkovit-izracun-frekvencnih-statistik-za-slovenske-jezikovne-korpuse.pdf
dc.relation.isreferencedby	https://gitea.cjvt.si/lkrsnik/list
dc.relation.isreferencedby	http://slovnica.ijs.si/wp-content/uploads/2019/11/LIST_prirocnik_1.0.pdf
dc.relation.replaces	http://hdl.handle.net/11356/1227
dc.relation.isreplacedby	http://hdl.handle.net/11356/1964
dc.rights	Apache License 2.0
dc.rights.uri	https://opensource.org/licenses/Apache-2.0
dc.rights.label	PUB
dc.source.uri	http://slovnica.ijs.si/
dc.subject	corpus linguistics
dc.subject	text processing
dc.subject	extraction
dc.subject	characters
dc.subject	word parts
dc.subject	words
dc.subject	word sets
dc.subject	n-grams
dc.subject	morphology
dc.title	Corpus extraction tool LIST 1.2
dc.type	toolService
metashare.ResourceInfo#ContentInfo.detailedType	tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent	false
hidden	hidden
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor	ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
sponsor	ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor	Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
files.count	1
files.size	17055037

Files in this item

This item is publicly available and licensed under: Apache License 2.0

Name: list1.2.zip
Size: 16.26 MB
Format: application/zip
Description: LIST 1.2
MD5: bf69b6489888967f73d2084d978fac54

File Preview

- list1.2.jar (17 MB)
- run.sh (47 B)
- run.bat (36 B)
