dc.contributor.author Krsnik, Luka
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Čibej, Jaka
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Ključevšek, Aleksander
dc.contributor.author Krek, Simon
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2019-11-19T08:00:17Z
dc.date.available 2019-11-19T08:00:17Z
dc.date.issued 2019-11-18
dc.identifier.uri http://hdl.handle.net/11356/1276
dc.description The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that can be imported into Microsoft Excel or similar statistical processing software. Version 1.2 adds support for Gigafida 2.0 in XML format and fixes a bug which disabled the extraction of character-level n-grams from normalized forms in the GOS 1.0 corpus.
dc.language.iso slv
dc.language.iso eng
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Kljucevsek-et-al_Ucinkovit-izracun-frekvencnih-statistik-za-slovenske-jezikovne-korpuse.pdf
dc.relation.isreferencedby https://gitea.cjvt.si/lkrsnik/list
dc.relation.isreferencedby http://slovnica.ijs.si/wp-content/uploads/2019/11/LIST_prirocnik_1.0.pdf
dc.relation.replaces http://hdl.handle.net/11356/1227
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/licenses/Apache-2.0
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject corpus linguistics
dc.subject text processing
dc.subject extraction
dc.subject characters
dc.subject word parts
dc.subject words
dc.subject word sets
dc.subject n-grams
dc.subject morphology
dc.title Corpus extraction tool LIST 1.2
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent false
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej; jaka.cibej@cjvt.si; Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency); J6-8256; New grammar of contemporary standard Slovene: sources and methods; nationalFunds
sponsor ARRS (Slovenian Research Agency); P6-0411; Language Resources and Technologies for Slovene; nationalFunds
sponsor Jožef Stefan Institute; CLARIN; CLARIN.SI; nationalFunds
files.count 1
files.size 17055037


Files in this item

This item is Publicly Available and licensed under: Apache License 2.0
Name: list1.2.zip
Size: 16.26 MB
Format: application/zip
Description: LIST 1.2
MD5: bf69b6489888967f73d2084d978fac54
File preview
    • list1.2.jar (17 MB)
    • run.sh (47 B)
    • run.bat (36 B)
