dc.contributor.author | Krsnik, Luka |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Čibej, Jaka |
dc.contributor.author | Dobrovoljc, Kaja |
dc.contributor.author | Ključevšek, Aleksander |
dc.contributor.author | Krek, Simon |
dc.contributor.author | Robnik-Šikonja, Marko |
dc.date.accessioned | 2024-09-10T11:21:48Z |
dc.date.available | 2024-09-10T11:21:48Z |
dc.date.issued | 2024-08-28 |
dc.identifier.uri | http://hdl.handle.net/11356/1964 |
dc.description | The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that can be imported into Microsoft Excel or similar statistical processing software. Version 1.3 adds support for the KOST 2.0 Slovene Learner Corpus (http://hdl.handle.net/11356/1887) in XML format. It also allows program execution using the command line (see 00README.txt for details), and uses a later version of Java (tested using JDK 21). In addition, Windows users no longer need to have Java installed on their computers to run the program. |
dc.language.iso | slv |
dc.language.iso | eng |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Kljucevsek-et-al_Ucinkovit-izracun-frekvencnih-statistik-za-slovenske-jezikovne-korpuse.pdf |
dc.relation.isreferencedby | https://gitea.cjvt.si/lkrsnik/list |
dc.relation.isreferencedby | http://slovnica.ijs.si/wp-content/uploads/2019/11/LIST_prirocnik_1.0.pdf |
dc.relation.replaces | http://hdl.handle.net/11356/1276 |
dc.rights | Apache License 2.0 |
dc.rights.uri | https://opensource.org/licenses/Apache-2.0 |
dc.rights.label | PUB |
dc.source.uri | http://slovnica.ijs.si/ |
dc.subject | corpus linguistics |
dc.subject | text processing |
dc.subject | extraction |
dc.subject | characters |
dc.subject | word parts |
dc.subject | words |
dc.subject | word sets |
dc.subject | n-grams |
dc.subject | morphology |
dc.title | Corpus extraction tool LIST 1.3 |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | false |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
sponsor | ARRS J7-3159 Empirical foundations for digitally-supported development of writing skills nationalFunds |
files.count | 1 |
files.size | 242296320 |
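The description above says the tool extracts lists at the level of characters, word parts, words, and word sets, and writes them to CSV. As a rough illustration of the word-set (n-gram) case only, the sketch below counts contiguous word n-grams and prints a two-column CSV. It is not LIST's implementation: the class name `NgramSketch`, the method names, the `ngram,count` header, and the assumption that the output holds frequency counts are all hypothetical, and the real tool's output columns are configured through its JSON files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: NOT the LIST implementation, only the general idea.
public class NgramSketch {

    /** Counts contiguous word n-grams, joining the words with a single space. */
    public static Map<String, Integer> countNgrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // LinkedHashMap keeps first-occurrence order, which makes ties predictable.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    /** Renders the counts as CSV, most frequent first; the column names are assumed. */
    public static String toCsv(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equal counts stay in first-occurrence order.
        rows.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        StringBuilder csv = new StringBuilder("ngram,count\n");
        for (Map.Entry<String, Integer> row : rows) {
            // Quote the text field and double any embedded quotes.
            String quoted = "\"" + row.getKey().replace("\"", "\"\"") + "\"";
            csv.append(quoted).append(',').append(row.getValue()).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("to", "be", "or", "not", "to", "be");
        System.out.print(toCsv(countNgrams(tokens, 2)));
    }
}
```

A CSV file produced this way can be opened in a spreadsheet program, in the same way the description suggests for the tool's own output.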
Files in this item

- Name: list1.3.zip
- Size: 231.07 MB
- Format: application/zip
- MD5: 0a3acb5a7300dc71199b77e992ba25cb
- config_examples
- config_wordSets_instructions.txt (4 kB)
- config_wordParts_instructions.txt (4 kB)
- config_wordSets.json (644 B)
- config_words.json (618 B)
- config_characters.json (558 B)
- config_words_instructions.txt (4 kB)
- config_wordParts.json (605 B)
- config_characters_instructions.txt (3 kB)
- run.sh (44 B)
- list.jar (26 MB)
- list.exe (26 MB)
- run.bat (33 B)
- 00README.txt (2 kB)
- jre
- include
- legal
- jdk.incubator.vector
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.sql.rowset
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.internal.jvmstat
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.dynalink
- LICENSE (33 B)
- dynalink.md (1 kB)
- COPYRIGHT (35 B)
- jdk.internal.ed
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.unsupported.desktop
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.internal.vm.compiler.management
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.jlink
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.jpackage
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.datatransfer
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.management.agent
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.crypto.ec
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.xml.crypto
- santuario.md (11 kB)
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.nio.mapmode
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.xml
- LICENSE (33 B)
- xerces.md (11 kB)
- COPYRIGHT (35 B)
- jcup.md (1 kB)
- dom.md (3 kB)
- bcel.md (10 kB)
- xalan.md (13 kB)
- jdk.xml.dom
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.net.http
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.httpserver
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.zipfs
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.hotspot.agent
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.javadoc
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jqueryUI.md (1 kB)
- jquery.md (2 kB)
- java.naming
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.crypto.cryptoki
- LICENSE (33 B)
- COPYRIGHT (35 B)
- pkcs11wrapper.md (2 kB)
- pkcs11cryptotoken.md (3 kB)
- jdk.editpad
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.crypto.mscapi
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.accessibility
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.internal.vm.ci
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.instrument
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.jconsole
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.management.rmi
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.management
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.attach
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.logging
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.sql
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.security.auth
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.jdi
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.security.jgss
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.prefs
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.transaction.xa
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.net
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.jdeps
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.random
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.jcmd
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.compiler
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.naming.rmi
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.management.jfr
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.internal.vm.compiler
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.unsupported
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.jfr
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.smartcardio
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.jartool
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.internal.le
- LICENSE (33 B)
- jline.md (1 kB)
- COPYRIGHT (35 B)
- jdk.jstatd
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.scripting
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.naming.dns
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.sctp
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.localedata
- LICENSE (33 B)
- COPYRIGHT (35 B)
- cldr.md (33 B)
- thaidict.md (1 kB)
- java.desktop
- harfbuzz.md (3 kB)
- COPYRIGHT (35 B)
- giflib.md (1 kB)
- freetype.md (11 kB)
- lcms.md (2 kB)
- mesa3d.md (5 kB)
- libpng.md (6 kB)
- LICENSE (33 B)
- colorimaging.md (167 B)
- jpeg.md (3 kB)
- java.management
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.rmi
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.compiler
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.charsets
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.base
- asm.md (1 kB)
- cldr.md (9 kB)
- COPYRIGHT (3 kB)
- zlib.md (1011 B)
- wepoll.md (1 kB)
- unicode.md (8 kB)
- public_suffix.md (17 kB)
- aes.md (1 kB)
- icu.md (29 kB)
- LICENSE (6 kB)
- c-libutl.md (1 kB)
- jdk.jsobject
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.se
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.jshell
- LICENSE (33 B)
- COPYRIGHT (35 B)
- java.security.sasl
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.security.jgss
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.jdwp.agent
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jdk.internal.opt
- LICENSE (33 B)
- COPYRIGHT (35 B)
- jopt-simple.md (1 kB)
- jdk.incubator.vector
- README (290 B)
- conf
- management
- jmxremote.password.template (5 kB)
- management.properties (14 kB)
- jmxremote.access (3 kB)
- logging.properties (2 kB)
- sound.properties (1 kB)
- security
- jaxp.properties (7 kB)
- net.properties (6 kB)
- management
- lib
- jrt-fs.jar (107 kB)
- psfontj2d.properties (10 kB)
- modules (131 MB)
- tzdb.dat (101 kB)
- psfont.properties.ja (2 kB)
- ct.sym (10 MB)
- jfr
- default.jfc (36 kB)
- profile.jfc (36 kB)
- jawt.lib (1 kB)
- fontconfig.bfc (4 kB)
- tzmappings (21 kB)
- security
- public_suffix_list.dat (225 kB)
- cacerts (126 kB)
- blocked.certs (2 kB)
- default.policy (11 kB)
- fontconfig.properties.src (11 kB)
- jvm.cfg (29 B)
- classlist (78 kB)
- jvm.lib (1 MB)
- src.zip (42 MB)
- jmods
- jdk.management.jfr.jmod (55 kB)
- jdk.compiler.jmod (10 MB)
- java.transaction.xa.jmod (6 kB)
- java.datatransfer.jmod (52 kB)
- java.prefs.jmod (59 kB)
- jdk.naming.dns.jmod (64 kB)
- java.security.jgss.jmod (687 kB)
- jdk.nio.mapmode.jmod (4 kB)
- jdk.crypto.cryptoki.jmod (407 kB)
- java.smartcardio.jmod (62 kB)
- jdk.internal.vm.compiler.jmod (4 kB)
- jdk.incubator.vector.jmod (1 MB)
- java.management.rmi.jmod (91 kB)
- jdk.jdwp.agent.jmod (138 kB)
- java.xml.jmod (4 MB)
- jdk.internal.le.jmod (465 kB)
- java.rmi.jmod (277 kB)
- jdk.jcmd.jmod (167 kB)
- jdk.editpad.jmod (9 kB)
- jdk.sctp.jmod (25 kB)
- java.se.jmod (4 kB)
- jdk.jdeps.jmod (758 kB)
- jdk.security.auth.jmod (81 kB)
- jdk.net.jmod (35 kB)
- jdk.jdi.jmod (888 kB)
- java.naming.jmod (466 kB)
- jdk.internal.jvmstat.jmod (84 kB)
- java.net.http.jmod (751 kB)
- java.sql.rowset.jmod (191 kB)
- jdk.unsupported.jmod (19 kB)
- jdk.httpserver.jmod (162 kB)
- jdk.crypto.mscapi.jmod (86 kB)
- java.compiler.jmod (123 kB)
- jdk.unsupported.desktop.jmod (16 kB)
- java.logging.jmod (116 kB)
- java.desktop.jmod (11 MB)
- java.base.jmod (20 MB)
- jdk.hotspot.agent.jmod (2 MB)
- jdk.internal.vm.ci.jmod (485 kB)
- jdk.zipfs.jmod (104 kB)
- jdk.jartool.jmod (221 kB)
- jdk.internal.opt.jmod (87 kB)
- jdk.accessibility.jmod (528 kB)
- jdk.jstatd.jmod (38 kB)
- java.instrument.jmod (49 kB)
- jdk.charsets.jmod (1 MB)
- jdk.jconsole.jmod (476 kB)
- jdk.internal.ed.jmod (9 kB)
- jdk.random.jmod (23 kB)
- java.xml.crypto.jmod (665 kB)
- jdk.security.jgss.jmod (27 kB)
- jdk.management.jmod (77 kB)
- java.security.sasl.jmod (82 kB)
- jdk.crypto.ec.jmod (137 kB)
- jdk.jpackage.jmod (1 MB)
- java.management.jmod (882 kB)
- jdk.management.agent.jmod (86 kB)
- jdk.javadoc.jmod (1 MB)
- jdk.internal.vm.compiler.management.jmod (4 kB)
- jdk.dynalink.jmod (155 kB)
- java.sql.jmod (76 kB)
- jdk.localedata.jmod (11 MB)
- jdk.jlink.jmod (450 kB)
- java.scripting.jmod (49 kB)
- jdk.naming.rmi.jmod (25 kB)
- jdk.jsobject.jmod (5 kB)
- jdk.attach.jmod (44 kB)
- jdk.xml.dom.jmod (43 kB)
- jdk.jshell.jmod (741 kB)
- jdk.jfr.jmod (860 kB)
- release (1 kB)
- LICENSE (6 kB)
- bin
- splashscreen.dll (216 kB)
- kinit.exe (23 kB)
- jaccesswalker.exe (69 kB)
- w2k_lsa_auth.dll (30 kB)
- api-ms-win-core-file-l1-1-0.dll (14 kB)
- jstatd.exe (23 kB)
- api-ms-win-core-synch-l1-2-0.dll (11 kB)
- jstack.exe (23 kB)
- extnet.dll (23 kB)
- sunmscapi.dll (47 kB)
- jrunscript.exe (23 kB)
- jmap.exe (23 kB)
- freetype.dll (535 kB)
- api-ms-win-crt-filesystem-l1-1-0.dll (13 kB)
- jli.dll (89 kB)
- api-ms-win-core-errorhandling-l1-1-0.dll (11 kB)
- api-ms-win-core-console-l1-1-0.dll (11 kB)
- jdwp.dll (230 kB)
- dt_shmem.dll (36 kB)
- api-ms-win-core-memory-l1-1-0.dll (11 kB)
- jinfo.exe (23 kB)
- jdeps.exe (23 kB)
- api-ms-win-core-interlocked-l1-1-0.dll (11 kB)
- api-ms-win-core-processthreads-l1-1-1.dll (11 kB)
- api-ms-win-core-namedpipe-l1-1-0.dll (11 kB)
- api-ms-win-core-libraryloader-l1-1-0.dll (12 kB)
- rmiregistry.exe (23 kB)
- api-ms-win-core-util-l1-1-0.dll (11 kB)
- api-ms-win-crt-multibyte-l1-1-0.dll (19 kB)
- jsound.dll (60 kB)
- api-ms-win-core-sysinfo-l1-1-0.dll (12 kB)
- api-ms-win-crt-process-l1-1-0.dll (12 kB)
- api-ms-win-crt-conio-l1-1-0.dll (12 kB)
- server
- jvm.dll (12 MB)
- classes.jsa (12 MB)
- classes_nocoops.jsa (12 MB)
- jdb.exe (23 kB)
- jaccessinspector.exe (104 kB)
- api-ms-win-crt-runtime-l1-1-0.dll (15 kB)
- mlib_image.dll (498 kB)
- attach.dll (28 kB)
- net.dll (58 kB)
- java.dll (118 kB)
- api-ms-win-core-string-l1-1-0.dll (11 kB)
- api-ms-win-core-synch-l1-1-0.dll (13 kB)
- verify.dll (53 kB)
- api-ms-win-crt-environment-l1-1-0.dll (11 kB)
- javajpeg.dll (177 kB)
- api-ms-win-core-processthreads-l1-1-0.dll (13 kB)
- java.exe (53 kB)
- api-ms-win-crt-convert-l1-1-0.dll (15 kB)
- jlink.exe (23 kB)
- api-ms-win-crt-utility-l1-1-0.dll (11 kB)
- vcruntime140.dll (95 kB)
- j2pcsc.dll (25 kB)
- serialver.exe (23 kB)
- awt.dll (1 MB)
- management.dll (28 kB)
- jconsole.exe (23 kB)
- javac.exe (23 kB)
- api-ms-win-core-timezone-l1-1-0.dll (11 kB)
- dt_socket.dll (35 kB)
- jabswitch.exe (44 kB)
- instrument.dll (51 kB)
- jfr.exe (23 kB)
- zip.dll (87 kB)
- api-ms-win-core-file-l2-1-0.dll (11 kB)
- api-ms-win-core-rtlsupport-l1-1-0.dll (11 kB)
- jmod.exe (23 kB)
- api-ms-win-crt-time-l1-1-0.dll (13 kB)
- management_agent.dll (24 kB)
- javadoc.exe (23 kB)
- jaas.dll (27 kB)
- j2pkcs11.dll (78 kB)
- api-ms-win-crt-private-l1-1-0.dll (62 kB)
- javaw.exe (53 kB)
- fontmanager.dll (891 kB)
- vcruntime140_1.dll (36 kB)
- sspi_bridge.dll (44 kB)
- api-ms-win-crt-string-l1-1-0.dll (17 kB)
- api-ms-win-core-processenvironment-l1-1-0.dll (12 kB)
- api-ms-win-core-handle-l1-1-0.dll (11 kB)
- jar.exe (23 kB)
- jdeprscan.exe (23 kB)
- keytool.exe (23 kB)
- api-ms-win-core-localization-l1-2-0.dll (14 kB)
- api-ms-win-crt-heap-l1-1-0.dll (12 kB)
- windowsaccessbridge-64.dll (71 kB)
- javaaccessbridge.dll (284 kB)
- le.dll (34 kB)
- api-ms-win-core-datetime-l1-1-0.dll (11 kB)
- jhsdb.exe (23 kB)
- saproc.dll (38 kB)
- msvcp140.dll (558 kB)
- jarsigner.exe (23 kB)
- nio.dll (78 kB)
- jwebserver.exe (23 kB)
- rmi.dll (20 kB)
- jawt.dll (20 kB)
- api-ms-win-crt-stdio-l1-1-0.dll (17 kB)
- api-ms-win-core-profile-l1-1-0.dll (11 kB)
- j2gss.dll (49 kB)
- prefs.dll (25 kB)
- api-ms-win-core-debug-l1-1-0.dll (11 kB)
- management_ext.dll (35 kB)
- jimage.dll (32 kB)
- ktab.exe (23 kB)
- jps.exe (23 kB)
- api-ms-win-crt-math-l1-1-0.dll (20 kB)
- jsvml.dll (849 kB)
- api-ms-win-core-heap-l1-1-0.dll (11 kB)
- api-ms-win-core-file-l1-2-0.dll (11 kB)
- syslookup.dll (27 kB)
- klist.exe (23 kB)
- jpackage.dll (129 kB)
- jstat.exe (23 kB)
- jimage.exe (23 kB)
- jshell.exe (23 kB)
- javap.exe (23 kB)
- api-ms-win-core-console-l1-2-0.dll (11 kB)
- jcmd.exe (23 kB)
- lcms.dll (246 kB)
- jpackage.exe (23 kB)
- ucrtbase.dll (1011 kB)
- api-ms-win-crt-locale-l1-1-0.dll (11 kB)
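
The record lists an MD5 checksum for list1.3.zip, which can be used to confirm that a download is complete and unaltered. A minimal sketch of such a check in Java follows. The class name `Md5Check` and the default local path `list1.3.zip` are assumptions made for illustration; only the expected checksum value comes from the record above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for checking a downloaded archive against the listed MD5.
public class Md5Check {

    // Checksum published in the repository record for list1.3.zip.
    public static final String EXPECTED = "0a3acb5a7300dc71199b77e992ba25cb";

    /** Streams the input through MD5 and returns the digest as lowercase hex. */
    public static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            // Read in chunks so a large archive never has to fit in memory.
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the download; pass another path as the first argument.
        Path zip = Path.of(args.length > 0 ? args[0] : "list1.3.zip");
        try (InputStream in = Files.newInputStream(zip)) {
            String actual = md5Hex(in);
            System.out.println(actual.equals(EXPECTED) ? "MD5 OK" : "MD5 MISMATCH: " + actual);
        }
    }
}
```

MD5 is adequate for detecting a corrupted or truncated download, but it does not protect against deliberate tampering.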