dc.contributor.author Malenšek, Miha
dc.contributor.author Žitnik, Slavko
dc.contributor.author Završnik, Aleš
dc.contributor.author Krajnc, Saša
dc.contributor.author Križnar, Primož
dc.contributor.author Bajec, Marko
dc.date.accessioned 2026-03-03T09:28:39Z
dc.date.available 2026-03-03T09:28:39Z
dc.date.issued 2024-12-31
dc.identifier.uri http://hdl.handle.net/11356/2095
dc.description COLESLAW 1.0 is a large-scale collection of Slovenian legal texts compiled from authoritative public sources. The corpus covers legislative, judicial, and governmental legal documents and is designed to support research in legal NLP, information retrieval, contradiction detection, legal reasoning, and domain adaptation of language models. COLESLAW 1.0 consists of 547,799 unique documents, totalling 771.93 million words, encoded in 12 files.
The corpus aggregates documents from four primary domains:
- PISRS, Legal Information System of the Republic of Slovenia (pisrs.si)
- SodnaPraksa, Lower Court Decisions (sodnapraksa.si)
- USRS, Constitutional Court of the Republic of Slovenia (www.us-rs.si)
- Uradni List, Official Gazette of the Republic of Slovenia (www.uradni-list.si)
PISRS, Uradni List and SodnaPraksa are further divided into:
- PISRS: enacted laws, repealed laws, legislative proposals, general and individual acts, register of regulations, records of normative authorities
- Uradni List: regulatory and announcement sections
- SodnaPraksa: lower court decisions and non-pecuniary damage claims
Each domain has a corresponding README.txt file that details the contents of its files and describes the keys and metadata present in the domain. Documents are stored in structured JSONL format and include unique identifiers, full cleaned text, and source-specific metadata. Legislative files typically contain regulation identifiers and procedural references; judicial files include structured components such as headnotes, operative parts and reasoning; gazette publications contain issue identifiers and publication metadata. A complete specification of keys, field definitions, and subcollection-level statistics is provided in the respective README.txt file for each domain.
dc.language.iso slv
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation.isreferencedby https://journals.uni-lj.si/slovenscina2/issue/view/1698
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://www.cjvt.si/llm4dh/
dc.subject law
dc.subject Slovenian law
dc.title Collection of Slovenian legal texts COLESLAW 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Miha Malenšek miha.malensek@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor ARIS (Slovenian Research and Innovation Agency) GC-0002 LLM4DH: Large Language Models for Digital Humanities nationalFunds
size.info 12 files
size.info 547799 texts
size.info 771931725 words
files.count 1
files.size 1333730153
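
The description above states that documents are stored as JSONL and that the authoritative key list lives in each domain's README file. As a minimal sketch, assuming each line of a file is a single JSON object, the records can be streamed one at a time and the top-level keys that actually occur can be counted before consulting the README. The function names `iter_jsonl` and `key_frequencies` are illustrative and not part of the corpus; no key names are assumed.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)


def key_frequencies(path: Path, limit: int = 1000) -> Counter:
    """Count how often each top-level key occurs in the first `limit` records.

    Assumption: every record is a JSON object (a dict after parsing).
    """
    counts: Counter = Counter()
    for index, record in enumerate(iter_jsonl(path)):
        if index >= limit:
            break
        counts.update(record.keys())
    return counts
```

After extracting the archive, a call such as `key_frequencies(Path("COLESLAW 1.0/USRS/usrs.jsonl"))` would report the keys seen in the first 1,000 records; the path is an assumption based on the file preview in this record. Streaming line by line keeps memory use flat, which matters for a corpus of this size.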


Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name COLESLAW.zip
Size 1.24 GB
Format application/zip
Description COLESLAW Corpus
MD5 d82f7a6d1cef54d3c57a2c08bbefb746
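
The MD5 value listed for COLESLAW.zip can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `is_intact` are illustrative, and the local path of the downloaded archive is left to the caller.

```python
import hashlib
from pathlib import Path

# MD5 value listed for COLESLAW.zip in this record.
EXPECTED_MD5 = "d82f7a6d1cef54d3c57a2c08bbefb746"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Compare the file's digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected.lower()
```

For example, `is_intact(Path("COLESLAW.zip"))` should return `True` for an uncorrupted copy. MD5 is adequate here as a transfer checksum, though it is not a safeguard against deliberate tampering.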
File Preview
  • COLESLAW 1.0
    • UradniList
      • ul-razglasni.jsonl
      • ul-uredbeni.jsonl
      • README_ul.txt
    • PISRS
      • register-predpisov.jsonl
      • README_pisrs.txt
      • drugi-splosni-in-posamicni-akti.jsonl
      • evidenca-normodajalcev.jsonl
      • obsoletni-in-konzumirani-predpisi.jsonl
      • neveljavni-predpisi.jsonl
      • splosni-akti-za-izvrsevanje-javnih-pooblastil.jsonl
      • predpisi-v-pripravi.jsonl
    • USRS
      • usrs.jsonl
      • README_usrs.txt
    • SodnaPraksa
      • sp_claims.jsonl
      • README_sp.txt
      • sp_courts.jsonl
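
The layout above can be inspected without unpacking the whole archive. The sketch below uses Python's standard `zipfile` module to group the `.jsonl` members by their parent folder. It assumes the paths inside COLESLAW.zip mirror the preview tree; `jsonl_by_folder` is an illustrative name, not something shipped with the corpus.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def jsonl_by_folder(archive_path) -> dict[str, list[str]]:
    """Map each parent folder name to the sorted .jsonl file names inside it.

    `archive_path` may be a filesystem path or a binary file-like object.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            member = PurePosixPath(name)  # zip member names use "/" separators
            if member.suffix == ".jsonl":
                groups[member.parent.name].append(member.name)
    return {folder: sorted(files) for folder, files in groups.items()}
```

If the archive matches the preview, `jsonl_by_folder("COLESLAW.zip")` should yield four folders holding 12 data files in total, in line with the `size.info` entry of this record. README files are skipped because only the `.jsonl` suffix is matched.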
