| dc.contributor.author | Malenšek, Miha |
| dc.contributor.author | Žitnik, Slavko |
| dc.contributor.author | Završnik, Aleš |
| dc.contributor.author | Krajnc, Saša |
| dc.contributor.author | Križnar, Primož |
| dc.contributor.author | Bajec, Marko |
| dc.date.accessioned | 2026-03-03T09:28:39Z |
| dc.date.available | 2026-03-03T09:28:39Z |
| dc.date.issued | 2024-12-31 |
| dc.identifier.uri | http://hdl.handle.net/11356/2095 |
| dc.description | COLESLAW 1.0 is a large-scale collection of Slovenian legal texts compiled from authoritative public sources. The corpus covers legislative, judicial, and governmental legal documents and is designed to support research in legal NLP, information retrieval, contradiction detection, legal reasoning, and domain adaptation of language models. COLESLAW 1.0 consists of 547,799 unique documents, totalling 771.93 million words, encoded in 12 files. The corpus aggregates documents from four primary domains: PISRS, the Legal Information System of the Republic of Slovenia (pisrs.si); SodnaPraksa, lower court decisions (sodnapraksa.si); USRS, the Constitutional Court of the Republic of Slovenia (www.us-rs.si); and Uradni List, the Official Gazette of the Republic of Slovenia (www.uradni-list.si). PISRS, Uradni List and SodnaPraksa are further divided into subcollections: PISRS into enacted laws, repealed laws, legislative proposals, general and individual acts, the register of regulations, and records of normative authorities; Uradni List into regulatory and announcement sections; and SodnaPraksa into lower court decisions and non-pecuniary damage claims. Each domain has a README.txt file that describes the contents of its files and the keys and metadata they contain. Documents are stored in structured JSONL format and include unique identifiers, the full cleaned text, and source-specific metadata. Legislative files typically contain regulation identifiers and procedural references, judicial files include structured components such as headnotes, operative parts and reasoning, and gazette publications contain issue identifiers and publication metadata. A complete specification of keys, field definitions, and subcollection-level statistics is provided in each domain's README.txt file. |
| dc.language.iso | slv |
| dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
| dc.relation.isreferencedby | https://journals.uni-lj.si/slovenscina2/issue/view/1698 |
| dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://www.cjvt.si/llm4dh/ |
| dc.subject | law |
| dc.subject | slovenian law |
| dc.title | Collection of Slovenian legal texts COLESLAW 1.0 |
| dc.type | corpus |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| has.files | yes |
| branding | CLARIN.SI data & tools |
| contact.person | Miha Malenšek miha.malensek@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana |
| sponsor | ARIS (Slovenian Research and Innovation Agency) GC-0002 LLM4DH: Large Language Models for Digital Humanities nationalFunds |
| size.info | 12 files |
| size.info | 547799 texts |
| size.info | 771931725 words |
| files.count | 1 |
| files.size | 1333730153 |
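Because every document is one JSON object per line, a file can be streamed without loading it whole. The sketch below is a minimal illustration, not tooling shipped with the corpus. It deliberately assumes no key names (those are specified only in each domain's README.txt) and instead reports which top-level keys occur and how often, which is a quick way to check a file against its README. The helper names `iter_jsonl` and `key_profile` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed document per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def key_profile(path: Path) -> Counter:
    """Count how many documents in the file carry each top-level key."""
    counts: Counter = Counter()
    for doc in iter_jsonl(path):
        counts.update(doc.keys())
    return counts
```

For example, `key_profile(Path("COLESLAW 1.0/USRS/usrs.jsonl"))` would profile the Constitutional Court file, assuming the archive has been extracted to that (hypothetical) location.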
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
| Name | COLESLAW.zip |
| Size | 1.24 GB |
| Format | application/zip |
| Description | COLESLAW Corpus |
| MD5 | d82f7a6d1cef54d3c57a2c08bbefb746 |
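The MD5 value listed for COLESLAW.zip can be used to confirm that a download is complete and uncorrupted. A minimal sketch using Python's standard `hashlib` module follows; it reads the archive in 1 MiB chunks so the 1.24 GB file is never held in memory at once. The function names and the chunk size are illustrative choices, not part of the distribution.

```python
import hashlib
from pathlib import Path

# Checksum listed for COLESLAW.zip on this item page.
EXPECTED_MD5 = "d82f7a6d1cef54d3c57a2c08bbefb746"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest matches the expected value."""
    return md5_of(path) == expected.lower()
```

Calling `verify_download(Path("COLESLAW.zip"))`, with the archive saved under that name in the working directory, should return `True` for an intact download.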
- COLESLAW 1.0
  - UradniList
    - ul-razglasni.jsonl
    - ul-uredbeni.jsonl
    - README_ul.txt
  - PISRS
    - register-predpisov.jsonl
    - README_pisrs.txt
    - drugi-splosni-in-posamicni-akti.jsonl
    - evidenca-normodajalcev.jsonl
    - obsoletni-in-konzumirani-predpisi.jsonl
    - neveljavni-predpisi.jsonl
    - splosni-akti-za-izvrsevanje-javnih-pooblastil.jsonl
    - predpisi-v-pripravi.jsonl
  - USRS
    - usrs.jsonl
    - README_usrs.txt
  - SodnaPraksa
    - sp_claims.jsonl
    - README_sp.txt
    - sp_courts.jsonl
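The archive holds 12 JSONL files spread over the four domain folders. The following sketch counts documents per file directly inside COLESLAW.zip with the standard `zipfile` module, without extracting it, and then sums them per domain folder. It assumes one document per non-empty line, as the JSONL format implies. Under that assumption the grand total should agree with the 547,799 texts in `size.info`, and the number of counted files should be 12. Member paths inside the zip are not hard-coded; any name ending in `.jsonl` is counted. The helper names are hypothetical.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_documents(zip_path: str) -> Counter:
    """Count non-empty lines (one document each) in every .jsonl member."""
    counts: Counter = Counter()
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.endswith(".jsonl"):
                continue  # skip folders and README_*.txt files
            with zf.open(info) as raw:
                # Decode the compressed stream as UTF-8 text line by line.
                text = io.TextIOWrapper(raw, encoding="utf-8")
                counts[info.filename] = sum(1 for line in text if line.strip())
    return counts


def per_domain(counts: Counter) -> Counter:
    """Sum file counts by the folder that directly contains each file."""
    domains: Counter = Counter()
    for name, n in counts.items():
        domains[PurePosixPath(name).parent.name] += n
    return domains
```

A possible check is `c = count_documents("COLESLAW.zip")`, then `len(c)` and `sum(c.values())`, with `per_domain(c)` giving the split across UradniList, PISRS, USRS and SodnaPraksa.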