Slovenian text summarization models

Name: Slovenian text summarization models
License: https://creativecommons.org/licenses/by-sa/4.0/

Žagar, Aleš; Robnik-Šikonja, Marko

dc.contributor.author	Žagar, Aleš
dc.contributor.author	Robnik-Šikonja, Marko
dc.date.accessioned	2022-12-21T10:48:47Z
dc.date.available	2022-12-21T10:48:47Z
dc.date.issued	2022-12-21
dc.identifier.uri	http://hdl.handle.net/11356/1751
dc.description	Text summarization converts a longer text into a shorter one while preserving the essential information of the source. In general, there are two approaches. The extractive approach selects and copies the most important sentences or passages of the text, whereas the abstractive approach generates new text and is therefore closer to human-written summaries. We release five models that cover the extractive, abstractive, and hybrid types: Metamodel: a neural model based on the Doc2Vec document representation that suggests the best summarizer. Graph-based model: an unsupervised graph-based extractive approach that returns the N most relevant sentences. Headline model: a supervised abstractive approach (T5 architecture) that returns headline-like abstracts. Article model: a supervised abstractive approach (T5 architecture) that returns short summaries. Hybrid-long model: an unsupervised hybrid (graph-based and transformer model-based) approach that returns short summaries of long texts. Details and instructions for running and training the models are available at https://github.com/clarinsi/SloSummarizer. The web service with a demo is available at https://slovenscina.eu/povzemanje.
dc.language.iso	slv
dc.publisher	Faculty of Computer and Information Science, University of Ljubljana
dc.rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label	PUB
dc.source.uri	https://rsdo.slovenscina.eu/
dc.subject	text summarization
dc.subject	T5
dc.subject	graph methods
dc.title	Slovenian text summarization models
dc.type	toolService
metashare.ResourceInfo#ContentInfo.detailedType	tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent	true
has.files	yes
branding	CLARIN.SI data & tools
demo.uri	https://slovenscina.eu/povzemanje
contact.person	Aleš Žagar Ales.Zagar@fri.uni-lj.si
sponsor	Ministry of Culture; C3340-20-278001; Development of Slovene in a Digital Environment; Other
files.count	5
files.size	5178939468
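
The graph-based extractive idea mentioned in the description (rank sentences in a similarity graph and return the N most relevant ones) can be illustrated with a minimal sketch. This is not the released implementation: the released graph-based archive ships LaBSE sentence embeddings, whereas this sketch substitutes plain bag-of-words vectors and a PageRank-style power iteration so that it runs with the standard library alone. The function names, the naive sentence splitter, and the damping and iteration defaults are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bow(sentence):
    """Bag-of-words vector; a stand-in for LaBSE sentence embeddings."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_n_sentences(text, n=2, damping=0.85, iterations=50):
    """Return the n highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= n:
        return sentences
    vectors = [bow(s) for s in sentences]
    size = len(sentences)
    # Similarity graph: edge weight is the cosine similarity, no self-loops.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(size)]
        for i in range(size)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / size] * size
    # PageRank-style power iteration over the weighted graph.
    for _ in range(iterations):
        scores = [
            (1 - damping) / size
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(size)
                if out_sum[j] > 0
            )
            for i in range(size)
        ]
    best = sorted(range(size), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]
```

Calling `top_n_sentences(text, n=3)` returns three sentences copied verbatim from `text`, which is what makes the approach extractive rather than abstractive. Swapping `bow` and `cosine` for embedding-based similarity would bring the sketch closer to a LaBSE-based setup, but the exact pipeline is documented only in the linked GitHub repository.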

Files in this item

This item is Publicly Available and licensed under:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Name: t5-article.zip
Size: 271.75 MB
Format: application/zip
Description: Unknown
MD5: 9bdefddcc51118a8fe1758e364bbd537

File preview

model
- SloT5-cnndm_slo_pretraining
  - config.json (775 B)
  - training_args.bin (3 kB)
  - spiece.model (778 kB)
  - tokenizer_config.json (1 kB)
  - special_tokens_map.json (1 kB)
  - pytorch_model.bin (293 MB)

Name: t5-headline.zip
Size: 271.41 MB
Format: application/zip
Description: Unknown
MD5: 55f27433a996e2bd231962e4c8c5f548

File preview

model
- SloT5-sta_headline
  - config.json (772 B)
  - training_args.bin (3 kB)
  - spiece.model (778 kB)
  - tokenizer_config.json (1 kB)
  - special_tokens_map.json (1 kB)
  - pytorch_model.bin (293 MB)

Name: metamodel.zip
Size: 767.5 MB
Format: application/zip
Description: Unknown
MD5: 2cf60b69ab6b2d8be4565ccbc96a99cb

File preview

model
- metamodel
  - model.h5 (15 MB)
- doc2vec
  - model.wv.vectors.npy (65 MB)
  - model (5 MB)
  - model.syn1neg.npy (65 MB)
  - model.dv.vectors.npy (676 MB)

Name: hybrid-long.zip
Size: 1.91 GB
Format: application/zip
Description: Unknown
MD5: 997f03afa041552290e5dd7f22416942

File preview

model
- LaBSE
  - 3_Normalize
  - README.md (1 kB)
  - tokenizer_config.json (491 B)
  - pytorch_model.bin (1 GB)
  - sentence_bert_config.json (53 B)
  - config.json (907 B)
  - 2_Dense
    - config.json (114 B)
    - pytorch_model.bin (2 MB)
  - tokenizer.json (13 MB)
  - vocab.txt (4 MB)
  - config_sentence_transformers.json (122 B)
  - modules.json (461 B)
  - special_tokens_map.json (112 B)
  - 1_Pooling
    - config.json (190 B)
- SloT5-cnndm_slo_pretraining
  - config.json (775 B)
  - training_args.bin (3 kB)
  - spiece.model (778 kB)
  - tokenizer_config.json (1 kB)
  - special_tokens_map.json (1 kB)
  - pytorch_model.bin (293 MB)

Name: graph-based.zip
Size: 1.64 GB
Format: application/zip
Description: Unknown
MD5: 30888da8d5f758ae0cd0eebc6f0eb1ac

File preview

model
- LaBSE
  - 3_Normalize
  - README.md (1 kB)
  - tokenizer_config.json (491 B)
  - pytorch_model.bin (1 GB)
  - sentence_bert_config.json (53 B)
  - config.json (907 B)
  - 2_Dense
    - config.json (114 B)
    - pytorch_model.bin (2 MB)
  - tokenizer.json (13 MB)
  - vocab.txt (4 MB)
  - config_sentence_transformers.json (122 B)
  - modules.json (461 B)
  - special_tokens_map.json (112 B)
  - 1_Pooling
    - config.json (190 B)
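
Since every archive in this record is listed with an MD5 digest, a downloaded copy can be checked against it before unpacking. The following is a minimal sketch using only the Python standard library; the digests are copied from the MD5 fields of this record, while the function names `md5_of` and `verify_downloads` and the assumption that the archives sit in one local directory under their original file names are illustrative choices, not part of the record.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the MD5 fields of this record.
EXPECTED_MD5 = {
    "t5-article.zip": "9bdefddcc51118a8fe1758e364bbd537",
    "t5-headline.zip": "55f27433a996e2bd231962e4c8c5f548",
    "metamodel.zip": "2cf60b69ab6b2d8be4565ccbc96a99cb",
    "hybrid-long.zip": "997f03afa041552290e5dd7f22416942",
    "graph-based.zip": "30888da8d5f758ae0cd0eebc6f0eb1ac",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True/False (digest match) or None (file missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    for name, status in verify_downloads(".").items():
        label = {True: "OK", False: "MISMATCH", None: "missing"}[status]
        print(f"{name}: {label}")
```

A `MISMATCH` most likely indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.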
