dc.contributor.author | Žagar, Aleš |
dc.contributor.author | Robnik-Šikonja, Marko |
dc.date.accessioned | 2022-12-21T10:48:47Z |
dc.date.available | 2022-12-21T10:48:47Z |
dc.date.issued | 2022-12-21 |
dc.identifier.uri | http://hdl.handle.net/11356/1751 |
dc.description | A text summarization task aims to convert a longer text into a shorter one while preserving the essential information of the source text. In general, there are two approaches to text summarization: the extractive approach selects and copies the most important sentences or parts of the text, whereas the abstractive approach generates new text and is more similar to human-written summaries. We release 5 models that cover extractive, abstractive, and hybrid types: Metamodel, a neural model based on the Doc2Vec document representation that suggests the best summarizer; Graph-based model, an unsupervised graph-based extractive approach that returns the N most relevant sentences; Headline model, a supervised abstractive approach (T5 architecture) that returns headline-like abstracts; Article model, a supervised abstractive approach (T5 architecture) that returns short summaries; Hybrid-long model, an unsupervised hybrid (graph-based and transformer-based) approach that returns short summaries of long texts. Details and instructions for running and training the models are available at https://github.com/clarinsi/SloSummarizer. A web service with a demo is available at https://slovenscina.eu/povzemanje. |
dc.language.iso | slv |
dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://rsdo.slovenscina.eu/ |
dc.subject | text summarization |
dc.subject | T5 |
dc.subject | graph methods |
dc.title | Slovenian text summarization models |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | true |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | https://slovenscina.eu/povzemanje |
contact.person | Aleš Žagar, Ales.Zagar@fri.uni-lj.si |
sponsor | Ministry of Culture, C3340-20-278001, Development of Slovene in a Digital Environment, Other |
files.count | 5 |
files.size | 5178939468 bytes (about 4.82 GiB) |
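The description characterizes the graph-based model only as an unsupervised extractive approach that returns the N most relevant sentences. As a rough illustration of that family of methods (not the SloSummarizer implementation), the sketch below builds a sentence-similarity graph and ranks sentences with weighted PageRank, in the style of TextRank. It assumes a naive regex sentence splitter and bag-of-words cosine similarity; since the released graph-based archive ships a LaBSE sentence-transformer, the actual model likely compares sentence embeddings instead. The names `split_sentences`, `cosine`, and `graph_summary` are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Hypothetical naive splitter: breaks after ., ! or ? followed by whitespace.
    # Real Slovenian text would need a proper sentence segmenter.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def graph_summary(text: str, n: int = 3, damping: float = 0.85, iters: int = 50) -> list[str]:
    # Assumption: word-count vectors stand in for sentence embeddings.
    sentences = split_sentences(text)
    if len(sentences) <= n:
        return sentences
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    size = len(sentences)
    # Weighted, undirected similarity graph without self-loops.
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(size)]
        for i in range(size)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / size] * size
    # Weighted PageRank computed by power iteration.
    for _ in range(iters):
        scores = [
            (1 - damping) / size
            + damping
            * sum(weights[j][i] / out_sum[j] * scores[j] for j in range(size) if out_sum[j])
            for i in range(size)
        ]
    top = sorted(range(size), key=lambda i: scores[i], reverse=True)[:n]
    # Return the N highest-ranked sentences in their original order.
    return [sentences[i] for i in sorted(top)]
```

Keeping the selected sentences in document order, rather than score order, is a common choice for extractive summaries because it preserves the flow of the source text.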
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

- Name: t5-article.zip
- Size: 271.75 MB
- Format: application/zip
- Description: Unknown
- MD5: 9bdefddcc51118a8fe1758e364bbd537
- Contents:
  - model/
    - SloT5-cnndm_slo_pretraining/
      - config.json (775 B)
      - training_args.bin (3 kB)
      - spiece.model (778 kB)
      - tokenizer_config.json (1 kB)
      - special_tokens_map.json (1 kB)
      - pytorch_model.bin (293 MB)

- Name: t5-headline.zip
- Size: 271.41 MB
- Format: application/zip
- Description: Unknown
- MD5: 55f27433a996e2bd231962e4c8c5f548
- Contents:
  - model/
    - SloT5-sta_headline/
      - config.json (772 B)
      - training_args.bin (3 kB)
      - spiece.model (778 kB)
      - tokenizer_config.json (1 kB)
      - special_tokens_map.json (1 kB)
      - pytorch_model.bin (293 MB)

- Name: metamodel.zip
- Size: 767.5 MB
- Format: application/zip
- Description: Unknown
- MD5: 2cf60b69ab6b2d8be4565ccbc96a99cb

- Name: hybrid-long.zip
- Size: 1.91 GB
- Format: application/zip
- Description: Unknown
- MD5: 997f03afa041552290e5dd7f22416942
- Contents:
  - model/
    - LaBSE/
      - 3_Normalize/
      - README.md (1 kB)
      - tokenizer_config.json (491 B)
      - pytorch_model.bin (1 GB)
      - sentence_bert_config.json (53 B)
      - config.json (907 B)
      - 2_Dense/
        - config.json (114 B)
        - pytorch_model.bin (2 MB)
      - tokenizer.json (13 MB)
      - vocab.txt (4 MB)
      - config_sentence_transformers.json (122 B)
      - modules.json (461 B)
      - special_tokens_map.json (112 B)
      - 1_Pooling/
        - config.json (190 B)
    - SloT5-cnndm_slo_pretraining/
      - config.json (775 B)
      - training_args.bin (3 kB)
      - spiece.model (778 kB)
      - tokenizer_config.json (1 kB)
      - special_tokens_map.json (1 kB)
      - pytorch_model.bin (293 MB)

- Name: graph-based.zip
- Size: 1.64 GB
- Format: application/zip
- Description: Unknown
- MD5: 30888da8d5f758ae0cd0eebc6f0eb1ac
- Contents:
  - model/
    - LaBSE/
      - 3_Normalize/
      - README.md (1 kB)
      - tokenizer_config.json (491 B)
      - pytorch_model.bin (1 GB)
      - sentence_bert_config.json (53 B)
      - config.json (907 B)
      - 2_Dense/
        - config.json (114 B)
        - pytorch_model.bin (2 MB)
      - tokenizer.json (13 MB)
      - vocab.txt (4 MB)
      - config_sentence_transformers.json (122 B)
      - modules.json (461 B)
      - special_tokens_map.json (112 B)
      - 1_Pooling/
        - config.json (190 B)
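
Each archive in this item is listed with an MD5 checksum. A minimal Python sketch for checking downloads against those values, assuming the archives were saved under their original file names in a single directory; the helper names `md5sum` and `verify_downloads` are hypothetical:

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "t5-article.zip": "9bdefddcc51118a8fe1758e364bbd537",
    "t5-headline.zip": "55f27433a996e2bd231962e4c8c5f548",
    "metamodel.zip": "2cf60b69ab6b2d8be4565ccbc96a99cb",
    "hybrid-long.zip": "997f03afa041552290e5dd7f22416942",
    "graph-based.zip": "30888da8d5f758ae0cd0eebc6f0eb1ac",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash in 1 MiB chunks so multi-GB archives are never read into memory at once.
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Path) -> dict[str, bool]:
    # Assumption: archives keep their original names; missing files are skipped.
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        if path.exists():
            results[name] = md5sum(path) == expected
    return results
```

Calling `verify_downloads(Path("downloads"))` on a (hypothetical) download directory returns a dict that maps each archive found there to whether its checksum matches the listed value.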