
 
dc.contributor.author Žagar, Aleš
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2022-12-21T10:48:47Z
dc.date.available 2022-12-21T10:48:47Z
dc.date.issued 2022-12-21
dc.identifier.uri http://hdl.handle.net/11356/1751
dc.description Text summarization aims to convert a longer text into a shorter one while preserving the essential information of the source. In general, there are two approaches to text summarization. The extractive approach selects the most important sentences or parts of the text and copies them verbatim, whereas the abstractive approach generates new text and is more similar to human-made summaries. We release five models that cover extractive, abstractive, and hybrid types. Metamodel: a neural model based on the Doc2Vec document representation that suggests the best summarizer. Graph-based model: an unsupervised graph-based extractive approach that returns the N most relevant sentences. Headline model: a supervised abstractive approach (T5 architecture) that returns headline-like abstracts. Article model: a supervised abstractive approach (T5 architecture) that returns short summaries. Hybrid-long model: an unsupervised hybrid (graph-based and transformer model-based) approach that returns short summaries of long texts. Details and instructions for running and training the models are available at https://github.com/clarinsi/SloSummarizer. A web service with a demo is available at https://slovenscina.eu/povzemanje.
dc.language.iso slv
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://rsdo.slovenscina.eu/
dc.subject text summarization
dc.subject T5
dc.subject graph methods
dc.title Slovenian text summarization models
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
demo.uri https://slovenscina.eu/povzemanje
contact.person Aleš Žagar Ales.Zagar@fri.uni-lj.si
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
files.count 5
files.size 5178939468
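
The description characterises the graph-based model only as an unsupervised extractive approach that returns the N most relevant sentences. A minimal sketch of that general idea follows, assuming a TextRank-style ranking over a sentence-similarity graph. The file listing suggests the released model relies on LaBSE sentence embeddings; the bag-of-words cosine similarity, the naive sentence splitter, and all function names here are stand-ins chosen to keep the example dependency-free, not the released implementation.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank-style power iteration on the similarity graph."""
    # Stand-in for sentence embeddings: lowercase word counts.
    vectors = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)
    sim = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def summarize(text, n=2):
    """Return the n highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = rank_sentences(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares no words with the rest of the text receives only the baseline score, so it is selected last. Replacing `cosine` over word counts with cosine similarity over sentence embeddings would bring the sketch closer to an embedding-based variant.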


 Files in this item

This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name: t5-article.zip
Size: 271.75 MB
Format: application/zip
Description: Unknown
MD5: 9bdefddcc51118a8fe1758e364bbd537
Contents:
  • model
    • SloT5-cnndm_slo_pretraining
      • config.json (775 B)
      • training_args.bin (3 kB)
      • spiece.model (778 kB)
      • tokenizer_config.json (1 kB)
      • special_tokens_map.json (1 kB)
      • pytorch_model.bin (293 MB)

Name: t5-headline.zip
Size: 271.41 MB
Format: application/zip
Description: Unknown
MD5: 55f27433a996e2bd231962e4c8c5f548
Contents:
  • model
    • SloT5-sta_headline
      • config.json (772 B)
      • training_args.bin (3 kB)
      • spiece.model (778 kB)
      • tokenizer_config.json (1 kB)
      • special_tokens_map.json (1 kB)
      • pytorch_model.bin (293 MB)

Name: metamodel.zip
Size: 767.5 MB
Format: application/zip
Description: Unknown
MD5: 2cf60b69ab6b2d8be4565ccbc96a99cb
Contents:
  • model
    • metamodel
      • model.h5 (15 MB)
    • doc2vec
      • model.wv.vectors.npy (65 MB)
      • model (5 MB)
      • model.syn1neg.npy (65 MB)
      • model.dv.vectors.npy (676 MB)

Name: hybrid-long.zip
Size: 1.91 GB
Format: application/zip
Description: Unknown
MD5: 997f03afa041552290e5dd7f22416942
Contents:
  • model
    • LaBSE
      • 3_Normalize
        • README.md (1 kB)
        • tokenizer_config.json (491 B)
        • pytorch_model.bin (1 GB)
        • sentence_bert_config.json (53 B)
        • config.json (907 B)
        • 2_Dense
          • config.json (114 B)
          • pytorch_model.bin (2 MB)
        • tokenizer.json (13 MB)
        • vocab.txt (4 MB)
        • config_sentence_transformers.json (122 B)
        • modules.json (461 B)
        • special_tokens_map.json (112 B)
        • 1_Pooling
          • config.json (190 B)
      • SloT5-cnndm_slo_pretraining
        • config.json (775 B)
        • training_args.bin (3 kB)
        • spiece.model (778 kB)
        • tokenizer_config.json (1 kB)
        • special_tokens_map.json (1 kB)
        • pytorch_model.bin (293 MB)

Name: graph-based.zip
Size: 1.64 GB
Format: application/zip
Description: Unknown
MD5: 30888da8d5f758ae0cd0eebc6f0eb1ac
Contents:
  • model
    • LaBSE
      • 3_Normalize
        • README.md (1 kB)
        • tokenizer_config.json (491 B)
        • pytorch_model.bin (1 GB)
        • sentence_bert_config.json (53 B)
        • config.json (907 B)
        • 2_Dense
          • config.json (114 B)
          • pytorch_model.bin (2 MB)
        • tokenizer.json (13 MB)
        • vocab.txt (4 MB)
        • config_sentence_transformers.json (122 B)
        • modules.json (461 B)
        • special_tokens_map.json (112 B)
        • 1_Pooling
          • config.json (190 B)
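
Each archive in this item is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch using only Python's standard `hashlib` module; the function names `md5_of_file` and `verify` and the 1 MiB chunk size are illustrative choices, not part of the repository's tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True when the file's MD5 matches the published checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, after downloading `t5-article.zip` into the working directory, `verify("t5-article.zip", "9bdefddcc51118a8fe1758e364bbd537")` should return `True` if the file is intact. MD5 is adequate for detecting transfer errors but is not a safeguard against deliberate tampering.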
