dc.contributor.author Žagar, Aleš
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2022-12-21T10:48:47Z
dc.date.available 2022-12-21T10:48:47Z
dc.date.issued 2022-12-21
dc.identifier.uri http://hdl.handle.net/11356/1751
dc.description Text summarization aims to convert a longer text into a shorter one while preserving the essential information of the source text. In general, there are two approaches to text summarization. The extractive approach selects and copies the most important sentences or parts of the text, whereas the abstractive approach generates new text and is more similar to human-made summaries. We release 5 models that cover extractive, abstractive, and hybrid types: Metamodel: a neural model based on the Doc2Vec document representation that suggests the best summarizer. Graph-based model: an unsupervised graph-based extractive approach that returns the N most relevant sentences. Headline model: a supervised abstractive approach (T5 architecture) that returns headline-like abstracts. Article model: a supervised abstractive approach (T5 architecture) that returns short summaries. Hybrid-long model: an unsupervised hybrid (graph-based and transformer model-based) approach that returns short summaries of long texts. Details and instructions to run and train the models are available at https://github.com/clarinsi/SloSummarizer. The web service with a demo is available at https://slovenscina.eu/povzemanje.
dc.language.iso slv
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://rsdo.slovenscina.eu/
dc.subject text summarization
dc.subject T5
dc.subject graph methods
dc.title Slovenian text summarization models
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
demo.uri https://slovenscina.eu/povzemanje
contact.person Aleš Žagar Ales.Zagar@fri.uni-lj.si Aleš Žagar
sponsor Ministry of Culture; C3340-20-278001; Development of Slovene in a Digital Environment; Other
files.count 5
files.size 5178939468
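
The graph-based extractive approach named in the description (rank sentences in a similarity graph, return the N most relevant) can be illustrated with a minimal sketch. This is not the released implementation: the file listing suggests the released model relies on LaBSE sentence embeddings, whereas the sketch below substitutes bag-of-words cosine similarity and a PageRank-style power iteration so that it runs with the Python standard library only. All function names and parameters (split_sentences, cosine, summarize, damping, iterations) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text, n=2, damping=0.85, iterations=50):
    """Return the n highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= n:
        return sentences
    # Assumption: word-count vectors stand in for sentence embeddings.
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    size = len(sentences)
    # Edge weights are pairwise similarities; self-loops are excluded.
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(size)]
        for i in range(size)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / size] * size
    # PageRank-style power iteration over the weighted sentence graph.
    for _ in range(iterations):
        updated = []
        for i in range(size):
            rank = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(size)
                if out_sums[j] > 0
            )
            updated.append((1 - damping) / size + damping * rank)
        scores = updated
    best = sorted(range(size), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]
```

Calling summarize(text, n=3) on a longer article returns three sentences copied verbatim from the input, which is what distinguishes an extractive summary from the abstractive T5 models listed above.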


 Files in this item

Name: t5-article.zip
Size: 271.75 MB
Format: application/zip
Description: Unknown
MD5: 9bdefddcc51118a8fe1758e364bbd537
File preview:
  • model
    • SloT5-cnndm_slo_pretraining
      • config.json (775 B)
      • training_args.bin (3 kB)
      • spiece.model (778 kB)
      • tokenizer_config.json (1 kB)
      • special_tokens_map.json (1 kB)
      • pytorch_model.bin (293 MB)

Name: t5-headline.zip
Size: 271.41 MB
Format: application/zip
Description: Unknown
MD5: 55f27433a996e2bd231962e4c8c5f548
File preview:
  • model
    • SloT5-sta_headline
      • config.json (772 B)
      • training_args.bin (3 kB)
      • spiece.model (778 kB)
      • tokenizer_config.json (1 kB)
      • special_tokens_map.json (1 kB)
      • pytorch_model.bin (293 MB)

Name: metamodel.zip
Size: 767.5 MB
Format: application/zip
Description: Unknown
MD5: 2cf60b69ab6b2d8be4565ccbc96a99cb
File preview:
  • model
    • metamodel
      • model.h5 (15 MB)
    • doc2vec
      • model.wv.vectors.npy (65 MB)
      • model (5 MB)
      • model.syn1neg.npy (65 MB)
      • model.dv.vectors.npy (676 MB)

Name: hybrid-long.zip
Size: 1.91 GB
Format: application/zip
Description: Unknown
MD5: 997f03afa041552290e5dd7f22416942
File preview:
  • model
    • LaBSE
      • 3_Normalize
        • README.md (1 kB)
        • tokenizer_config.json (491 B)
        • pytorch_model.bin (1 GB)
        • sentence_bert_config.json (53 B)
        • config.json (907 B)
        • 2_Dense
          • config.json (114 B)
          • pytorch_model.bin (2 MB)
        • tokenizer.json (13 MB)
        • vocab.txt (4 MB)
        • config_sentence_transformers.json (122 B)
        • modules.json (461 B)
        • special_tokens_map.json (112 B)
        • 1_Pooling
          • config.json (190 B)
      • SloT5-cnndm_slo_pretraining
        • config.json (775 B)
        • training_args.bin (3 kB)
        • spiece.model (778 kB)
        • tokenizer_config.json (1 kB)
        • special_tokens_map.json (1 kB)
        • pytorch_model.bin (293 MB)

Name: graph-based.zip
Size: 1.64 GB
Format: application/zip
Description: Unknown
MD5: 30888da8d5f758ae0cd0eebc6f0eb1ac
File preview:
  • model
    • LaBSE
      • 3_Normalize
        • README.md (1 kB)
        • tokenizer_config.json (491 B)
        • pytorch_model.bin (1 GB)
        • sentence_bert_config.json (53 B)
        • config.json (907 B)
        • 2_Dense
          • config.json (114 B)
          • pytorch_model.bin (2 MB)
        • tokenizer.json (13 MB)
        • vocab.txt (4 MB)
        • config_sentence_transformers.json (122 B)
        • modules.json (461 B)
        • special_tokens_map.json (112 B)
        • 1_Pooling
          • config.json (190 B)
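
Since each archive is published with an MD5 checksum, a downloaded file can be checked against the value listed in this record. The following is a minimal sketch using only the Python standard library; the checksums are copied from the file listing, while the helper names (md5_of, verify) and the assumption that archives keep their original file names are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the five archives in this record.
EXPECTED_MD5 = {
    "t5-article.zip": "9bdefddcc51118a8fe1758e364bbd537",
    "t5-headline.zip": "55f27433a996e2bd231962e4c8c5f548",
    "metamodel.zip": "2cf60b69ab6b2d8be4565ccbc96a99cb",
    "hybrid-long.zip": "997f03afa041552290e5dd7f22416942",
    "graph-based.zip": "30888da8d5f758ae0cd0eebc6f0eb1ac",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the entry for its file name."""
    path = Path(path)
    return md5_of(path) == expected[path.name]
```

For example, verify("graph-based.zip") would return True only if the local file matches the published checksum. Reading in chunks keeps memory use flat, which matters for the archives that approach 2 GB.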
