awesome-text-summarization

Text summarization starting from scratch.

This repository is updated continuously.

Table of Contents

Basic Concept

Definition

Summarization is the task of producing a shorter version of one or several documents that preserves most of the input's meaning.

Types of summarization

Extractive summaries (extracts) are produced by concatenating several sentences taken exactly as they appear in the materials being summarized.

Abstractive summaries (abstracts) are written to convey the main information in the input. They may reuse phrases or clauses from it, but overall they are expressed in the words of the summary author.
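
To make the extractive definition concrete, here is a minimal sketch of a frequency-based extractor. It is not taken from any paper listed in this repository: the function name `extractive_summary`, the regex sentence splitter, and the average-word-frequency score are all assumptions chosen for illustration. The point it shows is that every output sentence is copied verbatim from the input.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=2):
    """Pick the highest-scoring sentences and return them in original order."""
    # Naive split after ., ! or ? followed by whitespace (an assumption;
    # a real system would use a proper sentence tokenizer).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document, lowercased.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top ones, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


doc = "Cats sleep a lot. Cats chase mice. Dogs bark."
print(extractive_summary(doc, num_sentences=1))
```

An abstractive system, by contrast, would be free to generate a sentence such as "Cats hunt and nap" that appears nowhere in the input.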

Summary Informativeness evaluation

  • ROUGE-N: measures the N-gram units common between a particular summary and a collection of reference summaries, where N determines the N-gram's length. E.g., ROUGE-1 for unigrams and ROUGE-2 for bigrams.
  • ROUGE-L: computes the Longest Common Subsequence (LCS) between a summary and a reference, so it rewards in-order matches without requiring them to be contiguous.
  • BLEU : calculated from the n-gram co-occurrence between the generated summary and the gold summary. Unlike ROUGE-N, you do not need to specify a single "n"; BLEU combines precisions over several n-gram lengths.
  • METEOR : based on the harmonic mean of unigram precision and recall, with recall weighted higher than precision.
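
The metrics above can be sketched in a few lines of Python. This is a simplified illustration, not the official ROUGE or METEOR tooling: whitespace tokenization, lowercasing, the function names, and the use of a plain F1 for ROUGE-L are assumptions. `meteor_fmean` covers only the recall-weighted harmonic mean, using the 9:1 weighting from the original METEOR formulation, and omits METEOR's alignment and fragmentation penalty.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Clipped n-gram matches divided by the total reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Counter & Counter keeps the minimum count per n-gram (clipping).
        matched += sum((cand & ref_counts).values())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """F1 of LCS-based precision and recall (equal weighting is an assumption)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def meteor_fmean(precision, recall):
    """Harmonic mean with recall weighted 9x more than precision."""
    if precision == 0 or recall == 0:
        return 0.0
    return 10 * precision * recall / (recall + 9 * precision)


cand = "the cat sat on the mat"
ref = "the cat is on the mat"
print(rouge_n_recall(cand, [ref], n=1))  # 5 of 6 reference unigrams match
print(rouge_n_recall(cand, [ref], n=2))  # 3 of 5 reference bigrams match
print(rouge_l_f1(cand, ref))             # LCS is "the cat on the mat"
```

Swapping the arguments of `meteor_fmean` shows the recall weighting: `meteor_fmean(0.5, 1.0)` is larger than `meteor_fmean(1.0, 0.5)`.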

DataSet

  • Annotated English Gigaword

    • for sentence summarization
  • CNN/Daily Mail dataset

    • for document summarization
  • DUC 2004

  • CORNELL NEWSROOM

    • is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications. The summaries are obtained from search and social metadata between 1998 and 2017 and use a variety of summarization strategies combining extraction and abstraction.
  • Google Dataset

    • Large corpus of uncompressed and compressed sentences from news articles.

Papers

Survey

Recent automatic text summarization techniques: a survey

Automatic summarization

Abstractive Document summarization

1.words-lvt2k-temp-att (Nallapati et al., 2016) : Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond

2.Graph-Based Attn : Abstractive Document Summarization with a Graph-Based Attentional Neural Model

3.Pointer-generator + coverage (See et al., 2017) : Get To The Point: Summarization with Pointer-Generator Networks

4.KIGN+Prediction-guide : Guiding Generation for Abstractive Text Summarization based on Key Information Guide Network

5.Explicit Info Selection Modeling(Li et al., 2018a) : Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling

6.Structural Regularization(Li et al., 2018b) : Improving Neural Abstractive Document Summarization with Structural Regularization

7.end2end w/ inconsistency loss (Hsu et al., 2018): A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss

8.Pointer + Coverage + EntailmentGen + QuestionGen (Guo et al., 2018) : Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation


Reinforcement Learning Based:

1.ML+RL ROUGE+Novel, with LM (Kryscinski et al., 2018) : Improving Abstraction in Text Summarization

2.RL + pg + cbdec (Jiang and Bansal, 2018): Closed-Book Training to Improve Summarization Encoder Memory

3.rnn-ext + abs + RL + rerank (Chen and Bansal, 2018): Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

4.ML+RL, with intra-attention : A Deep Reinforced Model for Abstractive Summarization

5.GAN : Generative Adversarial Network for Abstractive Text Summarization

6.DCA (Celikyilmaz et al., 2018) : Deep Communicating Agents for Abstractive Summarization

7.ROUGESal+Ent RL (Pasunuru and Bansal, 2018): Multi-Reward Reinforced Summarization with Saliency and Entailment


Extractive Document summarization

1.TEXTRANK (graph based): TextRank: Bringing Order into Texts

2.SWAP-NET : Extractive Summarization with SWAP-NET: Sentences and Words from Alternating Pointer Networks

3.NN-SE : Neural Summarization by Extracting Sentences and Words

4.HSSAS : A Hierarchical Structured Self-Attentive Model for Extractive Document Summarization (HSSAS)

5.NeuSUM (Zhou et al., 2018) : Neural Document Summarization by Jointly Learning to Score and Select Sentences

6.Latent (Zhang et al., 2018) : Neural Latent Extractive Document Summarization

Reinforcement Learning Based

1.rnn-ext + RL (Chen and Bansal, 2018): Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

2.Bottom-Up Summarization (Gehrmann et al., 2018): Bottom-Up Abstractive Summarization

3.BANDITSUM : BANDITSUM: Extractive Summarization as a Contextual Bandit

4.SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents

5.Refresh: Ranking sentences for extractive summarization with reinforcement learning

6.DQN: Deep reinforcement learning for extractive document summarization

7.RNES w/o coherence : Learning to Extract Coherent Summary via Deep Reinforcement Learning

Sentence Summarization

1.Re^3 Sum (Cao et al., 2018) : Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization

2.FTSum_g (Cao et al., 2018) : Faithful to the Original: Fact Aware Neural Abstractive Summarization

3.Seq2seq + E2T_cnn (Amplayo et al., 2018) : Entity Commonsense Representation for Neural Abstractive Summarization

4.EncDec+WFE (Suzuki and Nagata, 2017) : Cutting-off Redundant Repeating Generations for Neural Abstractive Summarization

5.DRGD (Li et al., 2017) : Deep Recurrent Generative Decoder for Abstractive Text Summarization

6.BiRNN + LM Evaluator (Zhao et al., 2018) : A Language Model based Evaluator for Sentence Compression

Unsupervised Abstractive Summarization

1.MeanSum : MeanSum: A Neural Model for Unsupervised Multi-document Abstractive Summarization

2.Semantic Abstractive Sum based AMR(2018 Dohare): Unsupervised Semantic Abstractive Summarization

3.Paraphrastic Sentence Fusion Model(2018 Nayeem): Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion

Multi Document Summarization

1.(Z Cao 2017) : Improving Multi-Document Summarization via Text Classification

2.Based AMR : Abstract Meaning Representation for Multi-Document Summarization.

3.Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion

4.Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization

5.Salience Estimation via Variational Auto-Encoders for Multi-Document Summarization

6.Supervised Learning of Automatic Pyramid for Optimization-Based Multi-Document Summarization

7.Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

Evaluation Metrics

1.ROUGE(2004) : Rouge: A package for automatic evaluation of summaries

2.BLEU(2002) : BLEU: a Method for Automatic Evaluation of Machine Translation

3.BE(2006) : Automated Summarization Evaluation with Basic Elements

4.Pyramid Method(2007) : Evaluating Content Selection in Summarization: The Pyramid Method

5.(2018 ShafieiBavani) : Summarization Evaluation in the Absence of Human Model Summaries Using the Compositionality of Word Embeddings

6.(2018 Honda) : Pruning Basic Elements for Better Automatic Evaluation of Summaries

Other Resources

awesome-text-summarization :

SOTA in summarization : The current state-of-the-art