An Empirical Survey on Long Document Summarization

A collection of papers and resources across three principal components of long document summarization: Datasets, Models and Metrics.

Updates

Large language models (LLMs) can seemingly perform a limitless range of tasks, including long document summarization. Nonetheless, we believe significant performance gaps remain when dealing with long texts.

We suggest the following two papers for potential insights:

  • Lost in the Middle: How Language Models Use Long Contexts [Paper]. Despite buzzy headlines boasting 100K or even 1B-token input limits, one must question how well models actually use that context. A model that accepts unlimited input (e.g., an RNN) but cannot effectively reason over it is of limited utility. The paper's in-depth experiments yield numerous insights into the actual capabilities of these models.
  • Unifying Large Language Models and Knowledge Graphs: A Roadmap [Paper]. Although it does not specifically discuss long document summarization, we believe it is highly relevant for addressing the limitations of LLMs across a broad range of tasks.

We maintain that the work conducted on long document summarization thus far remains highly relevant. Practitioners can draw lessons from this research to overcome the challenges arising from long texts.

Going forward, we intend to update this repository periodically to keep it relevant in the current era of LLMs. Stay tuned!

Contents

Long Document Summarization

  1. An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics. Huan Yee Koh, Jiaxin Ju, Ming Liu, Shirui Pan. 2022. ACM Comput. Surv. [Paper]
  2. Automatic summarization of scientific articles: A survey. Nouf Ibrahim Altmami, Mohamed El Bachir Menai. 2020. Journal of King Saud University - Computer and Information Sciences. [Paper]

General Automatic Summarization

  1. From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information. Shen Gao, Xiuying Chen, Zhaochun Ren, Dongyan Zhao, Rui Yan. 2020. IJCAI. [Paper]
  2. Neural Abstractive Text Summarization with Sequence-to-Sequence Models. Tian Shi, Yaser Keneshloo, Naren Ramakrishnan, Chandan K. Reddy. 2019. ACM/IMS Transactions on Data Science. [Paper]
  3. Neural Text Summarization: A Critical Evaluation. Wojciech Kryściński, Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher. 2019. EMNLP. [Paper]
  4. Text Summarization Techniques: A Brief Survey. Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut. 2017. IJACSA. [Paper]
  5. Recent automatic text summarization techniques: a survey. Mahak Gambhir, Vishal Gupta. 2017. Artif Intell Rev. [Paper]
Datasets

| Dataset | Year | Title | tl;dr |
| --- | --- | --- | --- |
| arXiv | 2018 | A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents. NAACL [Paper] | Scientific |
| PubMed | 2018 | A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents. NAACL [Paper] | Scientific |
| BigPatent | 2019 | BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization. ACL [Paper] | Business/Legal |
| BillSum | 2019 | BillSum: A Corpus for Automatic Summarization of US Legislation. [Paper] | Legislative |
| TLDR | 2020 | TLDR: Extreme Summarization of Scientific Documents. ACL Findings [Paper] | Scientific |
| CORD-19 | 2020 | CORD-19: The Covid-19 Open Research Dataset. ACL NLP-COVID Workshop [Paper] | Scientific |
| FacetSum | 2021 | Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents. [Paper] | Scientific |
| GovReport | 2021 | Efficient Attentions for Long Document Summarization. NAACL [Paper] | Legislative |
| BookSum | 2021 | BookSum: A Collection of Datasets for Long-form Narrative Summarization. [Paper] | General Literature |
| SCROLLS | 2022 | SCROLLS: Standardized CompaRison Over Long Language Sequences. [Paper] | Leaderboard |
| SQuALITY | 2022 | SQuALITY: Building a Long-Document Summarization Dataset the Hard Way. EMNLP [Paper] | Question-focused Summarization |
Abstractive Models

| Model | Year | Title | tl;dr |
| --- | --- | --- | --- |
| Discourse-RNN | 2018 | A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents. NAACL [Paper] | Hierarchical RNN + Sectional Bias |
| Longformer | 2020 | Longformer: The Long-Document Transformer. [Paper] | Transformer + Efficient Attention |
| BigBird | 2020 | Big Bird: Transformers for Longer Sequences. NeurIPS [Paper] | Transformer + Efficient Attention |
| FacetSum | 2021 | Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents. [Paper] | Transformer + Prompt Engineering |
| GSUM | 2021 | GSum: A General Framework for Guided Neural Abstractive Summarization. [Paper] | Transformer + Signal Guidance |
| CTRLsum | 2021 | CTRLsum: Towards Generic Controllable Text Summarization. [Paper] | Transformer + Prompt Engineering |
| HAT-BART | 2021 | Hierarchical Learning for Generation with Long Source Sequences. [Paper] | Transformer + Hierarchical Attention |
| HEPOS | 2021 | Efficient Attentions for Long Document Summarization. NAACL [Paper] | Transformer + Efficient Attention |
| DeepPyramidion | 2022 | Sparsifying Transformer Models with Trainable Representation Pooling. ACL [Paper] | Transformer + Efficient Attention |
| PRIMERA | 2022 | PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. ACL [Paper] | Transformer + Multi-document Pre-training + Efficient Attention |
| HIBRIDS | 2022 | HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization. ACL [Paper] | Transformer + Discourse Bias Attention |
| LongT5 | 2022 | LongT5: Efficient Text-To-Text Transformer for Long Sequences. NAACL [Paper] | Transformer + Long Document Pre-training + Efficient Attention |
| ECC | 2022 | Improving the Faithfulness of Abstractive Summarization via Entity Coverage Control. NAACL Findings [Paper] | Transformer + Factuality-Aware Fine-tuning |
| PEGASUS-X | 2022 | Investigating Efficiently Extending Transformers for Long Input Summarization. [Paper] | Transformer + Efficient Attention |
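Many of the models above are tagged "Efficient Attention". The common idea, popularized by Longformer and BigBird, is to replace full O(n²) self-attention with a sparse pattern such as a sliding window, so that long inputs fit in memory. A minimal sketch of a sliding-window attention mask (illustrative only, not any model's actual implementation):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: entry (i, j) is True if token i may attend to token j.

    Each token attends only to neighbours within `window` positions on
    either side, mimicking the sliding-window pattern of Longformer-style
    models (global tokens and random blocks, as in BigBird, are omitted).
    """
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

mask = sliding_window_mask(seq_len=8, window=2)
print(int(mask.sum()), "of", mask.size, "query-key pairs kept")  # 34 of 64
```

For an n-token document, the local mask keeps roughly n·(2w+1) query-key pairs instead of n², which is the source of the memory savings.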
Extractive Models

| Model | Year | Title | tl;dr |
| --- | --- | --- | --- |
| GL-LSTM | 2019 | Extractive Summarization of Long Documents by Combining Global and Local Context. EMNLP-IJCNLP [Paper] | Hierarchical RNN + Sectional Bias |
| Sent-CLF/PTR | 2019 | On Extractive and Abstractive Neural Document Summarization with Transformer Language Models. EMNLP [Paper] | Hierarchical RNN |
| Topic-GraphSum | 2020 | Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks. COLING [Paper] | Graph Attention Network + Topic Modelling |
| SSN-DM | 2021 | Sliding Selector Network with Dynamic Memory for Extractive Summarization of Long Documents. NAACL [Paper] | Memory Network |
| MemSum | 2022 | MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes. ACL [Paper] | RL-based Extractor via Multi-step Episodic MDP |
| HiStruct+ | 2022 | HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information. ACL Findings [Paper] | Transformer + Discourse Bias Embeddings |
| TSTR | 2022 | TSTR: Too Short to Represent, Summarize with Details! Intro-Guided Extended Summary Generation. NAACL [Paper] | Transformer + Signal Guidance |
Hybrid Models

| Model | Year | Title | tl;dr |
| --- | --- | --- | --- |
| TLM+Ext | 2019 | On Extractive and Abstractive Neural Document Summarization with Transformer Language Models. EMNLP [Paper] | Extract-then-Summarize |
| DANCER | 2019 | A Divide-and-Conquer Approach to the Summarization of Long Documents. IEEE/ACM Transactions on Audio, Speech, and Language Processing [Paper] | Summarize-then-Combine |
| SEAL | 2020 | SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization. [Paper] | Extract-then-Summarize |
| LoBART | 2021 | Long-Span Summarization via Local Attention and Content Selection. ACL [Paper] | Extract-then-Summarize |
| DYLE | 2022 | DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization. ACL [Paper] | Extract-then-Summarize |
| Summ^N | 2022 | Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents. ACL [Paper] | Summarize-then-Summarize |
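The extract-then-summarize entries above all follow the same two-stage recipe: a cheap selector prunes the document down to a short extract, and a standard abstractive model then summarizes that extract. A toy sketch of the control flow, where the overlap-with-lead scorer and the identity summarizer are stand-ins for the learned components of any real system:

```python
def extract_then_summarize(sentences, budget=2, summarize=None):
    """Toy two-stage pipeline: select `budget` salient sentences, then
    hand them (in document order) to an abstractive summarizer.

    Stage 1 scores each sentence by word overlap with the opening sentence,
    a crude stand-in for a trained content selector; stage 2 defaults to
    simply joining the extract, standing in for a model such as BART.
    """
    lead = set(sentences[0].lower().split())
    scores = [len(lead & set(s.lower().split())) for s in sentences]
    keep = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:budget])
    extract = [sentences[i] for i in keep]
    summarize = summarize or (lambda sents: " ".join(sents))
    return summarize(extract)

doc = [
    "Long documents overwhelm standard abstractive summarizers.",
    "The weather was pleasant that day.",
    "Selecting salient content first keeps long documents tractable.",
]
print(extract_then_summarize(doc))
```

The appeal of the recipe is that only the short extract ever reaches the expensive abstractive model, sidestepping its input-length limit; DYLE's contribution, for instance, is making the extraction step latent and trainable rather than a fixed heuristic like the one sketched here.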
Unsupervised Abstractive Models

| Model | Year | Title | tl;dr |
| --- | --- | --- | --- |
| SciSummPip | 2020 | Monash-Summ@LongSumm 20 SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline. Proceedings of the First Workshop on Scholarly Document Processing [Paper] | Multi-sentence Compression |
Unsupervised Extractive Models

| Model | Year | Title | tl;dr |
| --- | --- | --- | --- |
| PacSum | 2019 | Sentence Centrality Revisited for Unsupervised Summarization. ACL [Paper] | Graph Centrality Scoring |
| HipoRank | 2020 | Discourse-Aware Unsupervised Summarization of Long Scientific Documents. EACL [Paper] | Graph Centrality Scoring |
| FAR | 2021 | Facet-Aware Evaluation for Extractive Summarization. ACL-IJCNLP Findings [Paper] | Graph Centrality Scoring |
| IBsumm | 2021 | Leveraging Information Bottleneck for Scientific Document Summarization. EMNLP Findings [Paper] | Pipeline Approach |
| OTExtSum | 2022 | OTExtSum: Extractive Text Summarisation with Optimal Transport. NAACL Findings [Paper] | Optimal Transport Extraction |
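Several unsupervised extractors above share the "Graph Centrality Scoring" tl;dr: build a sentence-similarity graph over the document and rank sentences by how central they are. A bag-of-words simplification of the idea (PacSum actually uses BERT sentence representations with position-aware, directional edge weights, and HipoRank adds discourse structure; this sketch keeps only the centrality core):

```python
import math
from collections import Counter

def centrality_scores(sentences):
    """Score each sentence by the sum of its cosine similarities to all
    other sentences: undirected degree centrality over a bag-of-words
    sentence-similarity graph."""
    bags = [Counter(s.lower().split()) for s in sentences]

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * \
               math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    return [sum(cosine(bags[i], bags[j]) for j in range(len(bags)) if j != i)
            for i in range(len(bags))]

sents = [
    "Efficient attention helps summarize long documents.",
    "Summarizing long documents requires efficient attention.",
    "The recipe calls for two eggs.",
]
scores = centrality_scores(sents)
# The off-topic sentence shares no words with the others, so it scores 0.
print(scores)
```

An extractive summary is then simply the top-scoring sentences; no training data is needed, which is what makes this family attractive for long, low-resource domains.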

The metrics listed below are not specific to long document summarization; most summarization metrics are studied in a short/normal summarization setting. Our EMNLP 2022 work investigates these metrics in the long document setting: How Far are We from Robust Long Abstractive Summarization?

Relevance Metrics

| Metric | Year | Title | tl;dr |
| --- | --- | --- | --- |
| ROUGE | 2004 | ROUGE: A Package for Automatic Evaluation of Summaries. ACL [Paper] | Hard Lexical Matching |
| BERTScore | 2019 | BERTScore: Evaluating Text Generation with BERT. ICLR [Paper] | Soft Lexical Matching |
| BARTScore | 2021 | BARTScore: Evaluating Generated Text as Text Generation. NeurIPS [Paper] | Conditional Text Generation |
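The tl;dr column contrasts matching styles: ROUGE counts exact n-gram overlap ("hard"), while BERTScore matches tokens in embedding space ("soft"). A minimal ROUGE-1 recall sketch for intuition (use the official packages for any real evaluation):

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams covered by the candidate, with
    clipped counts so each reference occurrence is matched at most once."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], count) for w, count in ref.items())
    return overlap / sum(ref.values()) if ref else 0.0

print(rouge1_recall("the model summarizes long documents well",
                    "the model summarizes documents"))  # 1.0
print(rouge1_recall("a paraphrase with different words",
                    "the model summarizes documents"))  # 0.0
```

Hard matching is why ROUGE can score a faithful paraphrase at zero, which motivates the embedding-based and generation-based metrics listed alongside it.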
Factual Consistency Metrics

| Metric | Year | Title | tl;dr |
| --- | --- | --- | --- |
| OpenIE | 2019 | Assessing The Factual Accuracy of Generated Text. KDD [Paper] | Semantic Matching |
| FactCC | 2019 | Evaluating the Factual Consistency of Abstractive Text Summarization. EMNLP [Paper] | Data Augmentation + Textual Entailment |
| DAE | 2020 | Evaluating Factuality in Generation with Dependency-level Entailment. EMNLP Findings [Paper] | Textual Entailment |
| FEQA | 2020 | FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization. ACL [Paper] | Question-Answering |
| QAGS | 2020 | Asking and Answering Questions to Evaluate the Factual Consistency of Summaries. ACL [Paper] | Question-Answering |
| TE-MNLI | 2020 | On Faithfulness and Factuality in Abstractive Summarization. ACL [Paper] | Textual Entailment |
| BARTScore | 2021 | BARTScore: Evaluating Generated Text as Text Generation. NeurIPS [Paper] | Conditional Text Generation |
| CoCo | 2021 | Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation. EMNLP Findings [Paper] | Causal Inference |
| Q^2 | 2021 | Q2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering. EMNLP [Paper] | Question-Answering |
| QUAL | 2021 | Improving Factual Consistency of Abstractive Summarization via Question Answering. ACL-IJCNLP [Paper] | Question-Answering |
| SummaC | 2021 | SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization. TACL [Paper] | Textual Entailment |
| FactGraph | 2022 | FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations. NAACL [Paper] | Knowledge Graph |
| FalseSum | 2022 | Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization. NAACL [Paper] | Data Augmentation + Textual Entailment |
| QAFactEval | 2022 | QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization. NAACL [Paper] | Question-Answering |
| MFMA | 2022 | Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking. NAACL Findings [Paper] | Data Augmentation + Textual Entailment |

Papers that provide insightful discussions related to long document summarization.

Summarization

| Topic | Year | Title | tl;dr |
| --- | --- | --- | --- |
| Metrics | 2020 | Re-evaluating Evaluation in Text Summarization. EMNLP [Paper] | On Effectiveness of Summarization Models and Metrics |
| Models | 2022 | DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization. ACL [Paper] | Insightful Approach + Discussion of Extract-then-Summarize Models |
| Models | 2022 | Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization. ACL [Paper] | Findings on Abstractiveness vs. Factuality of Abstractive Models |
| Models | 2022 | Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization. ACL [Paper] | Hallucinated Texts (i.e., facts not in doc) Are Often Factual |
| Models | 2022 | Training Dynamics for Text Summarization Models. ACL [Paper] | Fine-tuning Affects Generation Strategies |
| Models | 2022 | Training Data is More Valuable than You Think: A Simple and Effective Method by Retrieving from Training Data. ACL [Paper] | REtrieving from the traINing datA (REINA) Leads to Performance Gains |
| Models | 2022 | Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models. ACL [Paper] | Model Efficiency (Training Cost; Train/Inference Speed) vs. Performance |
| Models | 2022 | FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization. NAACL [Paper] | Factuality-Aware PEGASUS Pre-training |
| Models | 2022 | Are Abstractive Summarization Models truly ‘Abstractive’? [Paper] | On Abstractiveness of Abstractive Models |
| Metrics | 2022 | How Far are We from Robust Long Abstractive Summarization? [Paper] | On Metrics in the Long Document Summarization Setting |

General

| Topic | Year | Title | tl;dr |
| --- | --- | --- | --- |
| Efficient Attention | 2020 | Efficient Transformers: A Survey. ACM Comput. Surv. [Paper] | Practicality of Efficient Attention (section 4.4) |
| Fine-tuning | 2022 | A Closer Look at How Fine-tuning Changes BERT. ACL [Paper] | How Representations Change after Fine-tuning |
| Text Generation | 2022 | Language modeling via stochastic processes. ICLR [Paper] | Text Generation via Latent Stochastic Process |
| Efficient Attention | 2022 | Simple Local Attentions Remain Competitive for Long-Context Tasks. NAACL [Paper] | Local Window Attention Remains Competitive |
| Scaling Language Model | 2022 | Improving language models by retrieving from trillions of tokens. ICML [Paper] | Retrieve from a Two-Trillion-Token Database, then Generate (See REINA above) |

Our survey work has been accepted by ACM Computing Surveys. For more information, please visit An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics.

Folders containing the metrics used to analyze intrinsic dataset characteristics, the metrics used to analyze model outputs, and the human-annotated arXiv data will be uploaded soon. In the meantime, please do not hesitate to contact me.

@article{10.1145/3545176,
author = {Koh, Huan Yee and Ju, Jiaxin and Liu, Ming and Pan, Shirui},
title = {An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics},
year = {2022},
month = {jun},
publisher = {Association for Computing Machinery},
journal = {ACM Comput. Surv.},
address = {New York, NY, USA},
issn = {0360-0300},
url = {https://doi.org/10.1145/3545176},
doi = {10.1145/3545176},
keywords = {datasets, neural networks, document summarization, language models, Transformer}
}