MIT License

Controllable Text Summarization Survey


This repository lists the controllable text summarization (CTS) papers covered in our survey, "Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey".

You can cite our paper as follows:

@misc{urlana2023controllable,
      title={Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey}, 
      author={Ashok Urlana and Pruthwik Mishra and Tathagato Roy and Rahul Mishra},
      year={2023},
      eprint={2311.09212},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

We group the papers by the aspect they control: Length, Coverage, Style, Abstractivity, Salience, Entity, Topic, Role, Diversity, and Structure.

Length

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| MACSUM: Controllable Summarization with Mixed Attributes | TACL-2023 | code, data | CNN Daily Mail, QMSum |
| Abstractive Document Summarization with Summary-length Prediction | EACL-2023 | - | CNNDM, NYT, WikiHow |
| Length Control in Abstractive Summarization by Pretraining Information Selection | ACL-2022 | code | CNN-DailyMail, XSUM |
| Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization | EMNLP-2022 | code | DUC2004 |
| A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization | NeurIPS-2022 | code | Gigaword, DUC2004 |
| CTRLSUM: Towards Generic Controllable Text Summarization | EMNLP-2022 | code | CNNDM, arXiv, BIGPATENT |
| A New Approach to Overgenerating and Scoring Abstractive Summaries | NAACL-2021 | code, data | Gigaword, Newsroom |
| Controllable Summarization with Constrained Markov Decision Process | TACL-2021 | code | CNNDM, Newsroom, DUC-2002 |
| LenAtten: An effective length controlling unit for text summarization | ACL-2021 | code | CNNDM |
| Interpretable multi headed attention for abstractive summarization at controllable lengths | COLING-2020 | - | MSR Narratives and Thinking-Machines |
| Positional Encoding to Control Output Sequence Length | NAACL-2019 | code | JAMUS corpus (Japanese), with summaries of different character lengths |
| Global Optimization under Length Constraint for Neural Text Summarization | ACL-2019 | - | CNNDM, Mainichi |
| A Large-Scale Multi-Length Headline Corpus for Analyzing Length-Constrained Headline Generation Model Evaluation | INLG-2019 | data | JAMUS corpus (Japanese), with summaries of different character lengths |
| Controllable Abstractive Summarization | ACL-NMT(W)-2018 | - | CNN-DailyMail |
| Unsupervised Sentence Compression using Denoising Auto-Encoders | CoNLL-2018 | code | Gigaword |
| Controlling Length in Abstractive Summarization Using a Convolutional Neural Network | EMNLP-2018 | code | CNNDM, DMQA |
| Controlling Output Length in Neural Encoder-Decoders | EMNLP-2016 | code | DUC2004, Gigaword |
| A Neural Attention Model for Abstractive Sentence Summarization | EMNLP-2015 | - | NYT, DUC2004 |

Coverage

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| MACSUM: Controllable Summarization with Mixed Attributes | TACL-2023 | code, data | CNN Daily Mail, QMSum |
| SWING: Balancing Coverage and Faithfulness for Dialogue Summarization | EACL-2023 | code | DIALOG-SUM, SAMSUM |
| Unsupervised Multi-Granularity Summarization | EMNLP-2022 | data | GranuDUC, MultiNews, DUC2004, Arxiv |
| Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities | NeurIPS-2022 | code, data | Multi-LexSum |
| Controllable Abstractive Dialogue Summarization with Sketch Supervision | ACL-IJCNLP-2021 | code | SAMSum |
| SemSUM: Semantic Dependency Guided Neural Abstractive Summarization | AAAI-2020 | data | Gigaword, DUC2004, and MSR abstractive summarization dataset |
| Get to the point: Summarization with pointer generator networks | ACL-2017 | code | CNNDM |

Style

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles | ACL-BioNLP(W)-2023 | - | PLOS and eLife |
| Generating Summaries with Controllable Readability Levels | EMNLP-2023 | code | CNNDM |
| HYDRASUM: Disentangling Style Features in Text Summarization with Multi-Decoder Models | EMNLP-2022 | code | CNN Daily Mail, XSUM, Newsroom |
| Readability Controllable Biomedical Document Summarization | EMNLP-2022 | data | TS and PLS |
| Inference time style control for summarization | NAACL-2021 | code | CNNDM |
| Hooks in the Headline: Learning to Generate Headlines with Controlled Styles | ACL-2020 | code | NYT, CNN |
| Generating Formality-tuned Summaries Using Input-dependent Rewards | CoNLL-2019 | - | CNN Daily Mail + Webis-TLDR-17 corpus |
| Controllable Abstractive Summarization | ACL-NMT(W)-2018 | - | CNN-DailyMail |

Abstractivity

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| Controllable Summarization with Constrained Markov Decision Process | TACL-2021 | code | CNNDM, Newsroom, DUC-2002 |
| Controlling the Amount of Verbatim Copying in Abstractive Summarization | AAAI-2020 | code | Gigaword, Newsroom |
| Improving Abstraction in Text Summarization | EMNLP-2018 | - | CNNDM |
| Get to the point: Summarization with pointer generator networks | ACL-2017 | code | CNNDM |
| SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents | AAAI-2017 | code | CNN/DM, DUC2002 |

Salience

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection | EACL-2023 | - | CNNDM, XSUM, NYTimes |
| SOCRATIC Pretraining: Question-Driven Pretraining for Controllable Summarization | ACL-2023 | code | QMSum and SQuALITY |
| Guiding Generation for Abstractive Text Summarization based on Key Information Guide Network | NAACL-HLT-2018 | - | CNNDM |
| SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents | AAAI-2017 | code | CNN/DM, DUC2002 |

Entity

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| SOCRATIC Pretraining: Question-Driven Pretraining for Controllable Summarization | ACL-2023 | code | QMSum and SQuALITY |
| Extractive Entity-Centric Summarization as Sentence Selection using Bi-Encoders | AACL-2022 | - | EntSum |
| CTRLSUM: Towards Generic Controllable Text Summarization | EMNLP-2022 | code | CNNDM, arXiv, BIGPATENT |
| ENTSUM: A Data Set for Entity-Centric Summarization | ACL-2022 | code, data | CNNDM, NYT |
| Controllable Summarization with Constrained Markov Decision Process | TACL-2021 | code | CNNDM, Newsroom, DUC-2002 |
| Controllable Neural Dialogue Summarization with Personal Named Entity Planning | EMNLP-2021 | code | SAMSum |
| Controllable Abstractive Sentence Summarization with Guiding Entities | COLING-2020 | code | Gigaword, DUC2004 |
| Controllable Abstractive Summarization | ACL-NMT(W)-2018 | - | CNN-DailyMail |

Topic

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| MACSUM: Controllable Summarization with Mixed Attributes | TACL-2023 | code, data | CNN Daily Mail, QMSum |
| Topic-aware Multimodal Summarization | AACL-2022 | code, data | MSMO |
| NEWTS: A Corpus for News Topic-Focused Summarization | ACL-2022 | data | NEWTS |
| ASPECTNEWS: Aspect-Oriented Summarization of News Documents | ACL-2022 | code, data | ASPECTNEWS |
| Aspect-controllable opinion summarization | EMNLP-2021 | code | SPACE, OPOSUM+ |
| Decision-Focused Summarization | EMNLP-2021 | code, data | Yelp's businesses, reviews, and user data |
| CATS: Customizable Abstractive Topic-based Summarization | ACM-2021 | code | CNNDM, AMI, ICSI, ADSE |
| WikiAsp: A Dataset for Multi-domain Aspect-based Summarization | TACL-2021 | code, data | WikiAsp |
| Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach | EMNLP-2020 | code | CNN-DailyMail, MA News, All the News |
| OPINIONDIGEST: A Simple Framework for Opinion Summarization | ACL-2020 | code | Hotel, Yelp |
| Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews | SIGIR-2020 | code, data | Tourism Reviews |
| Generating topic-oriented summaries using neural attention | NAACL-HLT-2018 | - | CNNDM |
| Vocabulary Tailored Summary Generation | ACL-2018 | - | CNNDM |

Role

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions | ACL-2022 | code, data | CSDS, MC |
| Towards Modeling Role-Aware Centrality for Dialogue Summarization | AACL-2022 | data | CSDS, MC |
| CSDS: A fine-grained Chinese dataset for customer service dialogue summarization | EMNLP-2021 | code, data | CSDS |

Diversity

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation | ACL-2022 | code | CNN/DailyMail, XSum, and SQuAD (question generation) |

Structure

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| STRONG – Structure Controllable Legal Opinion Summary Generation | IJCNLP-AACL-2023 | - | CanLII |
| SentBS: Sentence-level beam search for controllable summarization | EMNLP-2022 | code | Meta Review Dataset (MReD) |
| MReD: A Meta-Review Dataset for Structure-Controllable Text Generation | ACL-2022 | code, data | MReD |
| Planning with Learned Entity Prompts for Abstractive Summarization | TACL-2021 | - | CNN/DailyMail, XSum, SAMSum, and BillSum |