Controllable Text Summarization Survey

This repository collects the papers covered in our controllable text summarization (CTS) survey, "Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey".

You can cite our paper as follows:

@misc{urlana2023controllable,
      title={Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey}, 
      author={Ashok Urlana and Pruthwik Mishra and Tathagato Roy and Rahul Mishra},
      year={2023},
      eprint={2311.09212},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

We group the papers by the aspect they control: Length, Coverage, Style, Abstractivity, Salience, Entity, Topic, Role, Diversity, and Structure.

Length

Paper	Datasets Used
MACSUM: Controllable Summarization with Mixed Attributes TACL-2023 code data	CNN Daily Mail, QMSum
Abstractive Document Summarization with Summary-length Prediction EACL-2023	CNNDM, NYT, WikiHow
Length Control in Abstractive Summarization by Pretraining Information Selection ACL-2022 code	CNN-DailyMail, XSUM
Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization EMNLP-2022 code	DUC2004
A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization NeurIPS-2022 code	Gigaword, DUC2004
CTRLSUM: Towards Generic Controllable Text Summarization EMNLP-2022 code	CNNDM, arXiv, BIGPATENT
A New Approach to Overgenerating and Scoring Abstractive Summaries NAACL-2021 code data	Gigaword, Newsroom
Controllable Summarization with Constrained Markov Decision Process TACL-2021 code	CNNDM, Newsroom, DUC-2002
LenAtten: An Effective Length Controlling Unit for Text Summarization ACL-2021 code	CNNDM
Interpretable multi headed attention for abstractive summarization at controllable lengths COLING-2020	MSR Narratives and Thinking-Machines
Positional Encoding to Control Output Sequence Length NAACL-2019 code	JAMUS corpus (Japanese), with summaries of varying character lengths
Global Optimization under Length Constraint for Neural Text Summarization ACL-2019	CNNDM, Mainichi
A Large-Scale Multi-Length Headline Corpus for Analyzing Length-Constrained Headline Generation Model Evaluation INLG-2019 data	JAMUS corpus (Japanese), with summaries of varying character lengths
Controllable Abstractive Summarization ACL-NMT(W)-2018	CNN-DailyMail
Unsupervised Sentence Compression using Denoising Auto-Encoders CoNLL-2018 code	Gigaword
Controlling Length in Abstractive Summarization Using a Convolutional Neural Network EMNLP-2018 code	CNNDM, DMQA
Controlling Output Length in Neural Encoder-Decoders EMNLP-2016 code	DUC2004, Gigaword
A Neural Attention Model for Abstractive Sentence Summarization EMNLP-2015	NYT, DUC2004

Coverage

Paper	Datasets Used
MACSUM: Controllable Summarization with Mixed Attributes TACL-2023 code data	CNN Daily Mail, QMSum
SWING : Balancing Coverage and Faithfulness for Dialogue Summarization EACL-2023 code	DIALOG-SUM, SAMSUM
Unsupervised Multi-Granularity Summarization EMNLP-2022 data	GranuDUC, MultiNews, DUC2004, arXiv
Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities NeurIPS-2022 code data	Multi-LexSum
Controllable Abstractive Dialogue Summarization with Sketch Supervision ACL-IJCNLP-2021 code	SAMSum
SemSUM: Semantic Dependency Guided Neural Abstractive Summarization AAAI-2020 data	Gigaword, DUC2004 and MSR abstractive summarization dataset
Get to the point: Summarization with pointer generator networks ACL-2017 code	CNNDM

Style

Paper	Datasets Used
Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles ACL-BioNLP(W)-2023	PLOS and eLife
Generating Summaries with Controllable Readability Levels EMNLP-2023 code	CNNDM
HYDRASUM: Disentangling Style Features in Text Summarization with Multi-Decoder Models EMNLP-2022 code	CNN Daily Mail, XSUM, Newsroom
Readability Controllable Biomedical Document Summarization EMNLP-2022 data	TS and PLS
Inference time style control for summarization NAACL-2021 code	CNNDM
Hooks in the Headline: Learning to Generate Headlines with Controlled Styles ACL-2020 code	NYT, CNN
Generating Formality-tuned Summaries Using Input-dependent Rewards CoNLL-2019	CNN Daily Mail + Webis-TLDR-17 corpus
Controllable Abstractive Summarization ACL-NMT(W)-2018	CNN-DailyMail

Abstractivity

Paper	Datasets Used
Controllable Summarization with Constrained Markov Decision Process TACL-2021 code	CNNDM, Newsroom, DUC-2002
Controlling the Amount of Verbatim Copying in Abstractive Summarization AAAI-2020 code	Gigaword, Newsroom
Improving Abstraction in Text Summarization EMNLP-2018	CNNDM
Get to the point: Summarization with pointer generator networks ACL-2017 code	CNNDM
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents AAAI-2017 code	CNN/DM, DUC2002

Salience

Paper	Datasets Used
Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection EACL-2023	CNNDM, XSUM, NYTimes
SOCRATIC Pretraining: Question-Driven Pretraining for Controllable Summarization ACL-2023 code	QMSum and SQuALITY
Guiding Generation for Abstractive Text Summarization based on Key Information Guide Network NAACL-HLT-2018	CNNDM
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents AAAI-2017 code	CNN/DM, DUC2002

Entity

Paper	Datasets Used
SOCRATIC Pretraining: Question-Driven Pretraining for Controllable Summarization ACL-2023 code	QMSum and SQuALITY
Extractive Entity-Centric Summarization as Sentence Selection using Bi-Encoders AACL-2022	EntSum
CTRLSUM: Towards Generic Controllable Text Summarization EMNLP-2022 code	CNNDM, arXiv, BIGPATENT
ENTSUM: A Data Set for Entity-Centric Summarization ACL-2022 code data	CNNDM, NYT
Controllable Summarization with Constrained Markov Decision Process TACL-2021 code	CNNDM, Newsroom, DUC-2002
Controllable Neural Dialogue Summarization with Personal Named Entity Planning EMNLP-2021 code	SAMSum
Controllable Abstractive Sentence Summarization with Guiding Entities COLING-2020 code	Gigaword, DUC2004
Controllable Abstractive Summarization ACL-NMT(W)-2018	CNN-DailyMail

Topic

Paper	Datasets Used
MACSUM: Controllable Summarization with Mixed Attributes TACL-2023 code data	CNN Daily Mail, QMSum
Topic-aware Multimodal Summarization AACL-2022 code data	MSMO
NEWTS: A Corpus for News Topic-Focused Summarization ACL-2022 data	NEWTS
ASPECTNEWS: Aspect-Oriented Summarization of News Documents ACL-2022 code data	ASPECTNEWS
Aspect-controllable opinion summarization EMNLP-2021 code	SPACE, OPOSUM+
Decision-Focused Summarization EMNLP-2021 code data	Yelp's businesses, reviews, and user data
CATS: Customizable Abstractive Topic-based Summarization ACM-2021 code	CNNDM, AMI, ICSI, ADSE
WikiAsp: A Dataset for Multi-domain Aspect-based Summarization TACL-2021 code data	WikiAsp
Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach EMNLP-2020 code	CNN-DailyMail, MA News, All the News
OPINIONDIGEST: A Simple Framework for Opinion Summarization ACL-2020 code	Hotel, Yelp
Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews SIGIR-2020 code data	Tourism Reviews
Generating topic-oriented summaries using neural attention NAACL-HLT-2018	CNNDM
Vocabulary Tailored Summary Generation ACL-2018	CNNDM

Role

Paper	Datasets Used
Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions ACL-2022 code data	CSDS, MC
Towards Modeling Role-Aware Centrality for Dialogue Summarization AACL-2022 data	CSDS, MC
CSDS: A fine-grained Chinese dataset for customer service dialogue summarization EMNLP-2021 code data	CSDS

Diversity

Paper	Datasets Used
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation ACL-2022 code	CNN/DailyMail, XSum, and SQuAD (question generation)

Structure

Paper	Datasets Used
STRONG – Structure Controllable Legal Opinion Summary Generation IJCNLP-AACL-2023	CanLII
SentBS: Sentence-level beam search for controllable summarization EMNLP-2022 code	Meta Review Dataset (MReD)
MReD: A Meta-Review Dataset for Structure-Controllable Text Generation ACL-2022 code data	MReD
Planning with Learned Entity Prompts for Abstractive Summarization TACL-2021	CNN/DailyMail, XSum, SAMSum, and BillSum

ashokurlana/controllable_text_summarization_survey