Controllable Text Summarization Survey
This repository collects the papers covered in our controllable text summarization (CTS) survey, "Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey".
You can cite our paper as follows:
@misc{urlana2023controllable,
title={Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey},
author={Ashok Urlana and Pruthwik Mishra and Tathagato Roy and Rahul Mishra},
year={2023},
eprint={2311.09212},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
We group the papers by controllable aspect: Length, Coverage, Style, Abstractivity, Salience, Entity, Topic, Role, Diversity, and Structure.
Length
Paper
Datasets Used
MACSUM: Controllable Summarization with Mixed Attributes TACL-2023 code data
CNN Daily Mail, QMSum
Abstractive Document Summarization with Summary-length Prediction EACL-2023
CNNDM, NYT, WikiHow
Length Control in Abstractive Summarization by Pretraining Information Selection ACL-2022 code
CNN-DailyMail, XSUM
Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization EMNLP-2022 code
DUC2004
A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization NeurIPS-2022 code
Gigaword, DUC2004
CTRLSUM: Towards Generic Controllable Text Summarization EMNLP-2022 code
CNNDM, arXiv, BIGPATENT
A New Approach to Overgenerating and Scoring Abstractive Summaries NAACL-2021 code data
Gigaword, Newsroom
Controllable Summarization with Constrained Markov Decision Process TACL-2021 code
CNNDM, Newsroom, DUC-2002
Lenatten: An effective length controlling unit for text summarization ACL-2021 code
CNNDM
Interpretable multi headed attention for abstractive summarization at controllable lengths COLING-2020
MSR Narratives and Thinking-Machines
Positional Encoding to Control Output Sequence Length NAACL-2019 code
JAMUS corpus (Japanese), with summaries of different character lengths
Global Optimization under Length Constraint for Neural Text Summarization ACL-2019
CNNDM, Mainichi
A Large-Scale Multi-Length Headline Corpus for Analyzing Length-Constrained Headline Generation Model Evaluation INLG-2019 data
JAMUS corpus (Japanese), with summaries of different character lengths
Controllable Abstractive Summarization ACL-NMT(W)-2018
CNN-DailyMail
Unsupervised Sentence Compression using Denoising Auto-Encoders CoNLL-2018 code
Gigaword
Controlling Length in Abstractive Summarization Using a Convolutional Neural Network EMNLP-2018 code
CNNDM, DMQA
Controlling Output Length in Neural Encoder-Decoders EMNLP-2016 code
DUC2004, Gigaword
A Neural Attention Model for Abstractive Sentence Summarization EMNLP-2015
NYT, DUC2004
Coverage
Paper
Datasets Used
MACSUM: Controllable Summarization with Mixed Attributes TACL-2023 code data
CNN Daily Mail, QMSum
SWING: Balancing Coverage and Faithfulness for Dialogue Summarization EACL-2023 code
DIALOG-SUM, SAMSUM
Unsupervised Multi-Granularity Summarization EMNLP-2022 data
GranuDUC, MultiNews, DUC2004, Arxiv
Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities NeurIPS-2022 code data
Multi-LexSum
Controllable Abstractive Dialogue Summarization with Sketch Supervision ACL-IJCNLP-2021 code
SAMSum
SemSUM: Semantic Dependency Guided Neural Abstractive Summarization AAAI-2020 data
Gigaword, DUC2004 and MSR abstractive summarization dataset
Get to the point: Summarization with pointer generator networks ACL-2017 code
CNNDM
Style
Paper
Datasets Used
Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles ACL-BioNLP(W)-2023
PLOS and eLife
Generating Summaries with Controllable Readability Levels EMNLP-2023 code
CNNDM
HYDRASUM: Disentangling Style Features in Text Summarization with Multi-Decoder Models EMNLP-2022 code
CNN Daily Mail, XSUM, Newsroom
Readability Controllable Biomedical Document Summarization EMNLP-2022 data
TS and PLS
Inference time style control for summarization NAACL-2021 code
CNNDM
Hooks in the Headline: Learning to Generate Headlines with Controlled Styles ACL-2020 code
NYT, CNN
Generating Formality-tuned Summaries Using Input-dependent Rewards CoNLL-2019
CNN Daily Mail + Webis-TLDR-17 corpus
Controllable Abstractive Summarization ACL-NMT(W)-2018
CNN-DailyMail
Abstractivity
Paper
Datasets Used
Controllable Summarization with Constrained Markov Decision Process TACL-2021 code
CNNDM, Newsroom, DUC-2002
Controlling the Amount of Verbatim Copying in Abstractive Summarization AAAI-2020 code
Gigaword, Newsroom
Improving Abstraction in Text Summarization EMNLP-2018
CNNDM
Get to the point: Summarization with pointer generator networks ACL-2017 code
CNNDM
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents AAAI-2017 code
CNN/DM, DUC2002
Salience
Paper
Datasets Used
Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection EACL-2023
CNNDM, XSUM, NYTimes
SOCRATIC Pretraining: Question-Driven Pretraining for Controllable Summarization ACL-2023 code
QMSum and SQuALITY
Guiding Generation for Abstractive Text Summarization based on Key Information Guide Network NAACL-HLT-2018
CNNDM
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents AAAI-2017 code
CNN/DM, DUC2002
Entity
Paper
Datasets Used
SOCRATIC Pretraining: Question-Driven Pretraining for Controllable Summarization ACL-2023 code
QMSum and SQuALITY
Extractive Entity-Centric Summarization as Sentence Selection using Bi-Encoders AACL-2022
EntSum
CTRLSUM: Towards Generic Controllable Text Summarization EMNLP-2022 code
CNNDM, arXiv, BIGPATENT
ENTSUM: A Data Set for Entity-Centric Summarization ACL-2022 code data
CNNDM, NYT
Controllable Summarization with Constrained Markov Decision Process TACL-2021 code
CNNDM, Newsroom, DUC-2002
Controllable Neural Dialogue Summarization with Personal Named Entity Planning EMNLP-2021 code
SAMSum
Controllable Abstractive Sentence Summarization with Guiding Entities COLING-2020 code
Gigaword, DUC2004
Controllable Abstractive Summarization ACL-NMT(W)-2018
CNN-DailyMail
Topic
Paper
Datasets Used
MACSUM: Controllable Summarization with Mixed Attributes TACL-2023 code data
CNN Daily Mail, QMSum
Topic-aware Multimodal Summarization AACL-2022 code data
MSMO
NEWTS: A Corpus for News Topic-Focused Summarization ACL-2022 data
NEWTS
ASPECTNEWS: Aspect-Oriented Summarization of News Documents ACL-2022 code data
ASPECTNEWS
Aspect-controllable opinion summarization EMNLP-2021 code
SPACE, OPOSUM+
Decision-Focused Summarization EMNLP-2021 code data
Yelp's businesses, reviews, and user data
CATS: Customizable Abstractive Topic-based Summarization ACM-2021 code
CNNDM, AMI, ICSI, ADSE
WikiAsp: A Dataset for Multi-domain Aspect-based Summarization TACL-2021 code data
WikiAsp
Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach EMNLP-2020 code
CNN-DailyMail, MA News, All the News
OPINIONDIGEST: A Simple Framework for Opinion Summarization ACL-2020 code
Hotel, Yelp
Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews SIGIR-2020 code data
Tourism Reviews
Generating topic-oriented summaries using neural attention NAACL-HLT-2018
CNNDM
Vocabulary Tailored Summary Generation ACL-2018
CNNDM
Role
Paper
Datasets Used
Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions ACL-2022 code data
CSDS, MC
Towards Modeling Role-Aware Centrality for Dialogue Summarization AACL-2022 data
CSDS, MC
CSDS: A fine-grained Chinese dataset for customer service dialogue summarization EMNLP-2021 code data
CSDS
Diversity
Paper
Datasets Used
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation ACL-2022 code
CNN/DailyMail, XSum, and SQuAD (question generation)
Structure
Paper
Datasets Used
STRONG – Structure Controllable Legal Opinion Summary Generation IJCNLP-AACL-2023
CanLII
SentBS: Sentence-level beam search for controllable summarization EMNLP-2022 code
Meta Review Dataset (MReD)
MReD: A Meta-Review Dataset for Structure-Controllable Text Generation ACL-2022 code data
MReD
Planning with Learned Entity Prompts for Abstractive Summarization TACL-2021
CNN/DailyMail, XSum, SAMSum, and BillSum