Read Me

Paper: Polarity Calibration for Opinion Summarization
Accepted: The 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)
Authors: Yuanyuan Lei, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Ruihong Huang, Dong Yu
Paper Link: https://aclanthology.org/2024.naacl-long.291/


Task Description

This paper addresses the opinion summarization task. Opinion summarization automatically generates a summary from a large volume of opinions, reviews, or other subjective text. The challenge lies in presenting diverse or even conflicting opinions. The paper experiments on two types of opinion summarization tasks: summarizing product reviews that contain both positive and negative opinions, and summarizing collections of political articles that hold liberal or conservative stances.


Dataset Description

We evaluate our approach on two datasets, each focusing on a different type of opinion:

  • AmaSum: Amazon product reviews that contain both positive and negative opinions
  • NeuS: collections of political news articles that hold liberal or conservative stances


Model Generated Summaries

We release the generated summaries from different models:

  • generated_summary_AmaSum: the generated summaries on the AmaSum dataset

    • test_file_names: the list of test file names; the generated summaries from each model are listed in the same order as this list (see the loading sketch after this list)
    • base_summarizer_flan_t5_large: the generated summaries from our base summarizer
    • calibrated_summarizer_PoCa: the generated summaries from our calibrated summarizer after polarity calibration (PoCa)
    • lexrank: the generated summaries from LexRank model
    • copycat: the generated summaries from CopyCat model
    • bimeanave_avg: the generated summaries from BiMeanVAE-average model
    • bimeanave_coop: the generated summaries from BiMeanVAE-COOP model
    • qt: the generated summaries from QT model
    • semae: the generated summaries from SemAE model
    • hercules_abstractive: the generated summaries from Hercules abstractive summarization model
    • hercules_extractive: the generated summaries from Hercules extractive summarization model
    • gpt_35_turbo: the generated summaries from gpt-3.5-turbo model
    • gpt_4: the generated summaries from gpt-4 model
  • generated_summary_NeuS: the generated summaries on the NeuS dataset

    • test_data: the test set; the generated summaries from each model are listed in the order test_0, test_1, ..., test_306
    • base_summarizer_flan_t5_large: the generated summaries from our base summarizer
    • calibrated_summarizer_PoCa: the generated summaries from our calibrated summarizer after polarity calibration (PoCa)
    • lexrank: the generated summaries from LexRank model
    • bart_large: the generated summaries from BART-large baseline
    • pegasus_large: the generated summaries from Pegasus-large baseline
    • gpt_35_turbo: the generated summaries from gpt-3.5-turbo model
    • gpt_4: the generated summaries from gpt-4 model
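
To make the alignment above concrete, here is a minimal loading sketch in Python. The file paths and the JSON format are assumptions for illustration only; adjust the loading code to the actual file formats in this release.

```python
import json

# Hypothetical paths -- replace with the actual file names in this release.
names_path = "generated_summary_AmaSum/test_file_names.json"
summary_path = "generated_summary_AmaSum/calibrated_summarizer_PoCa.json"

# Assumption: both files are JSON lists of equal length, with summaries
# stored in the same order as the test file names.
with open(names_path, "r", encoding="utf-8") as f:
    test_file_names = json.load(f)
with open(summary_path, "r", encoding="utf-8") as f:
    summaries = json.load(f)

assert len(test_file_names) == len(summaries)

# Pair each test instance with its generated summary.
name_to_summary = dict(zip(test_file_names, summaries))
print(name_to_summary[test_file_names[0]])
```

The same pattern applies to the NeuS summaries, where the order follows test_0, test_1, ..., test_306.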

Code Description

  • base_summarizer_AmaSum: the code for training our base summarizer on the AmaSum dataset
  • base_summarizer_NeuS: the code for training our base summarizer on the NeuS dataset
  • polarity_calibration_AmaSum: the code for training our calibrated summarizer (PoCa) on the AmaSum dataset
  • polarity_calibration_NeuS: the code for training our calibrated summarizer (PoCa) on the NeuS dataset (a conceptual sketch of the calibration reward follows this list)
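
For orientation only, below is a conceptual sketch of the reward used for polarity calibration as described in the paper: the polarity distance between the output summary and the input text is fed to the summarizer as a reward, balanced against content preservation and language naturality. This is not the implementation in the directories above; the function names, the polarity classifier, and the weights are placeholders.

```python
from typing import Callable, List

def polarity_distance_reward(
    input_texts: List[str],
    output_summary: str,
    polarity_fn: Callable[[str], float],  # placeholder: a sentiment/stance classifier mapping text to [0, 1]
) -> float:
    """Higher (less negative) reward when the summary polarity matches the input polarity."""
    input_polarity = sum(polarity_fn(t) for t in input_texts) / len(input_texts)
    output_polarity = polarity_fn(output_summary)
    # Negative absolute distance: 0 when perfectly calibrated, more negative otherwise.
    return -abs(output_polarity - input_polarity)

def total_reward(
    calibration_reward: float,
    content_reward: float,   # e.g., semantic similarity between summary and input
    fluency_reward: float,   # e.g., a language-model naturalness score
    weights: tuple = (1.0, 1.0, 1.0),  # illustrative weights, not the paper's values
) -> float:
    """Balance polarity calibration with content preservation and language naturality."""
    w_cal, w_con, w_flu = weights
    return w_cal * calibration_reward + w_con * content_reward + w_flu * fluency_reward
```

In the paper, a reward of this kind drives reinforcement training of the summarizer; see polarity_calibration_AmaSum and polarity_calibration_NeuS for the actual training code.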

Citation

If you cite this paper, please use the following format:

Yuanyuan Lei, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Ruihong Huang, and Dong Yu. 2024. Polarity Calibration for Opinion Summarization. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5211–5224, Mexico City, Mexico. Association for Computational Linguistics.

@inproceedings{lei-etal-2024-polarity,
    title = "Polarity Calibration for Opinion Summarization",
    author = "Lei, Yuanyuan  and
      Song, Kaiqiang  and
      Cho, Sangwoo  and
      Wang, Xiaoyang  and
      Huang, Ruihong  and
      Yu, Dong",
    editor = "Duh, Kevin  and
      Gomez, Helena  and
      Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.291",
    pages = "5211--5224",
    abstract = "Opinion summarization is automatically generating summaries from a variety of subjective information, such as product reviews or political opinions. The challenge of opinions summarization lies in presenting divergent or even conflicting opinions. We conduct an analysis of previous summarization models, which reveals their inclination to amplify the polarity bias, emphasizing the majority opinions while ignoring the minority opinions. To address this issue and make the summarizer express both sides of opinions, we introduce the concept of polarity calibration, which aims to align the polarity of output summary with that of input text. Specifically, we develop a reinforcement training approach for polarity calibration. This approach feeds the polarity distance between output summary and input text as reward into the summarizer, and also balance polarity calibration with content preservation and language naturality. We evaluate our Polarity Calibration model (PoCa) on two types of opinions summarization tasks: summarizing product reviews and political opinions articles. Automatic and human evaluation demonstrate that our approach can mitigate the polarity mismatch between output summary and input text, as well as maintain the content semantic and language quality.",
}