[NAACL 2024] Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

This repo hosts the code and data for the NAACL 2024 paper: Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles.

Contact

If you have any questions about this work, please contact Steeve Huang.

Abstract

Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, to our knowledge, the summarization of diverse information dispersed across multiple articles about an event has not been previously investigated. The latter imposes a different set of challenges for a summarization model. In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm. The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference. Moreover, we conducted a comprehensive analysis to pinpoint the position and verbosity biases when utilizing Large Language Model (LLM)-based metrics for evaluating the coverage and faithfulness of the summaries, as well as their correlation with human assessments. We applied our findings to study how LLMs summarize multiple news articles by analyzing which type of diverse information LLMs are capable of identifying. Our analyses suggest that despite the extraordinary capabilities of LLMs in single-document summarization, the proposed task remains a complex challenge for them mainly due to their limited coverage, with GPT-4 only able to cover less than 40% of the diverse information on average.


DiverseSumm

We release the data for our DiverseSumm dataset in the data folder.

Data Format

Below is the format of an instance in DiverseSumm; a short loading sketch follows the field descriptions:

{
  "eid": "...",
  "articles": [
    {
      "aid": "...",
      "title": "...",
      "url": "...",
      "domain": "...",
      "content": "..."
    },
    ...
  ],
  "question_answers": [
    {
      "question": "...",
      "answer_groups": [
        [
          {
            "aid": "...",
            "answer": "..."
          },
          ...
        ],
        [
          ...
        ]
      ]
    },
    ...
  ]
}

Fields

  • eid: A unique string identifier for the event.
  • articles: A list of articles, each of which is represented as a dictionary.
    • aid: An identifier for each individual article within an event.
    • title: The headline or title of the given article.
    • url: The original link to the online version of the article.
    • domain: The website or news source where the article was originally published.
    • content: The full text content of the article.
  • question_answers: A list of dictionaries, each containing a question and the groups of answers to that question drawn from different articles.
    • question: A question about the event that results in diverse answers across different articles.
    • answer_groups: A list of lists, where each sub-list represents a group of answers; each answer may come from a different article about the event.
      • aid: The identifier of the article from which the answer is extracted.
      • answer: The text of the answer giving information about the event. It answers the question from the same dictionary entry and is extracted from the article with the corresponding aid.
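The sketch below walks one instance following the schema above. It is a minimal illustration, not part of the released code: the file name data/diversesumm.json and the assumption that the data folder holds a single JSON list of event instances are placeholders, so adapt the path and loading step to the actual files in the data folder.

import json
from pathlib import Path

# Hypothetical path: adjust to however the files are actually laid out in the data folder.
DATA_PATH = Path("data/diversesumm.json")

with DATA_PATH.open() as f:
    events = json.load(f)  # assumes the file holds a list of event instances

for event in events:
    # Index the articles of this event by their identifier.
    articles = {a["aid"]: a for a in event["articles"]}
    print(f"Event {event['eid']}: {len(articles)} articles")
    for qa in event["question_answers"]:
        print("  Q:", qa["question"])
        # Each sub-list in answer_groups is one group of answers to the question.
        for group in qa["answer_groups"]:
            for ans in group:
                title = articles[ans["aid"]]["title"]
                print(f"    [{title}] {ans['answer']}")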

Code

Example code for prompting gpt-3.5-turbo and gpt-4 is in the scripts folder.
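For orientation, here is a minimal, hypothetical sketch of how such prompting might look with the openai Python client (v1+). The prompt wording, the summarize_event helper, and the lack of truncation handling are illustrative assumptions; the prompts and logic actually used in the paper live in the scripts folder.

# A minimal sketch of prompting an OpenAI chat model to summarize diverse
# information across the articles of one DiverseSumm event.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_event(event, model="gpt-4"):
    # Concatenate the articles of one event into a single context.
    # Note: 10 full articles may exceed the model's context window and need truncation.
    articles_text = "\n\n".join(
        f"Article {i + 1} ({a['domain']}): {a['title']}\n{a['content']}"
        for i, a in enumerate(event["articles"])
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user",
             "content": "Summarize the diverse information in the following news "
                        "articles about the same event:\n\n" + articles_text},
        ],
    )
    return response.choices[0].message.content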

Citation

If you find this work useful, please consider citing:

@inproceedings{huang-etal-2024-embrace,
    title = "Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles",
    author = "Huang, Kung-Hsiang  and
      Laban, Philippe  and
      Fabbri, Alexander  and
      Choubey, Prafulla Kumar  and
      Joty, Shafiq  and
      Xiong, Caiming  and
      Wu, Chien-Sheng",
    editor = "Duh, Kevin  and
      Gomez, Helena  and
      Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.32",
    pages = "570--593",
}