This project includes the source code for the paper SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization, to appear at ACL 2020.
Highlighted Features
- Unsupervised evaluation metrics: Measure the quality of multi-document summaries without using human-written reference summaries
- Unsupervised multi-document summarizer: Uses the unsupervised evaluation metrics as rewards to guide a neural reinforcement-learning-based summarizer to generate summaries. A genetic-algorithm-based summarizer, which uses the unsupervised metrics as its fitness function, is also provided.
Contact person: Yang Gao, yang.gao@rhul.ac.uk
https://sites.google.com/site/yanggaoalex/home
Don't hesitate to send us an e-mail or report an issue if something is broken or if you have further questions.
Given the source documents and some to-be-evaluated summaries, you can compute the unsupervised metric scores for the summaries with a few lines of code:
from ref_free_metrics.supert import Supert
from utils.data_reader import CorpusReader
# read docs and summaries
reader = CorpusReader('data/topic_1')
source_docs = reader()
summaries = reader.readSummaries()
# compute the Supert scores
supert = Supert(source_docs)
scores = supert(summaries)
In the example above, SUPERT extracts the top 15 sentences from each source document to build pseudo-reference summaries, and rates the to-be-evaluated summaries by measuring their semantic similarity to the pseudo references.
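For intuition, the pseudo-reference idea can be approximated in a few lines with a generic sentence encoder. The sketch below is an illustration only, not the SUPERT implementation (which uses contextualized embeddings and a more sophisticated alignment-based similarity); the model name and helper function are assumptions and not part of this repository:
# illustrative sketch only -- not the SUPERT scoring code
# idea: take the leading sentences of each source document as a pseudo
# reference, embed the pseudo references and the summary with a sentence
# encoder, and use cosine similarity as the score
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('bert-large-nli-stsb-mean-tokens')

def pseudo_ref_score(source_docs, summary, top_n=15):
    # here each document is assumed to be a list of sentences
    pseudo_refs = [' '.join(doc[:top_n]) for doc in source_docs]
    ref_emb = model.encode(pseudo_refs)   # shape: (num_docs, dim)
    summ_emb = model.encode([summary])    # shape: (1, dim)
    return float(cosine_similarity(summ_emb, ref_emb).mean())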
You can also use the unsupervised metrics as rewards to train an RL-based summarizer to generate summaries:
# read source documents
reader = CorpusReader('data/topic_1')
source_docs = reader()
# generate summaries using reinforcement learning, with supert as the reward function
# (RLSummarizer is the RL-based summarizer provided in this repository)
supert = Supert(source_docs)
rl_summarizer = RLSummarizer(reward_func=supert)
summary = rl_summarizer.summarize(source_docs, summ_max_len=100)
# print out the generated summary
print(summary)
You can also use the unsupervised metrics as the fitness function to guide a genetic algorithm to search for the optimal summary. See the example provided in generate_summary_ga.py.
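For intuition, the snippet below sketches how such a search could look: each candidate is a set of sentence indices, the unsupervised metric is the fitness function, and the fittest candidates survive and mutate. This is a simplified illustration, not the code in generate_summary_ga.py; all names and parameters here are made up for the example.
# simplified illustration of a GA guided by an unsupervised metric;
# see generate_summary_ga.py for the actual implementation
import random

def ga_summarize(sentences, score_fn, summ_size=5, pop_size=20, generations=50):
    # fitness of a candidate = unsupervised metric score of its summary text
    fitness = lambda cand: score_fn(' '.join(sentences[i] for i in cand))
    # each candidate is a list of sentence indices
    population = [random.sample(range(len(sentences)), summ_size)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]
        children = []
        for parent in survivors:
            # mutation: swap one sentence of the parent for a random one
            child = list(parent)
            child[random.randrange(summ_size)] = random.randrange(len(sentences))
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return ' '.join(sentences[i] for i in best)
With the Supert object from the earlier examples, the fitness function could be, for instance, lambda s: supert([s])[0].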
If human-written reference summaries are available (assumed here to be in data/topic_1/references), you can also evaluate the quality of the generated summary against them using ROUGE:
refs = reader.readReferences()
for ref in refs:
    rouge_scores = evaluate_summary_rouge(summary, ref)
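The loop above produces one set of ROUGE scores per reference. Assuming evaluate_summary_rouge returns a dictionary mapping metric names to values (an assumption; check the helper's return type in this repository), you could aggregate the scores over all references like this:
# sketch: average ROUGE scores over all references
# assumes evaluate_summary_rouge returns a dict of {metric_name: score}
from collections import defaultdict

totals = defaultdict(float)
refs = reader.readReferences()
for ref in refs:
    for metric, value in evaluate_summary_rouge(summary, ref).items():
        totals[metric] += value
for metric, total in totals.items():
    print(metric, total / len(refs))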
- Prerequisite: Python 3.6 or higher
- Install all packages in requirements.txt:
pip3 install -r requirements.txt
- (Optional, needed only if you want to run ROUGE) Download ROUGE-RELEASE-1.5.5.zip from the link, unzip it, and place the extracted folder under the rouge directory:
mv ROUGE-RELEASE-1.5.5 rouge/
- Branch compare_metrics provides the code for reproducing the results in Tables 1 - 4.
- Branch tac_summarisation provides the code for reproducing the results in Table 5.
Apache License Version 2.0