This repository contains the data for the paper "When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization".
All the data is uploaded to a Google Drive folder.
`sample_data.pk` contains the perturbed first paragraphs of the Wikipedia biographies that were used for the experiments in the paper.
`all_summaries.pk` contains the generated summaries for the Wikipedia biographies from all the models that we experimented with.
`data_for_plot.pk` contains the data needed to generate the plots in our paper.
- To load the data, please use the `load_data.ipynb` notebook. It walks through how to load the data, how to compute hallucination rates, and how to create the heatmap in the paper.
- If you'd like to generate the plots, please follow the `plot.ipynb` notebook.
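
For a quick look at the files outside the notebooks, the snippet below is a minimal sketch that assumes the `.pk` files are standard Python pickles and that they were downloaded into the current directory. The helper names `load_pickle` and `describe` are hypothetical and not part of this repository, and the snippet makes no assumption about the structure of the stored objects. See `load_data.ipynb` for the actual loading and analysis steps.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load and return the single object stored in a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(obj):
    """Return a short summary that does not depend on the object's layout."""
    if isinstance(obj, dict):
        return f"dict with {len(obj)} keys"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__} with {len(obj)} items"
    return type(obj).__name__


if __name__ == "__main__":
    # Assumption: the files from the Google Drive folder are in this directory.
    data_dir = Path(".")
    for name in ("sample_data.pk", "all_summaries.pk", "data_for_plot.pk"):
        path = data_dir / name
        if path.exists():
            print(name, "->", describe(load_pickle(path)))
        else:
            print(name, "not found in", data_dir.resolve())
```

Unpickling can execute arbitrary code, so only load files obtained from the folder linked by this repository. If a file stores objects from a third-party library, that library may need to be installed before `pickle.load` succeeds.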