When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization

This repository contains the data for the paper When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization.

Download the data

All the data is uploaded to a Google Drive folder.

  • sample_data.pk contains the perturbed first paragraphs of the Wikipedia biographies that were used for the experiments in the paper.
  • all_summaries.pk contains the summaries of the Wikipedia biographies generated by all the models that we experimented with.
  • data_for_plot.pk contains the data needed to generate the plots in our paper.

Load the data

  • To load the data, please use the load_data.ipynb notebook. It walks through how to load the data as well as how to compute hallucination rates and create the heatmap in the paper.
  • If you'd like to generate the plots, please follow the plot.ipynb notebook.
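
For a quick look at the files outside the notebooks, the sketch below may help. It assumes the .pk files are standard Python pickles and that they were downloaded into a local directory named data; both are assumptions, not something this README states. The helper names load_pickle and describe are hypothetical and are not part of this repository. If the files contain library-specific objects (for example pandas DataFrames), that library has to be installed for unpickling to succeed. The load_data.ipynb notebook remains the authoritative way to load the data.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load one pickle file and return the object it contains.

    Only unpickle files from a source you trust: unpickling can run
    arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(obj):
    """Return a short summary of a loaded object without assuming its structure."""
    summary = type(obj).__name__
    if hasattr(obj, "__len__"):
        summary += f" with {len(obj)} items"
    return summary


if __name__ == "__main__":
    # Assumption: the Google Drive files were saved into ./data
    data_dir = Path("data")
    for name in ("sample_data.pk", "all_summaries.pk", "data_for_plot.pk"):
        path = data_dir / name
        if path.exists():
            print(name, "->", describe(load_pickle(path)))
        else:
            print(name, "not found in", data_dir)
```

The script only reports each object's type and size, since the internal layout of the files is documented in the notebooks rather than here.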