fladhak/pretraining_biases


When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization

This repository contains the data for the paper When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization.

Download the data

All the data has been uploaded to a Google Drive folder.

sample_data.pk contains the perturbed first paragraphs of the Wikipedia biographies used in the experiments in the paper.
all_summaries.pk contains the summaries of these Wikipedia biographies generated by all the models we experimented with.
data_for_plot.pk contains the data needed to generate the plots in our paper.

Load the data

To load the data, please use the load_data.ipynb notebook. It walks through how to load the data as well as how to compute hallucination rates and create the heatmap in the paper.
If you'd like to generate the plots, please follow the plot.ipynb notebook.
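The notebooks are the reference way to load the data. For a quick look outside Jupyter, the minimal sketch below unpickles the three files with Python's standard pickle module. It assumes the files are standard Python pickles downloaded into one local folder; the folder path and the helper names load_pickle and load_all are hypothetical, and the internal structure of each object is not described here. If the pickles contain pandas objects (not stated in this README), pandas must be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path

# File names as listed in this README; the folder they live in is an
# assumption (wherever you saved the Google Drive download).
FILES = ["sample_data.pk", "all_summaries.pk", "data_for_plot.pk"]


def load_pickle(path):
    """Unpickle one file and return the stored object.

    Only unpickle files from a source you trust: pickle can execute
    arbitrary code while loading.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def load_all(data_dir):
    """Hypothetical helper: load every .pk file into a dict keyed by file stem."""
    data_dir = Path(data_dir)
    return {Path(name).stem: load_pickle(data_dir / name) for name in FILES}
```

For example, `data = load_all("path/to/downloaded/folder")` returns a dict with the keys `sample_data`, `all_summaries` and `data_for_plot`; `type(data["all_summaries"])` is a reasonable first step to see what each object holds before following load_data.ipynb.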