Code accompanying the paper "Improving Placebo Selection in Survey Experiments" by
Charles Crabtree and William W. Marx.
How should researchers create placebo conditions in survey experiments? Porter and Velez (2021) recommend that researchers use automated processes to construct a large corpus of placebo conditions, which are then randomly assigned to participants in the placebo group during survey implementation. Based on our empirical work, we suggest that researchers use caution if they employ the tool recommended for placebo construction, OpenAI’s semi-supervised language model GPT-2. We conduct the most extensive assessment of GPT-2’s biases by measuring the sentiment of 1,083,750 potential placebos generated across 4,335 unique seed phrases. We show that the polarities of placebos vary tremendously across seed phrases, depending on the race/ethnicity, gender, sexual and religious orientation, political affiliation, political ideology, or state or territory demonym mentioned in them. We also show considerable heterogeneity in the substance of placebos across seed phrases. Comparing our results to a similar analysis of the text corpus used to train GPT-2, we find that the language model not only learns to reproduce biases in source material but also magnifies them. To deal with these issues, we provide a tool to mitigate the effects of these biases and provide best practice recommendations for automatic placebo generation.
To install all necessary requirements, run `bash install.sh`. We recommend using Docker, but native
installs for macOS and GNU/Linux are also available through the interactive install script.
If not running via Docker, the following items will be installed if not already extant:
To reproduce our results, including a CSV file with all generated placebos, a CSV file with all placebo
sentiments, a CSV analysis file with basic statistics, and all graphs, run `make reproduce`.
To reproduce all results as described above with the same seed phrases we used, run `make all`.
To generate results with a different set of seed phrases, run
`python3 generate.py [SEED PHRASE FILE]`. You will then have to modify `compile.py`,
`analyze.py`, and `visualize.py`, as they rely on the `data/group_identifiers.json`
file, which is tailored to the seed phrases we used.
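A custom seed phrase file could be assembled along the following lines. This is a minimal sketch and not part of the repository: the templates, the demonyms, the output file name `my_seed_phrases.txt`, and the one-phrase-per-line layout are all assumptions made for illustration. Check `generate.py` for the file format it actually expects before using anything like this.

```python
from itertools import product
from pathlib import Path

# Hypothetical templates and identifiers, for illustration only.
TEMPLATES = ["The {group} voter said", "My {group} neighbor thinks"]
GROUPS = ["Texan", "Ohioan", "Hawaiian"]


def build_seed_phrases(templates, groups):
    """Return every template filled in with every group identifier."""
    return [t.format(group=g) for t, g in product(templates, groups)]


def write_seed_file(phrases, path):
    """Write one phrase per line (assumed format; check generate.py)."""
    Path(path).write_text("\n".join(phrases) + "\n", encoding="utf-8")


if __name__ == "__main__":
    phrases = build_seed_phrases(TEMPLATES, GROUPS)
    write_seed_file(phrases, "my_seed_phrases.txt")
    print(f"Wrote {len(phrases)} seed phrases")
```

If the assumed format matches, the resulting file would be passed as `python3 generate.py my_seed_phrases.txt`. Any identifiers introduced this way would presumably also need matching entries in `data/group_identifiers.json` before the downstream scripts can group results by them.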
To reproduce our OpenWebText sentiment analysis findings, run `make openwebtext`.
See ARCHITECTURE.md
See CONTRIBUTING.md