This is the code for the paper Are All Spurious Features in Natural Language Alike? An Analysis through a Causal Lens, accepted at EMNLP 2022. The code was largely adapted from three sources:
- For the real NLP experiments - https://github.com/technion-cs-nlp/bias-probing
- For the INLP experiments - https://github.com/shauli-ravfogel/nullspace_projection
- For the toy synthetic experiments - https://github.com/cjlovering/predicting-inductive-biases
Some of the instructions below are shared with those of the original codebases.
Create a new conda environment and install libraries:
pip3 install -r requirements.txt
Most of the NLP datasets are already prepared and are available at nlp/data. A few that were too large to upload (the MNLI training data and the MNLI synthetic training data) are uploaded here. Overall, these datasets include the MNLI training set, the MNLI synthetic training sets, and the training and evaluation sets corresponding to the different groups (e.g., high word overlap).
To train models, either with a simple cross-entropy loss on the original/balanced dataset or with POE/DFL, navigate to nlp/scripts
and run:
python3 training.py
You can modify the training dataset, method, hyperparameters, etc. in training.yaml.
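As a rough illustration, a config like training.yaml can be read with PyYAML and its fields used to drive the training run. The keys below (dataset, method, learning_rate, ...) are hypothetical examples, not the repo's actual schema; check training.yaml itself for the real options.

```python
# Sketch of reading a YAML training config; the keys shown here are
# illustrative placeholders, not the repo's actual training.yaml schema.
import yaml

EXAMPLE_CONFIG = """
dataset: mnli          # hypothetical key: which training set to use
method: poe            # hypothetical key: cross_entropy | poe | dfl
learning_rate: 2.0e-5
batch_size: 32
num_epochs: 3
"""

config = yaml.safe_load(EXAMPLE_CONFIG)
# config is a plain dict, e.g. config["method"] == "poe",
# so individual hyperparameters can be overridden before training.
config["num_epochs"] = 5
```

Note that PyYAML only parses scientific notation as a float when it contains a decimal point and an explicit exponent sign (as in `2.0e-5` above); `2e-5` would load as a string.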
To run the probing experiments, check out the different configs in nlp/configs, then run:
python3 probing.py --seed=42 --name="baseline" --task_config_file="mnli_lex_class.json" --model_name_or_path="seed:42/baseline" --overwrite_cache
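To probe across several seeds, the command above can be scripted. The helper below is a hypothetical convenience wrapper (not part of the repo) that simply assembles the same flags shown in the command for each seed:

```python
# Hypothetical helper (not in the repo) that builds the probing.py
# command line shown above for a given seed.
import subprocess

def probing_command(seed, name="baseline", task_config="mnli_lex_class.json"):
    """Return the argv list for one probing run with the given seed."""
    return [
        "python3", "probing.py",
        f"--seed={seed}",
        f"--name={name}",
        f"--task_config_file={task_config}",
        f"--model_name_or_path=seed:{seed}/{name}",
        "--overwrite_cache",
    ]

for seed in (42, 43, 44):
    cmd = probing_command(seed)
    # subprocess.run(cmd, check=True)  # uncomment to actually launch each run
```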
To obtain an invariant representation using the INLP method, you can run:
python3 debias.py
The file includes the path to the saved model representations to be debiased; you will need to modify it to point to wherever you store those representations.
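For intuition, the core of INLP (Ravfogel et al., 2020) iteratively fits a linear predictor for the protected attribute and projects the representations onto its nullspace. The NumPy sketch below is a generic illustration of that idea, not the repo's debias.py implementation; it uses a least-squares linear predictor for simplicity rather than a trained classifier.

```python
# Generic sketch of iterative nullspace projection (INLP), assuming
# labels y in {-1, +1}; this is NOT the repo's debias.py code.
import numpy as np

def inlp_projection(X, y, n_iters=5):
    """Iteratively remove the linearly decodable signal for y from X.

    Each round fits a least-squares linear direction predicting y and
    projects X onto the nullspace of that direction.
    Returns the accumulated projection matrix and the projected data.
    """
    d = X.shape[1]
    P = np.eye(d)
    Xp = X.copy()
    for _ in range(n_iters):
        # best least-squares direction predicting y from the current data
        w, *_ = np.linalg.lstsq(Xp, y, rcond=None)
        norm = np.linalg.norm(w)
        if norm < 1e-8:          # nothing left to remove
            break
        w = w / norm
        # orthogonal projection onto the nullspace of w
        P_w = np.eye(d) - np.outer(w, w)
        Xp = Xp @ P_w
        P = P_w @ P
    return P, Xp
```

After enough iterations, a fresh linear predictor trained on the projected representations should perform near chance on the attribute being removed.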
To be updated soon!
@inproceedings{joshi2022spurious,
  author={Nitish Joshi and Xiang Pan and He He},
  title={Are All Spurious Features in Natural Language Alike? An Analysis through a Causal Lens},
  booktitle={EMNLP},
  year={2022}
}
Nitish Joshi, Xiang Pan and He He