If you use this code in your research, please cite the following publication: https://arxiv.org/abs/2108.12510
```bibtex
@article{gowda2021pulling,
  title={Pulling Up by the Causal Bootstraps: Causal Data Augmentation for Pre-training Debiasing},
  author={Sindhu C.M. Gowda and Shalmali Joshi and Haoran Zhang and Marzyeh Ghassemi},
  journal={arXiv preprint arXiv:2108.12510},
  year={2021}
}
```
Run the following commands to clone this repo and create the Conda environment:
```bash
git clone git@github.com:MLforHealth/CausalDA.git
cd CausalDA/
conda env create -f environment.yml
conda activate causalda
```
See DataSources.md for detailed instructions on setting up the WILDS and CXR datasets. This step is not necessary for the synthetic experiments.
To train a single model, run, e.g.:

```bash
python train_synthetic.py \
    --type par_back_front \
    --corr-coff 0.75 \
    --test-corr 0.75 \
    --output_dir /path/to/output
```
or

```bash
python train.py \
    --type back \
    --data camelyon \
    --data_type Conf \
    --domains 2 3 \
    --corr-coff 0.95 \
    --seed 0 \
    --output_dir /path/to/output
```
To reproduce the experiments in the paper by training grids of models, call sweep.py, using the class names defined in experiments.py as experiment names, e.g.:

```bash
python sweep.py launch \
    --experiment CXR \
    --output_dir /my/sweep/output/path \
    --command_launcher "local"
```
This command can also be run easily using launch_scripts/launch_exp.sh. You will likely need to update the launcher to fit your compute environment.
We provide sample code for creating aggregate results for an experiment in AggResults.ipynb.
We make use of code from the WILDS benchmark as well as from the DomainBed framework.
This source code is released under the MIT license, included here.