Experiment Repository for "Characterization and Greedy Learning of Gaussian Structural Causal Models under Unknown Interventions"
This repository contains the code to reproduce the experiments and figures for the paper "Characterization and Greedy Learning of Gaussian Structural Causal Models under Unknown Interventions", by JL. Gamella, A. Taeb, C. Heinze-Deml and P. Bühlmann. This README is not intended to be completely self-explanatory, and should be read alongside the manuscript.
If you find this code useful, please consider citing:
@article{gamella2022characterization,
title={Characterization and Greedy Learning of Gaussian Structural Causal Models under Unknown Interventions},
author={Gamella, Juan L. and Taeb, Armeen and Heinze-Deml, Christina and B\"uhlmann, Peter},
year={2022}
}
This repository contains only the code to reproduce the results from the paper. If you're interested in using the GnIES algorithm described in the paper for your own work, it is available as a separate and well-documented Python package called gnies. You can find more information in its own repository at github.com/juangamella/gnies.
We use the following Python packages for the other algorithms:
- ges for the Python implementation of the GES algorithm
- gies for the Python implementation of the GIES algorithm
- causaldag for UT-IGSP (see an example and our wrapper including KCI tests)
Additionally, we use the Python package sempler to generate synthetic and semi-synthetic data for our experiments.
We ran our experiments using python=3.9.9, but they should work on any version above 3.8. The Python dependencies are listed in requirements.txt. For your convenience, a makefile is included to create a Python virtual environment and install the necessary dependencies. To do this, simply run
make venv
and then
source venv/bin/activate
to activate the virtual environment. Of course, you will need to be on a "make-capable" system (e.g. Linux) where you can invoke the Python venv module. To run the notebooks from the virtual environment, create a new local kernel (while the environment is active):
ipython kernel install --user --name=.venv
and once inside the notebook select the kernel: Kernel -> Change kernel -> .venv.
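Before creating the environment, it can help to confirm that the interpreter meets the version requirement stated above. The following is a minimal sketch, not part of the repository; the helper name version_ok is hypothetical, and treating 3.8 itself as acceptable is an assumption.

```python
import sys

# Lower bound stated in this README; the experiments were run on 3.9.9.
# Treating 3.8 itself as acceptable is an assumption.
MIN_VERSION = (3, 8)


def version_ok(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if version_ok() else "too old"
    print(f"Python {current}: {status}")
```

Run it with the same interpreter you intend to use for the virtual environment.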
Below are the exact instructions to reproduce all the experiments and figures used in the paper. Please note that, without access to an HPC cluster, completion of the experiments may take days or weeks. We ran our experiments on the Euler cluster of ETH Zürich; see the files run_synthetic_experiments_cluster.sh and run_sachs_experiments_cluster.sh for details (e.g. number of cores, expected completion time).
We include all the datasets required to reproduce the experiments; the code to re-generate them can also be found in the files run_synthetic_experiments.sh and run_sachs_experiments.sh. Although not necessary, if you wish to re-generate the semi-synthetic datasets you will need some additional R dependencies (see sempler's documentation).
- Download and unpack the synthetic datasets
./download_synthetic_datasets.sh
- Make sure the Python environment is active (see above) and run the methods using the corresponding script. By default this will use a total of 4 threads (cores) to run the experiments; the number of threads can be set by editing the script and setting the variable N_THREADS to the desired value.
./run_synthetic_experiments.sh
- The results are stored in the synthetic_experiments/ directory, under the sub-directory corresponding to each dataset.
- To generate the figures, use the notebooks figure_model_match.ipynb and figure_model_mismatch.ipynb. The resulting figures are stored in the figures/ directory.
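When choosing a value for N_THREADS in the steps above, a quick check of the cores available on your machine can be useful. The sketch below is only an illustration and is not part of the repository; the helper suggest_n_threads is hypothetical, and capping the value at the number of cores is an assumption rather than a requirement of the scripts.

```python
import os

# Default number of threads mentioned in this README.
DEFAULT_THREADS = 4


def suggest_n_threads(requested=DEFAULT_THREADS, available=None):
    """Clamp the requested thread count to the range [1, available cores]."""
    if available is None:
        # os.cpu_count() may return None on some platforms.
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


if __name__ == "__main__":
    print("Suggested N_THREADS:", suggest_n_threads())
```

The printed value can then be written into the N_THREADS variable of the script by hand.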
The procedure is similar to the synthetic experiments:
- Download and unpack the Sachs dataset and the hybrid data
./download_sachs_datasets.sh
- Make sure the Python environment is active (see above) and run the methods using the corresponding script. By default this will use a total of 4 threads (cores) to run the experiments; the number of threads can be set by editing the script and setting the variable N_THREADS to the desired value.
./run_sachs_experiments.sh
- The results are stored in the sach_experiments/ directory, under the sub-directory corresponding to each dataset.
- To generate the figures, use the notebook figures_sach_experiments.ipynb. The resulting figures are stored in the figures/ directory.
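Since results are written to one sub-directory per dataset, a short listing can confirm which datasets have been processed before opening the notebooks. This is a minimal sketch under the assumption that each dataset maps to an immediate sub-directory of the results directory; the helper list_dataset_dirs is hypothetical and not part of the repository.

```python
import sys
from pathlib import Path


def list_dataset_dirs(results_dir):
    """Return the sorted names of the immediate sub-directories of results_dir."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())


if __name__ == "__main__":
    # Pass the results directory as the first argument,
    # e.g. one of the *_experiments directories named in this README.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in list_dataset_dirs(target):
        print(name)
```

An empty listing means either that the directory does not exist yet or that no sub-directory has been created in it.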
You will find the following files/directories:
- src/: contains the Python code to run the experiments. Each method is executed from its own Python script:
  - src/run_gnies.py for GnIES
  - src/run_utigsp.py for UT-IGSP
  - src/run_ges.py for GES
  - src/run_gies.py for GIES
  - src/run_sortnregress.py for sortnregress
- *_experiments directories hold the datasets and the results from executing the experiments.
- figure*.ipynb are the Jupyter notebooks used to generate the figures used in the paper. Figures are stored in the figures/ directory.
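To verify that a checkout contains the scripts listed above, a small check along these lines may help. The script paths are taken from this README; the checker itself (missing_scripts) is a hypothetical helper and not part of the repository.

```python
from pathlib import Path

# Script paths as listed in this README.
EXPECTED_SCRIPTS = [
    "src/run_gnies.py",
    "src/run_utigsp.py",
    "src/run_ges.py",
    "src/run_gies.py",
    "src/run_sortnregress.py",
]


def missing_scripts(repo_root):
    """Return the expected script paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_SCRIPTS if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the root of the repository.
    missing = missing_scripts(".")
    if missing:
        print("Missing scripts:", ", ".join(missing))
    else:
        print("All experiment scripts found.")
```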
If you need assistance or have feedback, you are more than welcome to write me an email :)