
Dataset repository for the 2024 paper "The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology" by Juan L. Gamella, Jonas Peters and Peter Bühlmann.


The Causal Chambers: Dataset Repository

The Causal Chambers: (left) the wind tunnel, and (right) the light tunnel with the front panel removed to show its interior.

This repository contains datasets collected from the causal chambers, the two devices described in the 2024 paper The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology by Juan L. Gamella, Jonas Peters and Peter Bühlmann. The repository is updated as we collect new datasets from the chambers.

The datasets are publicly available under a permissive CC BY 4.0 license. This means you are free to use, share and modify the datasets as long as you give appropriate credit and indicate any changes. If you use the datasets in your scientific work, please consider citing:

@article{gamella2024chamber,
  title={The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology},
  author={Gamella, Juan L. and Peters, Jonas and B\"uhlmann, Peter},
  journal={arXiv preprint arXiv:2404.11341},
  year={2024}
}

This repository also contains the source code for the causalchamber package, which lets you import the datasets directly into your Python code. The package also provides Python implementations of the mechanistic models described in appendix IV of the original paper.

Here you can also find the resources to build the chambers, and the datasheets for all chamber components (see hardware/).

The code to reproduce the case studies in the original paper can be found in the separate paper repository.

Available datasets

We are open to suggestions of additional experiments that may prove interesting; please reach out to the corresponding author.

Each dataset below is described in detail on its own page (click the dataset name). The chamber configurations are described in Fig. 3 of the manuscript.

| Dataset name | Notes | Chamber | Config. |
|---|---|---|---|
| lt_camera_walks_v1 | Image data for the ICA case study (task d3, Fig. 6). | Light tunnel | camera |
| lt_color_regression_v1 | Image data for task b2 in the OOD case study (Fig. 5). | Light tunnel | camera |
| lt_interventions_standard_v1 | Observational and interventional data from the light tunnel, used for the causal discovery case study in Fig. 5. | Light tunnel | standard |
| lt_walks_v1 | Random and deterministic walks of the light-tunnel actuators. Used in the ICA case study (task d1, Fig. 6). | Light tunnel | standard |
| wt_walks_v1 | Random and deterministic walks of the wind-tunnel actuators. Used in the causal discovery (task a3) and ICA (task d2) case studies. | Wind tunnel | standard |
| lt_malus_v1 | Measurements of light intensity displaying Malus' law, used in the symbolic regression task in Fig. 6e. | Light tunnel | standard |
| wt_bernoulli_v1 | Measurements of air pressure displaying Bernoulli's principle, used in the symbolic regression task in Fig. 6e. | Wind tunnel | standard |
| wt_changepoints_v1 | Used for the change point detection case study in Fig. 5. | Wind tunnel | standard |
| wt_intake_impulse_v1 | Barometric pressure curves used in task c2, Fig. 5. | Wind tunnel | standard |
| wt_pressure_control_v1 | Data from the pressure-control configuration of the wind tunnel. | Wind tunnel | pressure-control |
| lt_test_v1 | Experiments to characterize some of the physical effects of the light tunnel. Shown in figures 7-15 of the manuscript. | Light tunnel | standard |
| wt_test_v1 | Experiments to characterize some of the physical effects of the wind tunnel. Shown in figures 7-15 of the manuscript. | Wind tunnel | standard |
| lt_camera_test_v1 | Experiments to characterize some of the physical effects of the camera system in the light tunnel. | Light tunnel | camera |
| wt_validate_v1 | Randomized control experiments to validate the causal ground-truth graph of the wind tunnel in its standard configuration (appendix V of the manuscript). | Wind tunnel | standard |
| wt_pc_validate_v1 | Randomized control experiments to validate the causal ground-truth graph of the wind tunnel in its pressure-control configuration (appendix V of the manuscript). | Wind tunnel | pressure-control |
| lt_validate_v1 | Randomized control experiments to validate the causal ground-truth graph of the light tunnel in its standard configuration (appendix V of the manuscript). | Light tunnel | standard |
| lt_camera_validate_v1 | Randomized control experiments to validate the causal ground-truth graph of the light tunnel in its camera configuration (appendix V of the manuscript). | Light tunnel | camera |
| lt_camera_v1 | Image datasets where the light-tunnel actuators are sampled from different distributions and structural causal models. | Light tunnel | camera |

Downloading the datasets

For each dataset, you can simply download a .zip file with all the data, including the images at different resolutions. The link and checksum (to verify integrity) are available on the page of each dataset (click on the dataset name in the table above).
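
As a minimal sketch of the integrity check, the snippet below hashes a downloaded file in chunks and compares the result with a published value. The helper names `file_checksum` and `verify_download`, the example file name, and the MD5 default are assumptions made for illustration; use whichever algorithm and checksum the dataset page actually states.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks.

    The default algorithm is an assumption; pass the one named on the
    dataset page (any name accepted by hashlib.new).
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large .zip files never sit fully in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected, algorithm="md5"):
    """Return True if the file's digest matches the published checksum."""
    return file_checksum(path, algorithm) == expected.strip().lower()


# Hypothetical usage (file name and checksum are placeholders):
# ok = verify_download("lt_malus_v1.zip", "<checksum from the dataset page>")
```

If the check returns False, the download is likely incomplete or corrupted and should be repeated.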

If you use Python, you can directly import a dataset into your code through the causalchamber package. You can install it using pip, e.g. by typing

pip install causalchamber

in an appropriate shell. Datasets can then be accessed directly from your Python code. For example, you can access the light-intensity data for the symbolic regression case study (Fig. 6e) as follows:

from causalchamber.datasets import Dataset

# Download the dataset and store it, e.g., in the current directory
dataset = Dataset(name='lt_malus_v1', root='./', download=True)

# Select an experiment and load its observations
experiment = dataset.get_experiment(name='white_255')
df = experiment.as_pandas_dataframe()

For the available experiment names, see the page for each dataset (click on the dataset name in the table above) or run

dataset.available_experiments()

# Output:
# ['blue_128',
#  'blue_255',
#  'blue_64',
#  'green_128',
#  'green_255',
#  'green_64',
#  'red_128',
#  'red_255',
#  'red_64',
#  'white_128',
#  'white_255',
#  'white_64']
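
The lt_malus_v1 data loaded above is described as displaying Malus' law, which states that the intensity transmitted through two linear polarizers scales with the squared cosine of the angle between them. The sketch below fits that relationship by least squares. It runs on synthetic data rather than on the dataset itself, because the dataframe's column names are not listed here; the function names, the additive offset term, and all numeric values are assumptions for illustration.

```python
import numpy as np


def malus_intensity(theta_1, theta_2, i_0):
    """Malus' law for two linear polarizers; angles are given in degrees."""
    delta = np.deg2rad(np.asarray(theta_1, dtype=float) - np.asarray(theta_2, dtype=float))
    return i_0 * np.cos(delta) ** 2


def fit_malus(theta_1, theta_2, intensity):
    """Least-squares fit of intensity ~ i_0 * cos^2(delta) + offset.

    The offset term (e.g. a sensor baseline) is an assumption of this sketch.
    """
    delta = np.deg2rad(np.asarray(theta_1, dtype=float) - np.asarray(theta_2, dtype=float))
    features = np.column_stack([np.cos(delta) ** 2, np.ones_like(delta)])
    (i_0, offset), *_ = np.linalg.lstsq(features, np.asarray(intensity, dtype=float), rcond=None)
    return i_0, offset


# Synthetic stand-in for real measurements (values are made up)
rng = np.random.default_rng(0)
theta_1 = np.linspace(-90, 90, 181)
theta_2 = np.zeros_like(theta_1)
intensity = malus_intensity(theta_1, theta_2, i_0=1000.0) + 50.0 + rng.normal(0, 5, theta_1.size)

i_0_hat, offset_hat = fit_malus(theta_1, theta_2, intensity)
```

To try this on real data, replace the synthetic arrays with the polarizer-angle and light-intensity columns of `df`, using the column names given on the dataset page.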

Mechanistic models

The causalchamber package also contains Python implementations of the mechanistic models described in appendix IV of the original paper. The models follow the same nomenclature as in the paper, e.g., to import and run model A1 of the steady-state fan speed:

import numpy as np
from causalchamber.models import model_a1
model_a1(L=np.linspace(0,1,10), L_min=0.1, omega_max=314.15)

# Output:

# array([ 31.415     ,  34.90555556,  69.81111111, 104.71666667,
#        139.62222222, 174.52777778, 209.43333333, 244.33888889,
#        279.24444444, 314.15      ])

The implementations can be found in the src/causalchamber/models directory. You can find examples of using the models in the case_studies/mechanistic_models.ipynb notebook in the separate paper repository.
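
The output printed above is consistent with a simple relationship in which the speed is proportional to the load L, with the load floored at L_min. The function below is a hypothetical stand-in that reproduces those printed numbers. It is not the package's `model_a1` and is not taken from the paper, so treat it only as a way to read the example.

```python
import numpy as np


def clipped_linear_speed(L, L_min, omega_max):
    """Hypothetical stand-in: speed proportional to the load, floored at L_min."""
    load = np.maximum(np.asarray(L, dtype=float), L_min)
    return omega_max * load


# Same arguments as the model_a1 call above
speeds = clipped_linear_speed(np.linspace(0, 1, 10), L_min=0.1, omega_max=314.15)
```

Under this reading, the first entry of the output (31.415) comes from the floor at L_min=0.1, and the remaining entries scale linearly up to omega_max.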

Causal ground-truth graphs

The graphs for the causal ground truths given in Fig. 3 of the original paper can be found as adjacency matrices in the ground_truths/ directory. The adjacencies can also be loaded through the causalchamber package, e.g.,

from causalchamber.ground_truth import graph
graph(chamber="lt", configuration="standard")

# Output:

#              red  green  blue  osr_c  v_c  current  pol_1  pol_2  osr_angle_1  \
# red            0      0     0      0    0        1      0      0            0   
# green          0      0     0      0    0        1      0      0            0   
# blue           0      0     0      0    0        1      0      0            0   
# osr_c          0      0     0      0    0        1      0      0            0   
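
A labeled adjacency matrix like the one printed above can be turned into an edge list or queried for the parents of a variable. The sketch below assumes the matrix is a pandas DataFrame and that a 1 in row i, column j denotes an edge from i to j, which matches the printed rows (red, green, blue and osr_c each point to current). Both points are assumptions, and the helper names `edge_list` and `parents` are hypothetical. The example uses a small hand-written matrix rather than the output of the package.

```python
import pandas as pd


def edge_list(adjacency):
    """List (source, target) pairs for every nonzero entry, reading rows as sources."""
    return [
        (source, target)
        for source in adjacency.index
        for target in adjacency.columns
        if adjacency.loc[source, target] != 0
    ]


def parents(adjacency, variable):
    """Return the variables with an edge into `variable` (nonzero column entries)."""
    column = adjacency[variable]
    return list(column[column != 0].index)


# Toy adjacency matrix over three variables (hand-written, not from the package)
names = ["red", "green", "current"]
toy = pd.DataFrame(
    [[0, 0, 1],
     [0, 0, 1],
     [0, 0, 0]],
    index=names,
    columns=names,
)

edges = edge_list(toy)
current_parents = parents(toy, "current")
```

An edge list in this form can be handed to most graph libraries for plotting.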

To make it easier to plot graphs and relate them to the original paper, the LaTeX representation of each variable can be obtained by calling the latex_name function. For example, to obtain the LaTeX representation $\theta_1$ of the pol_1 variable, you can run

from causalchamber.ground_truth import latex_name
latex_name('pol_1', enclose=True)

# Output:

# '$\\theta_1$'

Setting enclose=False will return the name without the surrounding $ delimiters.

Building the chambers

You can find the resources to build the chambers in hardware/, together with the datasheets for all physical components (see appendix VI of the original paper).

Licenses

All images and .csv files in the datasets are licensed under a CC BY 4.0 license. A copy of the license can be found in LICENSE_DATASETS.txt.

The code, e.g., for the causalchamber package and mechanistic models, is shared under the MIT license. A copy of the license can be found in LICENSE_SOFTWARE.txt.