Welcome to the GitHub repository for the Causal Analysis Workshop at the Women in Data Science (WiDS) conference, May 2024, Puget Sound (https://www.widspugetsound.org/2024-schedule).
This workshop is designed to introduce the basic concepts and methods of causal discovery and causal inference.
Causal analysis is a powerful tool for understanding the mechanisms and effects of interventions in complex systems. While A/B experimentation is the gold standard for extracting causal insights, there are situations where experimenting isn’t possible – such as when the feature was already released or ethical restrictions prevent us from experimenting on certain populations.
In such cases, we must rely on observational data, which pose many challenges for causal analysis, such as confounding, selection bias, and unmeasured variables.
In this workshop, we will guide the audience through a hands-on, step-by-step causal analysis using common Python causal libraries, including DECI and EconML.
- An internet-connected device
- Basic knowledge of Python
- Use
git clone https://github.com/GangaMegha/CausalAnalysisTutorial.git
to clone this repository on to your system. - Navigate to the directory using
cd CausalAnalysisTutorial
. - Setup your environment and install the libraries ( Go to Environment-Setup )
- Launch Jupyter Notebook using
jupyter notebook
. - Click and open the notebook WiDsCausalAnalysisWorkshop.ipynb.
- You're all set to go!
You will need to install the necessary python packages to do causal analysis using DECI and DML in the hands-on workshop.
It is recommended that you create and work within a virtual environment to avoid package conflicts with other libraries you have installed on your machine.
[!IMPORTANT] Please use Python 3.10 for this tutorial.
Install the virtual environment package if it’s not already installed using pip install virtualenv
Create the environment:
Try:
python3.10 -m venv env_causal
If the above does not work for you, find the path to the Python interpreter of the version you want to use with where python
and create environment using
virtualenv -p path_to_python env_causal
Activate your environment:
- Windows:
env_causal\Scripts\activate
- Unix or MacOS:
source env_causal/bin/activate
Use if you already have anaconda installed on your machine.
conda create --name env_causal python=3.10 -y
Activate your environment:conda activate env_causal
# Run this command line from the repo root folder
# Then open http://localhost:8888/lab from your browser in the host
docker run --rm -p 8888:8888 -v .:/home/jovyan jupyter/scipy-notebook:python-3.10
You can install all the packages in a Python 3.10 environment using
pip install -r requirements.txt
Or install the dependencies from the notebook using
!pip install -r requirements.txt
The above will install all necessary packages for the workshop on your system with the correct versions. This will help avoid any version conflicts.
-OR-
If you already have a python environment created to run jupyter notebooks, you can also install the below packages individually:
- DECI [https://github.com/microsoft/causica] :
pip install causica
- DML [https://econml.azurewebsites.net/spec/estimation/dml.html]:
pip install econml
We will use the same toy dataset used in DECI and EconML (Multi-investment Attribution) for illustration. The dataset and related files can be found in the data
directory of this repository.