Project for Math 522 at BYU. We are interested in understanding causal inference in deep learning.
This field owes much of its foundation to the work of Judea Pearl, whose central idea is the causal hierarchy (also called the ladder of causation). The hierarchy has three levels:
- Association: seeing. (What if I see...?)
- Intervention: doing. (What if I do...? How?)
- Counterfactuals: imagining. (What if I had done...?)
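To make the three levels concrete, here is a small sketch on a toy SCM of our own (the variables, mechanisms, and noise probabilities are illustrative assumptions, not taken from any paper or dataset). A confounder Z drives both X and Y, and each level is answered by exact enumeration over the exogenous noise.

```python
from itertools import product

# Hypothetical toy SCM (illustration only): Z confounds X and Y.
P_UZ = 0.5  # P(U_Z = 1)
P_UX = 0.1  # P(U_X = 1)


def solve(u_z, u_x, do_x=None):
    """Return (z, x, y) for one setting of the exogenous noise.

    If do_x is given, X's mechanism is replaced by the constant do_x.
    """
    z = u_z
    x = (z ^ u_x) if do_x is None else do_x
    y = x & z
    return z, x, y


def worlds():
    """Yield every exogenous setting (u_z, u_x) together with its probability."""
    for u_z, u_x in product((0, 1), repeat=2):
        p = (P_UZ if u_z else 1 - P_UZ) * (P_UX if u_x else 1 - P_UX)
        yield u_z, u_x, p


def seeing(x_obs):
    """Association: P(Y = 1 | X = x_obs), by keeping worlds where X = x_obs."""
    num = den = 0.0
    for u_z, u_x, p in worlds():
        _, x, y = solve(u_z, u_x)
        if x == x_obs:
            den += p
            num += p * y
    return num / den


def doing(x_set):
    """Intervention: P(Y = 1 | do(X = x_set)), by rewriting X's mechanism."""
    return sum(p * solve(u_z, u_x, do_x=x_set)[2] for u_z, u_x, p in worlds())


def imagining(x_obs, y_obs, x_alt):
    """Counterfactual: P(Y_{X=x_alt} = 1 | X = x_obs, Y = y_obs).

    Abduction (keep worlds consistent with the evidence), action (set X),
    prediction (recompute Y in those same worlds).
    """
    num = den = 0.0
    for u_z, u_x, p in worlds():
        _, x, y = solve(u_z, u_x)
        if (x, y) == (x_obs, y_obs):
            den += p
            num += p * solve(u_z, u_x, do_x=x_alt)[2]
    return num / den


print(round(seeing(1), 3))           # 0.9
print(round(doing(1), 3))            # 0.5
print(round(imagining(0, 0, 1), 3))  # 0.1
```

Seeing and doing disagree (0.9 vs 0.5) because Z confounds X and Y. The counterfactual additionally reuses the noise values that are consistent with what was actually observed.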
This paper discusses how neural networks and structural causal models are connected. Currently, the only way to identify causal effects is through
One key finding of this paper is that, given only observational data, a neural network by itself cannot in general identify causal relationships or the effects of interventions. The paper proposes the neural causal model (NCM): a structural causal model (SCM) whose mechanisms are feed-forward neural networks, so that it can be trained with gradient descent. The idea is to approximate the mechanisms connecting the variables and then use the model to answer questions from the higher levels of Pearl's causal hierarchy.
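A minimal sketch of the NCM idea in plain Python. Everything here is our own hypothetical illustration, not the paper's code: a three-variable graph Z → X, Z → Y, X → Y with made-up probabilities, and mechanisms that are the simplest possible feed-forward nets (one linear layer on a one-hot encoding of the parents, then a sigmoid compared against uniform noise; real NCMs would use deeper networks). The mechanisms are fit to observational samples by gradient descent, and do(X = x) is answered by replacing X's mechanism with a constant.

```python
import math
import random
from collections import Counter

# Hypothetical ground truth (illustration only): Z -> X, Z -> Y, X -> Y.
P_Y1 = {(0, 0): 0.2, (1, 0): 0.4, (0, 1): 0.5, (1, 1): 0.8}  # keyed by (x, z)


def sample_true(rng):
    """Draw one observational sample (z, x, y) from the ground-truth SCM."""
    z = int(rng.random() < 0.5)
    x = int(rng.random() < (0.8 if z else 0.2))
    y = int(rng.random() < P_Y1[(x, z)])
    return z, x, y


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


class Mechanism:
    """f_V(pa, u) = 1 if u < sigmoid(w[pa]) else 0, with u ~ Uniform(0, 1).

    One linear layer on one-hot parents plus a sigmoid: one logit per parent setting.
    """

    def __init__(self):
        self.w = {}

    def prob(self, pa):
        """P(V = 1 | parents = pa) implied by the mechanism."""
        return sigmoid(self.w.get(pa, 0.0))

    def fit(self, pairs, lr=2.0, steps=2000):
        """Full-batch gradient descent on the mean Bernoulli negative log-likelihood.

        With one-hot inputs the gradient for each logit depends only on the counts
        n[pa] (samples with that parent setting) and k[pa] (those with V = 1).
        """
        n = Counter(pa for pa, _ in pairs)
        k = Counter(pa for pa, v in pairs if v == 1)
        total = len(pairs)
        for _ in range(steps):
            for pa in n:
                grad = (n[pa] * self.prob(pa) - k[pa]) / total
                self.w[pa] = self.w.get(pa, 0.0) - lr * grad


class ToyNCM:
    """Hypothetical NCM for the graph Z -> X, Z -> Y, X -> Y."""

    def __init__(self):
        self.f = {v: Mechanism() for v in "ZXY"}

    def fit(self, data):
        self.f["Z"].fit([((), z) for z, _, _ in data])
        self.f["X"].fit([((z,), x) for z, x, _ in data])
        self.f["Y"].fit([((x, z), y) for z, x, y in data])

    def p_y1(self, do_x=None):
        """P(Y = 1), or P(Y = 1 | do(X = do_x)) if do_x is given, summing out Z and X."""
        total = 0.0
        for z in (0, 1):
            pz = self.f["Z"].prob(()) if z else 1 - self.f["Z"].prob(())
            for x in (0, 1):
                if do_x is None:
                    px1 = self.f["X"].prob((z,))
                    px = px1 if x else 1 - px1
                else:
                    # do(X = do_x): X's mechanism is replaced by a constant.
                    px = 1.0 if x == do_x else 0.0
                total += pz * px * self.f["Y"].prob((x, z))
        return total


rng = random.Random(0)
data = [sample_true(rng) for _ in range(20_000)]
ncm = ToyNCM()
ncm.fit(data)
print(round(ncm.p_y1(do_x=1), 2))  # close to the true 0.5*0.4 + 0.5*0.8 = 0.6
```

Because the graph is supplied and has no hidden confounders, P(Y = 1 | do(X = 1)) is identifiable here, and the fitted model lands near the true 0.6, whereas the observational P(Y = 1 | X = 1) in this toy model is 0.72. The structural assumptions, not the network alone, are what make the interventional answer recoverable.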
The basic idea is to introduce interventions into GNNs so that embeddings and causal effects are learned jointly. This is implemented through so-called interventional GNN layers.
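Below is a hypothetical sketch of what an interventional message-passing step could look like, modeled on Pearl's graph surgery rather than on the paper's exact definition: under do(V = value), edges into V are cut and V is clamped, so its former parents no longer influence it while its children see the clamped value. The function name, scalar features, and mean-then-ReLU update are our assumptions; only the forward pass is shown, and in a real layer the weights would be learned.

```python
# Hypothetical sketch: one message-passing step on a DAG where an intervention
# do(V = value) is modeled by graph surgery (edges into V are cut) and clamping.
def interventional_layer(h, edges, w_self, w_par, do=None):
    """Return updated node features.

    h:      dict node -> scalar feature
    edges:  list of (parent, child) pairs
    do:     dict node -> clamped value for intervened nodes (optional)
    """
    do = do or {}
    h = {**h, **do}  # children of an intervened node see the clamped value
    kept = [(u, v) for u, v in edges if v not in do]  # cut edges into intervened nodes
    out = {}
    for v in h:
        if v in do:
            out[v] = do[v]  # intervened nodes ignore their former parents
            continue
        msgs = [h[u] for u, child in kept if child == v]
        agg = sum(msgs) / len(msgs) if msgs else 0.0  # mean over remaining parents
        out[v] = max(0.0, w_self * h[v] + w_par * agg)  # ReLU update
    return out


features = {"A": 1.0, "B": 2.0, "C": 3.0}
chain = [("A", "B"), ("B", "C")]
print(interventional_layer(features, chain, w_self=0.5, w_par=1.0))
# {'A': 0.5, 'B': 2.0, 'C': 3.5}
print(interventional_layer(features, chain, w_self=0.5, w_par=1.0, do={"B": 0.0}))
# {'A': 0.5, 'B': 0.0, 'C': 1.5}
```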
A nice introductory tutorial on PyTorch Geometric, a library for building graph neural networks (GNNs).
Maybe we can take a very simple dataset and train a GNN on it. Then we can remove connections between nodes and see how well or poorly the model predicts.
This is a synthetic dataset generated from causal Bayesian networks with binary variables. It has 11 variables and a large number of samples. The variables are related through a DAG, which makes the data well suited to SCMs and GNNs. Maybe we could convert the data into the format PyTorch Geometric expects, run a GNN on it, and remove different connections? Maybe that can simulate a
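As a first step toward the experiment described above, here is a sketch of turning a DAG into the COO layout PyTorch Geometric uses for `edge_index` (two rows: source indices, then target indices) and enumerating single-edge deletions. The variable names and edges are hypothetical placeholders, not the dataset's real 11 variables. In PyG, the resulting lists would be wrapped as `torch.tensor(edge_index, dtype=torch.long)` and passed to `torch_geometric.data.Data(x=..., edge_index=...)`; one possible setup (our assumption) is one graph per sample, with the binary variable values as node features.

```python
# Hypothetical variable names and DAG edges; the real dataset's 11 variables
# and their structure are not reproduced here.
names = ["A", "B", "C", "D"]
dag_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]


def to_edge_index(names, edges):
    """Return [[sources...], [targets...]], the COO layout PyG uses for edge_index."""
    idx = {n: i for i, n in enumerate(names)}
    return [[idx[u] for u, _ in edges], [idx[v] for _, v in edges]]


def drop_edges(edge_index, removed):
    """Return a copy of edge_index without the (source, target) index pairs in `removed`."""
    removed = set(removed)
    pairs = [(s, t) for s, t in zip(*edge_index) if (s, t) not in removed]
    return [[s for s, _ in pairs], [t for _, t in pairs]]


def ablations(edge_index):
    """Yield (removed_edge, reduced_edge_index) for each single-edge deletion."""
    for e in zip(*edge_index):
        yield e, drop_edges(edge_index, [e])


ei = to_edge_index(names, dag_edges)
print(ei)  # [[0, 0, 1, 2], [1, 2, 3, 3]]
for edge, reduced in ablations(ei):
    print(edge, reduced)
```

Training the same GNN on each reduced graph and comparing its accuracy to the full graph would give one way to measure how much each connection matters.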
The do operator represents interventions in a causal model: $\textbf{do}(X = x)$ means setting the variable $X$ to the value $x$ from outside the system, rather than merely observing that $X = x$. As an example, consider the following model involving smoking.
If a person's fingernails
So, in terms of
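A numeric sketch of the difference between observing and intervening, on a hypothetical smoking model whose structure and numbers are our own assumptions: smoking S causes both yellow fingernails F and lung cancer C, and F has no effect on C. Conditioning on F = 1 raises the cancer probability because it is evidence of smoking, while do(F = 1) cuts F off from S and leaves the cancer probability at its baseline.

```python
# Hypothetical model (our numbers): S -> F, S -> C; F has no effect on C.
P_S1 = 0.3                        # P(S = 1)
P_F1_GIVEN_S = {0: 0.1, 1: 0.8}   # P(F = 1 | S = s)
P_C1_GIVEN_S = {0: 0.02, 1: 0.2}  # P(C = 1 | S = s)


def p_s(s):
    return P_S1 if s else 1 - P_S1


def p_f(f, s, do_f=None):
    """P(F = f | S = s); under do(F = do_f), F no longer depends on S."""
    if do_f is not None:
        return 1.0 if f == do_f else 0.0
    return P_F1_GIVEN_S[s] if f else 1 - P_F1_GIVEN_S[s]


def p_c1_and_f(f, do_f=None):
    """Joint P(C = 1, F = f), optionally in the intervened model."""
    return sum(p_s(s) * p_f(f, s, do_f) * P_C1_GIVEN_S[s] for s in (0, 1))


def p_f_marg(f, do_f=None):
    """Marginal P(F = f), optionally in the intervened model."""
    return sum(p_s(s) * p_f(f, s, do_f) for s in (0, 1))


seeing = p_c1_and_f(1) / p_f_marg(1)                 # P(C = 1 | F = 1)
doing = p_c1_and_f(1, do_f=1) / p_f_marg(1, do_f=1)  # P(C = 1 | do(F = 1))
print(round(seeing, 3), round(doing, 3))  # 0.159 0.074
```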
With this in mind, we now define the
In a causal diagram
where
Note that the equation above gives the joint probability of several variables taking particular values when one variable is set by intervention. What if we only want the probability of a single outcome when we intervene on a single variable? That leads to the following corollary.
If
where the sum runs over all values
$$\begin{equation} P(y \;|\; \textbf{do}(x)) = \frac{P(x,\,y)}{P(x)} = P(y \;|\; x).
\end{equation}$$
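A small numeric check on a hypothetical three-variable model with made-up conditional probabilities (Z → X, Z → Y, X → Y). It computes $P(y \;|\; \textbf{do}(x))$ by dropping X's factor from the factorization and summing out Z, which amounts to adjusting for X's parent Z, and compares it with the ordinary conditional $P(y \;|\; x)$. When X has a parent the two differ; when X's mechanism ignores Z (X has no parents), the intervention collapses to $P(x, y)/P(x) = P(y \;|\; x)$ as in the equation above.

```python
from itertools import product

# Hypothetical CPTs for binary Z, X, Y (our numbers).
P_Z1 = 0.4
P_X1 = {0: 0.3, 1: 0.7}                                      # P(X = 1 | Z = z)
P_Y1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.9}  # P(Y = 1 | X = x, Z = z)


def bern(p1, v):
    """Probability that a Bernoulli(p1) variable equals v."""
    return p1 if v else 1 - p1


def joint(z, x, y, x_has_parent=True):
    """P(z, x, y) = P(z) P(x | pa_x) P(y | x, z).

    If x_has_parent is False, X's mechanism ignores Z and uses P(X = 1) = P_X1[0].
    """
    px = P_X1[z] if x_has_parent else P_X1[0]
    return bern(P_Z1, z) * bern(px, x) * bern(P_Y1[(x, z)], y)


def conditional(x, y, x_has_parent):
    """P(y | x) = P(x, y) / P(x), computed from the joint."""
    num = sum(joint(z, x, y, x_has_parent) for z in (0, 1))
    den = sum(joint(z, x, yy, x_has_parent) for z, yy in product((0, 1), repeat=2))
    return num / den


def do_adjust(x, y):
    """P(y | do(x)) = sum_z P(y | x, z) P(z): X's factor removed, Z summed out."""
    return sum(bern(P_Z1, z) * bern(P_Y1[(x, z)], y) for z in (0, 1))


print(round(do_adjust(1, 1), 3))                 # 0.54
print(round(conditional(1, 1, True), 3))         # 0.665 (X has parent Z: differs)
print(round(conditional(1, 1, False), 3))        # 0.54  (X has no parents: equal)
```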