y0-causal-inference/y0

Issue with the estimand generated

srtaheri opened this issue · 4 comments

The estimand generated for a simple graphical example is incorrect. Here is the code:

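```python
from y0.dsl import X, Y
from y0.graph import NxMixedGraph
from y0.algorithm.identify import identify, Identification

# Z confounds X and Y; M mediates the effect of X on Y
graph = NxMixedGraph.from_str_edges(
    directed=[
        ("Z", "X"),
        ("Z", "Y"),
        ("X", "M"),
        ("M", "Y"),
    ],
)

graph.draw()

# Identify P(Y | do(X)) from the observational distribution
estimand = identify(
    Identification.from_parts(
        graph=graph, outcomes={Y}, treatments={X}
    )
)
estimand  # displays the estimand in a notebook
```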

Here is the output:

$$\sum_{M,Z} P(Y|M,X,Z) P(M|X,Z) \sum \sum_{M,X,Y} P(M,X,Y,Z)$$

The issues are:

  1. The value of Y should not be summed over.

  2. The estimand should use either the back-door estimand, which does not contain $M$, such as:

$$\sum_{Z} P(Y|X,Z) P(Z)$$

or the front-door estimand, which does not use $Z$, such as:

$$\sum_{M} P(M|X) \sum_{X'} P(Y|X', M) P(X')$$

I think it can't contain both $Z$ and $M$ in the same formula.
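To check that the two proposed estimands target the same causal quantity on this graph, here is a minimal, stdlib-only sketch (it does not use y0). It assumes binary variables and made-up conditional probability tables, and the names `bern`, `pr`, `truth`, `backdoor`, and `frontdoor` are hypothetical. It computes $P(Y \mid do(X))$ from the model by truncated factorization and compares it with the back-door and front-door formulas evaluated from the observational joint alone.

```python
from itertools import product

# Hypothetical conditional probability tables for binary Z, X, M, Y,
# made up for illustration. They follow the graph's edges
# Z -> X, Z -> Y, X -> M, M -> Y.
P_Z1 = 0.4                                   # P(Z=1)
P_X1_GIVEN_Z = {0: 0.3, 1: 0.8}              # P(X=1 | Z=z)
P_M1_GIVEN_X = {0: 0.2, 1: 0.7}              # P(M=1 | X=x)
P_Y1_GIVEN_MZ = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y=1 | M=m, Z=z)
                 (1, 0): 0.4, (1, 1): 0.9}
VALS = (0, 1)
NAMES = ("z", "x", "m", "y")


def bern(p1, v):
    """P(V=v) for a binary V with P(V=1) = p1."""
    return p1 if v == 1 else 1.0 - p1


# Observational joint P(z, x, m, y) = P(z) P(x|z) P(m|x) P(y|m,z).
JOINT = {
    (z, x, m, y): bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x)
    * bern(P_M1_GIVEN_X[x], m) * bern(P_Y1_GIVEN_MZ[(m, z)], y)
    for z, x, m, y in product(VALS, repeat=4)
}


def pr(**fixed):
    """Marginal probability of a partial assignment, e.g. pr(x=1, z=0)."""
    return sum(p for key, p in JOINT.items()
               if all(key[NAMES.index(k)] == v for k, v in fixed.items()))


def truth(x, y):
    """P(Y=y | do(X=x)) by truncated factorization (uses the model itself)."""
    return sum(bern(P_Z1, z) * bern(P_M1_GIVEN_X[x], m) * bern(P_Y1_GIVEN_MZ[(m, z)], y)
               for z in VALS for m in VALS)


def backdoor(x, y):
    """sum_z P(y | x, z) P(z), computed from the joint only."""
    return sum(pr(y=y, x=x, z=z) / pr(x=x, z=z) * pr(z=z) for z in VALS)


def frontdoor(x, y):
    """sum_m P(m | x) sum_x' P(y | x', m) P(x'), computed from the joint only."""
    return sum(
        pr(m=m, x=x) / pr(x=x)
        * sum(pr(y=y, x=xp, m=m) / pr(x=xp, m=m) * pr(x=xp) for xp in VALS)
        for m in VALS
    )


for x, y in product(VALS, repeat=2):
    print(f"x={x} y={y}: truth={truth(x, y):.6f} "
          f"back-door={backdoor(x, y):.6f} front-door={frontdoor(x, y):.6f}")
```

With tables strictly between 0 and 1, every printed row should show the three numbers agreeing up to floating-point error; the choice of values only changes the numbers, not the agreement.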

cthoyt commented

@srtaheri I wonder if you are using an old version of y0; we fixed this empty-sum problem in #159. My result is:

$\sum_{M, Z} P(Y | M, X, Z) \sum_{M, X, Y} P(M, X, Y, Z) P(M | X, Z)$

I wonder if there are parts of the ID algorithm where we could do some bookkeeping to eliminate intermediate variables. Is it possible to show, through symbolic manipulation starting from the equation I just wrote, that it is equal to the one you propose?

@cthoyt A modified version of the formula that you provided is correct:

$$ \sum_{M,Z} P(Y|M,X,Z) P(M|X,Z) \sum_{M,X,Y} P(Z,M,X,Y) $$

This is equal to:

$$ \sum_{M,Z} P(Y|M,X,Z) P(M|X,Z) P(Z) = \sum_{Z} P(Y|X,Z) P(Z) $$

The last expression is the back-door estimand.

When $M$ is summed out, it shouldn't appear in the formula. Keeping it yields a more complex estimand, even though the value of $M$ does not matter and does not appear in the final estimand. I suggest printing the final, simplified estimand.
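Printing the simplified estimand needs some post-processing of sums of products. Below is a minimal, hypothetical sketch of such a pass; it is not y0's API, and `Factor`, `P`, `chain`, `sum_out`, `simplify`, and `show` are illustrative names that assume single-letter variable names. When summing out a variable $v$ that appears in no other factor, it applies only two sound rules: the chain rule $P(A \mid C)\,P(B \mid A, C) = P(A, B \mid C)$, then marginalization $\sum_v P(A \mid C) = P(A \setminus \{v\} \mid C)$ for $v \in A$. Applied to the expression above, it reduces the inner sum to $P(Z)$ and the outer one to the back-door estimand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    """Hypothetical factor P(children | parents); not y0's Probability class."""
    children: frozenset
    parents: frozenset = frozenset()

    def __str__(self):
        kids = ",".join(sorted(self.children))
        if not self.parents:
            return f"P({kids})"
        return f"P({kids}|{','.join(sorted(self.parents))})"


def P(children, parents=""):
    """Shorthand assuming single-letter names: P("Y", "MXZ") is P(Y|M,X,Z)."""
    return Factor(frozenset(children), frozenset(parents))


def chain(f, g):
    """Chain rule P(A|C) P(B|A,C) = P(A,B|C); returns None when it does not apply."""
    if g.parents == f.children | f.parents:
        return Factor(f.children | g.children, f.parents)
    return None


def sum_out(factors, v):
    """Try to eliminate sum_v from a product of factors; None if no rule applies."""
    involved = [f for f in factors if v in f.children | f.parents]
    rest = [f for f in factors if v not in f.children | f.parents]
    if len(involved) == 2:
        merged = chain(*involved) or chain(*reversed(involved))
        if merged is None:
            return None
        involved = [merged]
    if len(involved) == 1 and v in involved[0].children:
        # Marginalization: sum_v P(A|C) = P(A - {v} | C), and sum_v P(v|C) = 1.
        f = involved[0]
        kept = f.children - {v}
        return rest + ([Factor(kept, f.parents)] if kept else [])
    return None


def simplify(sum_vars, factors):
    """Repeatedly sum out variables until no rule applies."""
    remaining = list(sum_vars)
    progress = True
    while progress:
        progress = False
        for v in list(remaining):
            result = sum_out(factors, v)
            if result is not None:
                factors, progress = result, True
                remaining.remove(v)
    return remaining, factors


def show(sum_vars, factors):
    prefix = f"sum_{{{','.join(sorted(sum_vars))}}} " if sum_vars else ""
    return prefix + " ".join(str(f) for f in factors)


# Inner term sum_{M,X,Y} P(Z,M,X,Y), then the outer sum over M and Z.
inner_vars, inner = simplify("MXY", [P("ZMXY")])
outer_vars, outer = simplify("MZ", [P("Y", "MXZ"), P("M", "XZ")] + inner)
print(show(inner_vars, inner))  # P(Z)
print(show(outer_vars, outer))  # sum_{Z} P(Z) P(Y|X,Z)
```

The sum over $Z$ remains because $P(Z)$ and $P(Y \mid X, Z)$ cannot be merged by the chain rule. A fuller simplifier would need more rules (for example, for ratios of marginals), which this sketch deliberately leaves out.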

cthoyt commented

Can you clarify the rules you used to collapse that sum?

Adding `canonicalize` takes care of the first sum simplification but not the second. See:

```python
from y0.graph import NxMixedGraph
from y0.algorithm.identify import identify, Identification
from y0.dsl import X, Y
from y0.mutate.canonicalize_expr import canonicalize

graph = NxMixedGraph.from_str_edges(
    directed=[
        ("Z", "X"),
        ("Z", "Y"),
        ("X", "M"),
        ("M", "Y"),
    ],
)

estimand = identify(
    Identification.from_parts(
        graph=graph, outcomes={Y}, treatments={X}
    )
)
canonicalize(estimand)
```

This gives $\sum\limits_{M, Z} P(M | X, Z) P(Y | M, X, Z) P(Z)$

So I think we can compute it like this:

$\sum\limits_{M, Z} P(M | X, Z) P(Y | M, X, Z) P(Z)$

$= \sum_{Z} P(Z) \sum_{M} P(M|X,Z) P(Y|M,X,Z)$

$= \sum_{Z} P(Z) \sum_{M} \frac{P(M,X,Z)}{P(X,Z)} \frac{P(Y,M,X,Z)}{P(M,X,Z)}$

$= \sum_{Z} \frac{P(Z)}{P(X,Z)} \sum_{M} P(Y,M,X,Z)$

$= \sum_{Z} \frac{P(Z)}{P(X,Z)} P(Y,X,Z)$

$= \sum_{Z} \frac{P(Z)}{P(X,Z)} P(Y|X,Z) P(X,Z)$

$= \sum_{Z} P(Z) P(Y|X,Z)$
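These steps use only the definition of conditional probability and marginalization, so the equality should hold for any strictly positive joint distribution over $Z, X, M, Y$, whether or not it factorizes according to the graph; the graph only matters for whether the final expression equals $P(Y \mid do(X))$. The stdlib-only sketch below, which is not y0 code, evaluates every line of the derivation on a randomly generated positive joint over binary variables and checks that they agree. The helper names `pr`, `cond`, and `derivation_lines` are hypothetical.

```python
import random
from itertools import product

# Hypothetical: an arbitrary strictly positive joint over binary (z, x, m, y),
# generated at random. It need not factorize according to the graph.
rng = random.Random(0)
weights = {key: rng.uniform(0.1, 1.0) for key in product((0, 1), repeat=4)}
total = sum(weights.values())
JOINT = {key: w / total for key, w in weights.items()}
NAMES = ("z", "x", "m", "y")
VALS = (0, 1)


def pr(**fixed):
    """Marginal probability of a partial assignment, e.g. pr(x=1, z=0)."""
    return sum(p for key, p in JOINT.items()
               if all(key[NAMES.index(k)] == v for k, v in fixed.items()))


def cond(target, given):
    """P(target | given) for dicts of assignments, by definition of conditioning."""
    return pr(**target, **given) / pr(**given)


def derivation_lines(x, y):
    """Each line of the derivation, evaluated at X = x, Y = y."""
    return [
        # sum_{M,Z} P(M|X,Z) P(Y|M,X,Z) P(Z)
        sum(cond({"m": m}, {"x": x, "z": z}) * cond({"y": y}, {"m": m, "x": x, "z": z})
            * pr(z=z) for m in VALS for z in VALS),
        # sum_Z P(Z) sum_M P(M|X,Z) P(Y|M,X,Z)
        sum(pr(z=z) * sum(cond({"m": m}, {"x": x, "z": z})
                          * cond({"y": y}, {"m": m, "x": x, "z": z}) for m in VALS)
            for z in VALS),
        # sum_Z P(Z) sum_M [P(M,X,Z) / P(X,Z)] [P(Y,M,X,Z) / P(M,X,Z)]
        sum(pr(z=z) * sum(pr(m=m, x=x, z=z) / pr(x=x, z=z)
                          * pr(y=y, m=m, x=x, z=z) / pr(m=m, x=x, z=z) for m in VALS)
            for z in VALS),
        # sum_Z P(Z) / P(X,Z) sum_M P(Y,M,X,Z)
        sum(pr(z=z) / pr(x=x, z=z) * sum(pr(y=y, m=m, x=x, z=z) for m in VALS)
            for z in VALS),
        # sum_Z P(Z) / P(X,Z) P(Y,X,Z)
        sum(pr(z=z) / pr(x=x, z=z) * pr(y=y, x=x, z=z) for z in VALS),
        # sum_Z P(Z) / P(X,Z) P(Y|X,Z) P(X,Z)
        sum(pr(z=z) / pr(x=x, z=z) * cond({"y": y}, {"x": x, "z": z}) * pr(x=x, z=z)
            for z in VALS),
        # sum_Z P(Z) P(Y|X,Z)
        sum(pr(z=z) * cond({"y": y}, {"x": x, "z": z}) for z in VALS),
    ]


for x, y in product(VALS, repeat=2):
    lines = derivation_lines(x, y)
    assert all(abs(v - lines[0]) < 1e-12 for v in lines), lines
    print(f"x={x} y={y}: every line = {lines[0]:.6f}")
```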