recovering structures within a Markov equivalence class
lihebi opened this issue · 3 comments
Different causal structures can encode the same set of conditional independencies. Such structures are said to belong to the same Markov equivalence class, and in general they cannot be distinguished without interventional data. In the simplest case, for example, it is impossible to distinguish A->B from B->A using purely observational data.
LiNGAM and related algorithms use non-Gaussian noise or non-linear models to break this symmetry, and can therefore distinguish the directions. However, it is known that the direction of a linear causal model with Gaussian noise still cannot be identified.
But it looks like NOTEARS is able to recover the right direction. I ran NOTEARS on 2-node, 1-edge graphs with a linear SEM and Gaussian noise for 1000 runs, and only 10 runs failed to recover the exact structure. That is far from random. I consider this a very important result, but I didn't see any discussion of Markov equivalence classes in the NOTEARS paper. Do you have any insight into how this works?
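Roughly, the experiment was along these lines (a minimal sketch, not my exact script; it assumes this repo's notears_linear and utils.simulate_linear_sem interfaces, so adjust if the signatures differ in your checkout):

```python
# Minimal sketch of the 2-node experiment described above.
# Assumes this repo's notears_linear() and utils.simulate_linear_sem().
import numpy as np
from notears.linear import notears_linear
from notears import utils

np.random.seed(0)
W_true = np.array([[0.0, 1.5],   # single edge x1 -> x2 with weight 1.5
                   [0.0, 0.0]])

n_fail = 0
for _ in range(1000):
    # draw 1000 samples from the linear SEM with Gaussian noise
    X = utils.simulate_linear_sem(W_true, 1000, 'gauss')
    W_est = notears_linear(X, lambda1=0.1, loss_type='l2')
    # count a failure whenever the estimated support differs from the truth
    if not np.array_equal(W_est != 0, W_true != 0):
        n_fail += 1

print(f'{n_fail} / 1000 runs failed to recover the exact structure')
```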
Hi, sorry for the late reply!
Yes, your observation is correct. There are two aspects I would like to point out.
- It is known that a linear Gaussian SEM with equal error variances is identifiable (Peters and Bühlmann, 2014). The simulation in this codebase corresponds to this case, so the result shouldn't be surprising.
- What happens if the error variances are not equal? Loh and Bühlmann (2014) discussed some cases where the error variance is not specified correctly in weighted least squares. Since we're using a least squares score (not weighted), it also falls into this category. For two variables, we can actually compute the condition in closed form: the global minimum of the least squares score is the true graph when 1 - beta^2 < var2 / var1, where the true graph is x1 = eps1, x2 = beta * x1 + eps2, and eps1, eps2 have variances var1 and var2 respectively. If this condition is satisfied, you will find NOTEARS achieving good performance. (There is certainly another gap between the global and local optimum due to nonconvexity, so this statement is empirical.) In contrast, if this condition is violated, you will find NOTEARS getting SHD near 0.5, i.e. random guessing. I've uploaded a script that verifies this (see also the sketch below).
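A minimal self-contained check of this condition, comparing the population least-squares scores of the two candidate DAGs (just a sketch, not the uploaded script):

```python
import numpy as np

def ls_scores(beta, var1, var2):
    """Population least-squares scores of the two candidate DAGs for
    x1 = eps1, x2 = beta * x1 + eps2 with Var(eps1)=var1, Var(eps2)=var2."""
    forward = var1 + var2                       # score of x1 -> x2
    var_x2 = beta ** 2 * var1 + var2            # marginal variance of x2
    reverse = var_x2 + var1 * var2 / var_x2     # score of x2 -> x1
    return forward, reverse

for beta, var1, var2 in [(0.5, 1.0, 1.0),   # equal variances: condition holds
                         (0.5, 1.0, 0.5),   # 1 - 0.25 = 0.75 > 0.5: violated
                         (0.9, 4.0, 1.0)]:  # 1 - 0.81 = 0.19 < 0.25: holds
    fwd, rev = ls_scores(beta, var1, var2)
    condition = (1 - beta ** 2) < var2 / var1
    print(f'beta={beta}, var2/var1={var2 / var1:.2f}: '
          f'condition holds={condition}, true graph wins={fwd < rev}')
```

In each case the true graph has the smaller score exactly when the condition holds, consistent with the SHD behavior described above.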
Thanks for sharing your insights! That makes a lot of sense. I guess I'm closing this issue now.
For the second point, shouldn't the condition be (1 - beta^2) * beta^2 < var2 / var1? (for the case without l1 regularization)