py-why/pywhy-graphs

Checking the validity of a constructed PAG

adam2392 opened this issue · 7 comments

Is your feature request related to a problem? Please describe.
Whenever a PAG is constructed, it is not necessary a valid PAG in the sense of the definition. I.e. it might be a PAG with additional orientations.

However, for many use-cases, one might want to construct only a valid PAG. So they would like to check its validity in a graphical algorithm.

Describe the solution you'd like
As suggested by @jdramsey in https://reader.elsevier.com/reader/sd/pii/S0004370208001008?token=1C0B24A74D739F3C8D05A0D2DEC40AA540835B1E58C74D189AE09D9C6FABBF30C941385A4A1F9FC9DFAE1E316E022D1E&originRegion=us-east-1&originCreation=20230320132639, Theorem 2 provides a method for turning a PAG into a MAG. This was originally suggested by Peter Spirtes.

Imo, this is an extremely valuable algorithm in terms of applications.

Then one can check the validity of the MAG using ancestrality and maximality. If there is a violation of the definition of the MAG, then the PAG is essentially invalid.

This can be implemented as a graph algorithm.

Additional context
xref: cmu-phil/tetrad#1556

Hey @adam2392 for my next issue, do you think this will be a good one? Without going into depth, it looks a bit complicated for someone (me) who has never made contributions to a graph library. Any suggestions?

Design-wise, skim a look at networkx documentation/code and how it structures the graph classes vs functions. We are implementing a function here.

The function should:

  • check for some validity that it is a PAG (e.g. circle edges)
  • convert the PAG to a MAG using the theorem linked in the issue in Zhang's 2008 paper (we don't have a MAG class, but we can just use an ADMG class for now)
  • check the validity of the MAG
  • convert the MAG back to a PAG
  • check if PAG is same as the PAG fed in

Yeah this one is quite involved, but the code/explicit theorems are all out there. Just need to put it together.

Alright, so I went through the documentation/code of networkx and pywhy-graphs. I was able to understand the code structure and most of the classes/functions in the repo. I went through the paper and the theorem you are referencing and it seems simple enough for me to implement.
I am having difficulty understanding the validity part, can you elaborate on how one would check the validity of a PAG (for the purpose of this function, wouldn't simply asserting that the graph object provided is an instance of the PAG class be enough?) and a MAG?
I am new to causal learning and have been trying to catch up with the terminology and the symbols that are used in the papers and the code. I have made good progress in understanding things related to this issue. Hopefully with your help I can make good progress!

I am having difficulty understanding the validity part, can you elaborate on how one would check the validity of a PAG (for the purpose of this function, wouldn't simply asserting that the graph object provided is an instance of the PAG class be enough?) and a MAG?

The current class does not guarantee the underlying class is valid; it's just named that way. Because of this and your valid confusion, honestly we might just overhaul the naming of these classes. This is an ongoing discussion in #65 such that the class names are refactored to be "generic" graphs that have these different kind of edges, but are not guaranteed to be any PAG, MAG, CPDAG, etc. We might then implement functions instead that check validity. A user can then construct graphs and then check validity, which of course can become expensive if it's checked many times. Or if they are adventurous and confident, they don't need to check it.

But this implementation would still be valid. Because you just feed in a graph with a mixture of edge types, and the goal is to return valid, or not valid PAG.

So basically checking validity follows the pipeline I outlined where the graph with mix of directed, bidirected, circle and undirected edges are then checked against the reconstructed PAG. If the two graphs don't match, or there are some other discrepancies in the middle of the pipeline (e.g. MAG properties violated), then the graph is not a valid PAG.

If this is still unclear, I would check out the Tetrad application and draw a few "PAGs" and then run the check in tetrad.

@adam2392 I just reviewed the PAG code and there is a function is_valid_mec_graph that appears to enforce the validity check in the PAG class. Is that understanding correct?

@adam2392 I just reviewed the PAG code and there is a function is_valid_mec_graph that appears to enforce the validity check in the PAG class. Is that understanding correct?

I believe this only enforces the validity when adding edges. Probably poorly named. I think we can refactor these names once we decide on a proper class for:

  • mixed-edge graphs that support Chain Graphs, CPDAG, PDAGs and their extensions
  • mixed-edge graphs that support ADMG/DAGs and their extensions
  • mixed-edge graphs that support PAGs and their extensions

What are some good generic names for the directed/undirected graph and the directed/bidirected graph and the directed/bidirected/circle/undirected graph?

I will reply in #65 to avoid cluttering this issue