scikit-tda/ripser.py

Get the indices of the vectors forming a hole

AlkanGoktug opened this issue · 2 comments

Hello,

I am using ripser to analyze holes in my dataset. For this, I use persistent homology in dimension 1 (the H1 persistence diagram).

I would like to know if there is a way to access the indices of the vectors in the dataset that form the holes. Unfortunately, I can only access the birth and death values of the holes; I would also like to access the indices of the vectors that form each hole.

Thanks in advance.

Yes, right now it seems Ripser can only compute the persistence diagrams, and you have no way of analyzing the individual data points that make up the various topological components it finds. For example, if my PD shows that my data has two cycles, then I want to know which points are on cycle 1 and which points are on cycle 2.

There are ways to determine which points might lie near the hole that the persistence diagram is detecting. Please see this page in the docs for an example of how to use ripser.py to find representative cocycles.
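Here is a minimal sketch of reading point indices off such a cocycle. It assumes the output layout ripser.py documents for `do_cocycles=True`: `result["cocycles"][1]` holds one array per H1 point, parallel to `result["dgms"][1]`, and each row is `(i, j, value)` for an edge between points `i` and `j`. The helper names (`cocycle_edges`, `cocycle_vertex_indices`, `most_persistent`) and the example cocycle are hypothetical. Roughly speaking, a 1-cocycle marks edges that a loop around the hole has to cross, so the indices you get are points adjacent to the hole, not an ordered loop around it.

```python
def cocycle_edges(cocycle):
    """Edges (i, j) with a nonzero coefficient in a 1-cocycle.

    Assumes each row is (i, j, value), the layout ripser.py documents for
    do_cocycles=True. Works on lists of tuples or on a NumPy array.
    """
    return [(int(row[0]), int(row[1])) for row in cocycle if row[2] != 0]


def cocycle_vertex_indices(cocycle):
    """Sorted, de-duplicated point indices touched by the cocycle's edges."""
    return sorted({v for edge in cocycle_edges(cocycle) for v in edge})


def most_persistent(dgm):
    """Index of the diagram point with the largest death - birth."""
    return max(range(len(dgm)), key=lambda k: dgm[k][1] - dgm[k][0])


# Usage with ripser.py (not run here; requires ripser and numpy):
#   from ripser import ripser
#   result = ripser(X, maxdim=1, do_cocycles=True)
#   dgm1, cocycles1 = result["dgms"][1], result["cocycles"][1]
#   k = most_persistent(dgm1)
#   print(cocycle_vertex_indices(cocycles1[k]))

# Hypothetical cocycle supported on edges {3, 0} and {2, 1}.
example_cocycle = [(3, 0, 1), (2, 1, 1)]
print(cocycle_vertex_indices(example_cocycle))  # [0, 1, 2, 3]
```

If you subsample with `n_perm`, the indices refer to the subsample; ripser.py reports the mapping back to the original points in `result["idx_perm"]`.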

However, translating from a (co)homological feature, like a point in a persistence diagram, to a feature within the actual data, like a (co)cycle representative or a list of the data points that "gave rise" to that feature, can be very subtle. Homology and cohomology classes are equivalence classes, and what you are both asking for is a representative of an equivalence class. A priori, no one representative is better than any other. This is analogous to asking for a representative of the class of 1 in the integers modulo 5. You could say 1 should be the representative, but 6, 11, 16, and -4 are all equally valid answers.

To make the topological situation even more difficult, the cycle representative need not be stable. If your data is perturbed slightly, there are theoretical guarantees that the points in the persistence diagram will move only slightly, but there is no guarantee that your cycle representative will move only slightly; in fact, it can move arbitrarily far from the initial representative.
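To make the "many equally valid representatives" point concrete, here is a small sketch on a hypothetical simplicial complex with Z/2 coefficients (not ripser output): a square 0-1-2-3 surrounds a hole, a fifth point 4 sits just outside edge 0-1, and the triangle [0, 1, 4] is filled in. Over Z/2, adding a boundary is a symmetric difference of edge sets, so two cycles are homologous here when they differ by the boundary of that triangle. Both cycles below represent the same hole, yet they involve different sets of points. The integers-mod-5 analogy is checked in the last line.

```python
def edge(a, b):
    """Undirected edge as a frozenset, so edge(a, b) == edge(b, a)."""
    return frozenset((a, b))


def boundary_of_triangle(a, b, c):
    """Mod-2 boundary of the filled triangle [a, b, c]: its three edges."""
    return {edge(a, b), edge(b, c), edge(a, c)}


def vertices(cycle):
    """Sorted point indices that appear in a cycle given as a set of edges."""
    return sorted({v for e in cycle for v in e})


# Two candidate representatives of the same H1 class (hypothetical complex).
cycle_a = {edge(0, 1), edge(1, 2), edge(2, 3), edge(3, 0)}
cycle_b = {edge(0, 4), edge(4, 1), edge(1, 2), edge(2, 3), edge(3, 0)}

# They differ exactly by the boundary of the filled triangle [0, 1, 4].
assert cycle_a ^ cycle_b == boundary_of_triangle(0, 1, 4)
print(vertices(cycle_a))  # [0, 1, 2, 3]
print(vertices(cycle_b))  # [0, 1, 2, 3, 4]

# Same idea in Z/5: every listed integer differs from 1 by a multiple of 5.
assert all((r - 1) % 5 == 0 for r in [1, 6, 11, 16, -4])
```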

The domain-specific or scientific aspects of the problem you're solving might lead you to natural choices of representative (e.g., minimal length, minimal energy, minimal cost).
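As one illustration of the "minimal length" criterion, the sketch below scores candidate cycles by total Euclidean edge length and keeps the shortest. It assumes you already have a few candidate representatives that you believe are homologous; the point coordinates, the candidates, and the helper names (`cycle_length`, `shortest_representative`) are hypothetical. This only compares candidates you supply; searching over all representatives for a truly minimal one is a much harder optimization problem.

```python
import math


def cycle_length(cycle, points):
    """Total Euclidean length of a cycle given as (i, j) index pairs."""
    return sum(math.dist(points[i], points[j]) for i, j in cycle)


def shortest_representative(candidates, points):
    """Return the candidate cycle with the smallest total length."""
    return min(candidates, key=lambda c: cycle_length(c, points))


# Hypothetical 2D points: a unit square (0-3) plus a point 4 below edge 0-1.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, -0.5)]

# Two hypothetical candidates assumed to represent the same hole.
cycle_a = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle_b = [(0, 4), (4, 1), (1, 2), (2, 3), (3, 0)]

best = shortest_representative([cycle_a, cycle_b], points)
print(best == cycle_a)  # True: length 4.0 versus 3 + sqrt(2)
```

Swapping `cycle_length` for an energy or cost function is how a different domain-specific criterion would plug in.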