giotto-ai/giotto-ph

[BUG] Dead comment in compute_dim_0_pairs/incorrect removal of edges with zero filtration value

ulupo opened this issue · 4 comments

ulupo commented

Am i right to think that this comment

giotto-ph/gph/src/ripser.h

Lines 936 to 937 in c7f5d46

// TODO: Get correct birth times if the edges are negative (required for
// lower star)

is referring to an issue that is now resolved?

ulupo commented

Actually I now worry that this is still unaddressed, because of

if (get_diameter(e) != 0) {

ulupo commented

Continuing from #29 (comment)

About the check if (get_diameter(e) != 0) {, I am not sure I understand the issue.
It just ensure that the diameter of the edge is not 0.

There is an important conceptual difference between ripser.py and original ripser, and it is that original ripser assumes all inputs are matrices with zero along the diagonal and non-negative off-diagonal entries. This corresponds to assuming all inputs are distance matrices, since there cannot be negative distances and the distance of a point from itself is zero for any reasonable notion of distance. Original ripser in dimension 0 allows for the possibility that some off-diagonal distances are zero, but if that is the case it does not create a bar (because it would be a trivial bar which is born at 0 and dies at 0). This is enforced by the if (get_diameter(e) != 0) {, check.

But this is conceptually incorrect for us and for ripser.py. Our assumptions for the input are just that entry (i, j) is greater than or equal to both entry (i, i) and entry (j, j). There should be nothing special about zero off-diagonal entries and we should even allow for negative entries (on or off-diagonal). I think we already do this correctly in higher dimensions, but it seems to me that there might still be issues in dimension 0.

There is a unit test that really should be added to this repository and it is the following:

  1. Create a dense distance matrix dm by running squareform(pdist(X)) where X is a point cloud. Compute its diagrams dgms (up to dimension 1 or 2).
  2. Create dm_neg_vert = dm - 1. Compute its diagrams dgms_neg_vert. Check that dgms_neg_vert[d] = dgms[d] - 1 for all dimensions d.
  3. Check that dm[0, 1] != 0 and create dm_one_zero_edge = dm - dm[0, 1]. Compute its diagrams dgms_one_zero_edge. Check that dgms_one_zero_edge [d] = dgms_one_zero_edge[d] - dm[0, 1] for all dimensions d.

I suspect that we currently fail test 3 in dimension 0.

ulupo commented

@MonkeyBreaker here's a complete test demonstrating that there is a bug:

import numpy as np
from scipy.spatial.distance import pdist, squareform
from gph import ripser_parallel

np.random.seed(0)

maxdim = 0
kwargs = {"metric": "precomputed", "maxdim": maxdim}

shape = (5, 4)

X = np.random.random(shape)
dm = squareform(pdist(X))

dgms_orig = ripser_parallel(dm, **kwargs)["dgms"]

offset = 1
dm_neg_vert = dm - offset
dgms_neg_vert = ripser_parallel(dm_neg_vert, **kwargs)["dgms"]

edge_val = dm[0, 1]
assert edge_val

dm_one_zero_edge = dm - edge_val

dgms_one_zero_edge = ripser_parallel(dm_one_zero_edge, **kwargs)["dgms"]

for dim in range(maxdim + 1):
    print(f"""# DIMENSION {dim}

Original dm:
{dgms_orig[dim]}
Subtract {offset} from dm, and add {offset} back to the diagram:
{dgms_neg_vert[dim] + np.float32(offset)}
Subtract dm[0, 1] from dm (make one edge zero), and add dm[0, 1] back to the diagram:
{dgms_one_zero_edge[dim] + np.float32(edge_val)}
    
    """)

All diagrams should be identical, but the last one has one fewer bar (!) because we removed an edge just because it was zero. Furthermore, it has incorrect birth times. (see edit below)

I'm thinking we have TWO issues here: the TODO comment is not dead (there is an issue with birth values clearly), AND we should not remove edges just because they are zero. The second issue is solved in this example by removing the if (get_diameter(e) != 0) check as I mentioned above, but not the first! (see edit below)

ulupo commented

EDIT TO PREVIOUS MESSAGE: It turns out the wrong birth times were just about float64 to float32 conversions, the test works now that I've added float32 casting explicitly. This reminds me that we really should address #19!