Failing to generate H1 result with small dataset.

Question

jamesdhope opened this issue a year ago · 1 comments

import pandas as pd
from ripser import ripser

emb_df_pca_transformed_df = pd.DataFrame([[-0.00757914, -0.0253189],
 [-0.00844645, -0.00540171],
 [ 0.00108799,  0.0065374 ],
 [-0.00387104, -0.01243067],
 [-0.00870402, -0.00650156],
 [ 0.01830302,  0.02870784],
 [-0.02218258,  0.01270735],
 [ 0.035802, -0.00130045],
 [ 0.00926424, -0.02861457],
 [-0.03469143,  0.01697614],
 [ 0.02101742,  0.01463914]])

diagrams = ripser(emb_df_pca_transformed_df)['dgms']

print(diagrams)

Result:

[array([[0.        , 0.00112961],
       [0.        , 0.00764932],
       [0.        , 0.01321718],
       [0.        , 0.01341106],
       [0.        , 0.01432816],
       [0.        , 0.015279  ],
       [0.        , 0.01716278],
       [0.        , 0.02151326],
       [0.        , 0.02174062],
       [0.        , 0.02272926],
       [0.        ,        inf]]), array([], shape=(0, 2), dtype=float64)]

Note that diagrams[1] is empty.

I have attempted to set threshold values; this does not affect the H1 output.

How can I compute H1 for a small, tightly clustered dataset, please?

Answer 1 · 2023-08-17T20:06:17.000Z

Hello,
Thank you for your interest in ripser.py! This result is actually correct. Looking at a plot of the point cloud, there is no H1 in it. In other words, no loop exists that persists in the dataset.

If, in this example, I add the point [0.03, -0.027] to your dataset, that point completes a loop, so you get an H1 class that persists for a short while. diagrams[1] then becomes:

array([[0.02634635, 0.03808299]])

Hope that helps.
Best,
Chris