Failing to generate H1 result with small dataset.
jamesdhope opened this issue · 1 comment
jamesdhope commented
Failing to generate H1 result with small dataset.
import pandas as pd
from ripser import ripser

# 11 points in 2D (PCA-transformed embeddings)
emb_df_pca_transformed_df = pd.DataFrame([[-0.00757914, -0.0253189],
                                          [-0.00844645, -0.00540171],
                                          [ 0.00108799,  0.0065374 ],
                                          [-0.00387104, -0.01243067],
                                          [-0.00870402, -0.00650156],
                                          [ 0.01830302,  0.02870784],
                                          [-0.02218258,  0.01270735],
                                          [ 0.035802,   -0.00130045],
                                          [ 0.00926424, -0.02861457],
                                          [-0.03469143,  0.01697614],
                                          [ 0.02101742,  0.01463914]])
# 'dgms' is a list of persistence diagrams: index 0 is H0, index 1 is H1
diagrams = ripser(emb_df_pca_transformed_df)['dgms']
print(diagrams)
Result:
[array([[0. , 0.00112961],
[0. , 0.00764932],
[0. , 0.01321718],
[0. , 0.01341106],
[0. , 0.01432816],
[0. , 0.015279 ],
[0. , 0.01716278],
[0. , 0.02151326],
[0. , 0.02174062],
[0. , 0.02272926],
[0. , inf]]), array([], shape=(0, 2), dtype=float64)]
Note that diagrams[1] is empty.
I have attempted to set threshold values; this does not affect the H1 output.
How can I compute H1 for a small tightly clustered dataset please?
ctralie commented
Hello,
Thank you for your interest in ripser.py! This result is actually correct. Looking at a plot of the point cloud,
there is no H1 in it. In other words, no loops persist in the dataset. If, in this example, I add the point [0.03, -0.027] to your dataset,
then you get a loop that persists for a short while, because the new point actually completes a loop. The H1 diagram becomes:
array([[0.02634635, 0.03808299]])]
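For anyone who wants to check this without installing anything, here is a rough sketch that recomputes H1 by brute force in plain Python. To be clear, this is not how ripser works internally: it is the textbook boundary-matrix reduction over Z/2, applied to every vertex, edge, and triangle of the Rips complex, so it is only practical for a handful of points. The names `rips_h1`, `cloud`, and `extra_point` are made up for this illustration.

```python
import math
from itertools import combinations

def rips_h1(points):
    """Finite H1 (birth, death) pairs of a Vietoris-Rips filtration, by brute force."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # Every simplex up to dimension 2, tagged with the scale at which it appears.
    simplices = [(0.0, (i,)) for i in range(n)]
    simplices += [(dist(i, j), (i, j)) for i, j in combinations(range(n), 2)]
    simplices += [(max(dist(a, b) for a, b in combinations(t, 2)), t)
                  for t in combinations(range(n), 3)]
    # Sort by scale; on ties, lower-dimensional simplices (faces) come first.
    simplices.sort(key=lambda s: (s[0], len(s[1])))
    index = {verts: k for k, (_, verts) in enumerate(simplices)}

    pivots = {}  # lowest boundary index -> reduced column that owns it
    pairs = []
    for value, verts in simplices:
        if len(verts) == 1:
            continue  # vertices have an empty boundary
        # Boundary of the simplex as a set of face indices (Z/2 coefficients).
        column = {index[face] for face in combinations(verts, len(verts) - 1)}
        while column and max(column) in pivots:
            column ^= pivots[max(column)]  # add the earlier column mod 2
        if column:
            low = max(column)
            pivots[low] = column
            birth = simplices[low][0]
            # A triangle paired with an edge kills a loop; skip zero-length bars.
            if len(verts) == 3 and value > birth:
                pairs.append((birth, value))
    return sorted(pairs)

# The 11 points from the original question.
cloud = [(-0.00757914, -0.0253189), (-0.00844645, -0.00540171),
         (0.00108799, 0.0065374), (-0.00387104, -0.01243067),
         (-0.00870402, -0.00650156), (0.01830302, 0.02870784),
         (-0.02218258, 0.01270735), (0.035802, -0.00130045),
         (0.00926424, -0.02861457), (-0.03469143, 0.01697614),
         (0.02101742, 0.01463914)]
extra_point = (0.03, -0.027)  # the point suggested in the reply

print("H1 without the extra point:", rips_h1(cloud))
print("H1 with the extra point:   ", rips_h1(cloud + [extra_point]))
```

The first list should come out empty and the second should contain a single pair close to the one shown above (small differences in the last digits are expected, since this sketch works in double precision). With ripser itself, the equivalent check is to append the extra row to the input and look at index 1 of the returned 'dgms' list.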
Hope that helps.
Best,
Chris