sissa-data-science/DADApy

compute_clustering_pure_python crashed

Closed this issue · 3 comments

Command:
data.compute_clustering_pure_python(Z=1.65)

Output:

Clustering started
Number of clusters before multimodality test= 45
1.97 seconds clustering before multimodality test
0.07 seconds identifying the borders
0.15 seconds with multimodality test


IndexError Traceback (most recent call last)
in
----> 1 data.compute_clustering_pure_python(Z=1.65)

~/Code/DULY/duly/clustering.py in compute_clustering_pure_python(self, Z, halo)

    346                         kk = nnum[k]
    347                         log_den_bord_m[jj][kk] = log_den_bord[j][k]
--> 348                         log_den_bord_err[jj][kk] = log_den_bord_err[j][k]
    349         Last_cls = np.empty(N, dtype=int)
    350         for j in range(N_clusters):

IndexError: index 25 is out of bounds for axis 0 with size 2

I found the bug. Although I have not pushed it yet, since I still have to pull the latest modifications and DADpy and all that. The bug was at line 341:

should have been
log_den_bord_err_m = np.zeros((N_clusters, N_clusters), dtype=float)

instead of
log_den_bord_err = np.zeros((N_clusters, N_clusters), dtype=float)

NOW compute_clustering_pure_python() gives the same results as Fortran, so for me it is the stable version! While compute_clustering() still crashes on some databases. I suspect there is a +1 error somewhere in the cython code coming from an error in the translation the different convention between Python counting in array indices (from 0) and Fortran (from 1). But I don't know where, since cython debugging takes long

Issue addressed in last commits by @alexdepremia before release. Consider closing it

Thanks @imacocco I think this was addressed in the last months, I will close this