katoss/cardsort

create_dendrogram raises 'ValueError: Distance matrix 'X' diagonal must be zero.'

ladyofthelog opened this issue · 1 comment

I'm analyzing data exported from kardsort on Python 3.9. The notebook and data file are in this repo.

Calling create_dendrogram raises this error:

<ipython-input-8-d0c5110595a7> in <module>
----> 1 cardsort.create_dendrogram(df)

/usr/local/lib/python3.9/site-packages/cardsort/analysis.py in create_dendrogram(df, distance_matrix, count, linkage, color_threshold)
    187     else:
    188         if distance_matrix is None:
--> 189             distance_matrix = get_distance_matrix(df)
    190 
    191         count_types = ["absolute", "fraction"]

/usr/local/lib/python3.9/site-packages/cardsort/analysis.py in get_distance_matrix(df)
    127             else:
    128                 distance_matrix_all = np.add(distance_matrix_all, distance_matrix_user)
--> 129         condensed_distance_matrix = squareform(distance_matrix_all)
    130         return condensed_distance_matrix
    131 

/usr/local/lib/python3.8/site-packages/scipy/spatial/distance.py in squareform(X, force, checks)
   2182             raise ValueError('The matrix argument must be square.')
   2183         if checks:
-> 2184             is_valid_dm(X, throw=True, name='X')
   2185 
   2186         # One-side of the dimensions is set here.

/usr/local/lib/python3.8/site-packages/scipy/spatial/distance.py in is_valid_dm(D, tol, throw, name, warning)
   2263             if not (D[range(0, s[0]), range(0, s[0])] == 0).all():
   2264                 if name:
-> 2265                     raise ValueError(('Distance matrix \'%s\' diagonal must '
   2266                                       'be zero.') % name)
   2267                 else:

ValueError: Distance matrix 'X' diagonal must be zero.
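
For readers hitting the same message: the last two frames are SciPy's validity check inside `squareform`, which rejects any square matrix whose diagonal is not all zeros. The sketch below only reproduces that check on a hand-made 3×3 matrix; the matrices are made up for illustration, and it does not show how cardsort's own matrix ended up with a non-zero diagonal.

```python
import numpy as np
from scipy.spatial.distance import squareform

# A valid distance matrix: symmetric, with zeros on the diagonal.
valid = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
])
condensed = squareform(valid)  # upper triangle as a flat vector

# Same matrix, but one diagonal entry is no longer zero.
broken = valid.copy()
broken[1, 1] = 1.0

try:
    squareform(broken)
    message = None
except ValueError as err:
    # squareform runs is_valid_dm(X, throw=True, name='X') by default.
    message = str(err)

print(condensed)
print(message)
```

So whenever this error appears, it is worth inspecting the input data that the distance matrix is built from, rather than the call to `squareform` itself.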

katoss commented

Hi @ladyofthelog,

glad you are using cardsort :) I tried to reproduce the error, and the problem seems to be in your data. All columns need to be the same length, but your column "category_labels" is missing a value. Replacing the value "N/A" in row 775 with a string value solved the problem for me. Let me know if that works for you!
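
A quick way to spot rows like this before calling `create_dendrogram` is to look for missing values in the DataFrame. The sketch below is only an illustration: the tiny CSV and the `user_id` and `card_label` columns are made up, and only the `category_labels` name comes from this thread. It assumes the file is loaded with `pandas.read_csv`, which by default parses the literal text "N/A" as a missing value (NaN).

```python
import io

import pandas as pd

# Hypothetical miniature export; only "category_labels" is named in this thread.
csv_text = """user_id,card_label,category_labels
1,Apple,Fruit
1,Carrot,N/A
2,Apple,Food
"""

# Default parsing: the string "N/A" becomes NaN.
df = pd.read_csv(io.StringIO(csv_text))

# Rows where the category label is missing.
missing = df[df["category_labels"].isna()]
print(missing)

# One option: turn off the default NaN markers so "N/A" stays a plain string.
df_text = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(df_text["category_labels"].tolist())
```

Whether to keep "N/A" as text or replace it with a real category name depends on what the participant meant; either way the column then has a string in every row.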