ValueError: Buffer dtype mismatch, expected 'double_t' but got 'float' - validity_index
Faisal-AlDhuwayhi opened this issue · 4 comments
I'm using the validity index in the package, which implements the DBCV score according to the following paper:
https://www.dbs.ifi.lmu.de/~zimek/publications/SDM2014/DBCV.pdf
I'm working on a face clustering project, and calling the validity index raises an error. Here is the code:
import hdbscan

dbcv_score_output = hdbscan.validity.validity_index(feature_vectors, archive_labels)
dbcv_score_output
The full error:
hdbscan/validity.py:30: RuntimeWarning: overflow encountered in power
distance_matrix[distance_matrix != 0] = (1.0 / distance_matrix[
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/anaconda3/lib/python3.9/site-packages/hdbscan/validity.py:371, in validity_index(X, labels, metric, d, per_cluster_scores, mst_raw_dist, verbose, **kwd_args)
356 continue
358 distances_for_mst, core_distances[
359 cluster_id] = distances_between_points(
360 X,
(...)
367 **kwd_args
368 )
370 mst_nodes[cluster_id], mst_edges[cluster_id] = \
--> 371 internal_minimum_spanning_tree(distances_for_mst)
372 density_sparseness[cluster_id] = mst_edges[cluster_id].T[2].max()
374 for i in range(max_cluster_id):
File ~/anaconda3/lib/python3.9/site-packages/hdbscan/validity.py:165, in internal_minimum_spanning_tree(mr_distances)
136 def internal_minimum_spanning_tree(mr_distances):
137 """
138 Compute the 'internal' minimum spanning tree given a matrix of mutual
139 reachability distances. Given a minimum spanning tree the 'internal'
(...)
...
167 for index, row in enumerate(min_span_tree[1:], 1):
File hdbscan/_hdbscan_linkage.pyx:15, in hdbscan._hdbscan_linkage.mst_linkage_core()
ValueError: Buffer dtype mismatch, expected 'double_t' but got 'float'
A quick look at the inputs and their types (a short inspection snippet follows):
- The features: dtype=float32, shape (70201, 320)
- The archives/clusters (label encoded): shape (70201,)
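For completeness, this is roughly how I'm inspecting them (illustrative only; feature_vectors and archive_labels are the same arrays passed to validity_index above):

import numpy as np

print(feature_vectors.dtype, feature_vectors.shape)  # float32, (70201, 320)
print(archive_labels.shape)                          # (70201,)
print(np.unique(archive_labels).size)                # number of label-encoded clusters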
When I tried changing the features' dtype to double/float64, it showed a different error:
hdbscan/validity.py:33: RuntimeWarning: invalid value encountered in true_divide
result /= distance_matrix.shape[0] - 1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/anaconda3/lib/python3.9/site-packages/hdbscan/validity.py:372, in validity_index(X, labels, metric, d, per_cluster_scores, mst_raw_dist, verbose, **kwd_args)
358 distances_for_mst, core_distances[
359 cluster_id] = distances_between_points(
360 X,
(...)
367 **kwd_args
368 )
370 mst_nodes[cluster_id], mst_edges[cluster_id] = \
371 internal_minimum_spanning_tree(distances_for_mst)
--> 372 density_sparseness[cluster_id] = mst_edges[cluster_id].T[2].max()
374 for i in range(max_cluster_id):
376 if np.sum(labels == i) == 0:
File ~/anaconda3/lib/python3.9/site-packages/numpy/core/_methods.py:40, in _amax(a, axis, out, keepdims, initial, where)
38 def _amax(a, axis=None, out=None, keepdims=False,
39 initial=_NoValue, where=True):
---> 40 return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity
I went through all the related issues and fixes in the repo, but to no avail. Are there any recommendations or fixes?
Thanks in advance!
I had this same issue and managed to fix it by casting my X array to float64. Reading your error message, it seems your input array is float32, so you may have the same problem I did. I stumbled on this possible fix whilst reading issue #71. Try the following:
import numpy as np
from hdbscan import validity_index
feature_vectors = feature_vectors.astype(np.float64)
dbcv_score_output = validity_index(X=feature_vectors, labels=archive_labels)
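One practical note with an array that size: astype returns a copy, so float64 roughly doubles the memory footprint (about 180 MB versus ~90 MB for a 70201 × 320 array); rebinding feature_vectors as above lets the float32 version be freed, assuming nothing else still references it.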
Thanks for the help @mhaythornthwaite, but if I convert feature_vectors to double/float64, it shows the error specified above in the question:
ValueError: zero-size array to reduction operation maximum which has no identity
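In case it helps narrow things down, here is a rough sanity check (plain NumPy only, nothing from hdbscan; feature_vectors and archive_labels are the same arrays as above). The second traceback fails on mst_edges[cluster_id].T[2].max() over an empty array, and very small clusters are one way the internal MST can end up with no edges, so that seems worth ruling out, along with non-finite values hinted at by the overflow/divide warnings:

import numpy as np

X = feature_vectors.astype(np.float64)
labels = np.asarray(archive_labels)

# Very small clusters are one way the internal MST can end up with no edges
# (the exact minimum size needed is a guess here); list anything tiny.
ids, counts = np.unique(labels, return_counts=True)
print("labels with < 4 points:", ids[counts < 4], counts[counts < 4])

# Non-finite values in the features themselves would also propagate into
# the distance calculations that the warnings point at.
print("non-finite feature values:", np.count_nonzero(~np.isfinite(X)))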
Is it supposed to only work on 64-bit floats? Can't it work with 16-bit fp?