ValueError: Buffer dtype mismatch, expected 'double_t' but got 'float' - validity_index
Faisal-AlDhuwayhi opened this issue · 4 comments
I'm using the validity index in the package, which implements the DBCV score according to the following paper:
https://www.dbs.ifi.lmu.de/~zimek/publications/SDM2014/DBCV.pdf
I'm working on a face clustering project, and calling the validity index raises an error. Here is the code:
import hdbscan

dbcv_score_output = hdbscan.validity.validity_index(feature_vectors, archive_labels)
dbcv_score_output
The full error:
hdbscan/validity.py:30: RuntimeWarning: overflow encountered in power
distance_matrix[distance_matrix != 0] = (1.0 / distance_matrix[
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/anaconda3/lib/python3.9/site-packages/hdbscan/validity.py:371, in validity_index(X, labels, metric, d, per_cluster_scores, mst_raw_dist, verbose, **kwd_args)
356 continue
358 distances_for_mst, core_distances[
359 cluster_id] = distances_between_points(
360 X,
(...)
367 **kwd_args
368 )
370 mst_nodes[cluster_id], mst_edges[cluster_id] = \
--> 371 internal_minimum_spanning_tree(distances_for_mst)
372 density_sparseness[cluster_id] = mst_edges[cluster_id].T[2].max()
374 for i in range(max_cluster_id):
File ~/anaconda3/lib/python3.9/site-packages/hdbscan/validity.py:165, in internal_minimum_spanning_tree(mr_distances)
136 def internal_minimum_spanning_tree(mr_distances):
137 """
138 Compute the 'internal' minimum spanning tree given a matrix of mutual
139 reachability distances. Given a minimum spanning tree the 'internal'
(...)
...
167 for index, row in enumerate(min_span_tree[1:], 1):
File hdbscan/_hdbscan_linkage.pyx:15, in hdbscan._hdbscan_linkage.mst_linkage_core()
ValueError: Buffer dtype mismatch, expected 'double_t' but got 'float'
A quick look at the inputs and their types (a short inspection snippet follows):
- The features: dtype=float32, shape (70201, 320)
- The archives/clusters (label encoded): shape (70201,)
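For completeness, this is roughly how I'm inspecting them (illustrative only; feature_vectors and archive_labels are the same arrays passed to validity_index above):

import numpy as np

print(feature_vectors.dtype, feature_vectors.shape)  # float32, (70201, 320)
print(archive_labels.shape)                          # (70201,)
print(np.unique(archive_labels).size)                # number of label-encoded clusters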
When I tried changing the features' dtype to double/float64, it showed a different error:
hdbscan/validity.py:33: RuntimeWarning: invalid value encountered in true_divide
result /= distance_matrix.shape[0] - 1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/anaconda3/lib/python3.9/site-packages/hdbscan/validity.py:372, in validity_index(X, labels, metric, d, per_cluster_scores, mst_raw_dist, verbose, **kwd_args)
358 distances_for_mst, core_distances[
359 cluster_id] = distances_between_points(
360 X,
(...)
367 **kwd_args
368 )
370 mst_nodes[cluster_id], mst_edges[cluster_id] = \
371 internal_minimum_spanning_tree(distances_for_mst)
--> 372 density_sparseness[cluster_id] = mst_edges[cluster_id].T[2].max()
374 for i in range(max_cluster_id):
376 if np.sum(labels == i) == 0:
File ~/anaconda3/lib/python3.9/site-packages/numpy/core/_methods.py:40, in _amax(a, axis, out, keepdims, initial, where)
38 def _amax(a, axis=None, out=None, keepdims=False,
39 initial=_NoValue, where=True):
---> 40 return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity
I went through all the related issues and fixes in the repo, but to no avail. Are there any recommendations or fixes?
Thanks in advance!
I had this same issue and managed to fix it by casting my X array to float64. Reading your error message, it seems your input array is float32, so you may have the same problem I did. I stumbled on this possible fix whilst reading issue #71. Try the following:
import numpy as np
from hdbscan import validity_index
feature_vectors = feature_vectors.astype(np.float64)
dbcv_score_output = validity_index(X=feature_vectors, labels=archive_labels)
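One practical note with an array that size: astype returns a copy, so float64 roughly doubles the memory footprint (about 180 MB versus ~90 MB for a 70201 × 320 array); rebinding feature_vectors as above lets the float32 version be freed, assuming nothing else still references it.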
Thanks for the help @mhaythornthwaite, but if I convert feature_vectors to double/float64, it shows the error specified above in the question:
ValueError: zero-size array to reduction operation maximum which has no identity
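In case it helps narrow things down, here is a rough sanity check (plain NumPy only, nothing from hdbscan; feature_vectors and archive_labels are the same arrays as above). The second traceback fails on mst_edges[cluster_id].T[2].max() over an empty array, and very small clusters are one way the internal MST can end up with no edges, so that seems worth ruling out, along with non-finite values hinted at by the overflow/divide warnings:

import numpy as np

X = feature_vectors.astype(np.float64)
labels = np.asarray(archive_labels)

# Very small clusters are one way the internal MST can end up with no edges
# (the exact minimum size needed is a guess here); list anything tiny.
ids, counts = np.unique(labels, return_counts=True)
print("labels with < 4 points:", ids[counts < 4], counts[counts < 4])

# Non-finite values in the features themselves would also propagate into
# the distance calculations that the warnings point at.
print("non-finite feature values:", np.count_nonzero(~np.isfinite(X)))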
Is it supposed to only work on 64-bit floats? Can't it work with 16-bit fp?