lmcinnes/enstop

HDBSCAN error stopping EnsembleTopics


The example code from your homepage:

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from enstop import EnsembleTopics

news = fetch_20newsgroups(subset='all')
data = CountVectorizer().fit_transform(news.data)

model = EnsembleTopics(n_components=20).fit(data)
topics = model.components_
doc_vectors = model.embedding_

results in an error:
File hdbscan\_hdbscan_tree.pyx:659, in hdbscan._hdbscan_tree.get_clusters()

File hdbscan\_hdbscan_tree.pyx:733, in hdbscan._hdbscan_tree.get_clusters()

TypeError: 'numpy.float64' object cannot be interpreted as an integer
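
For reference, the same message can be reproduced in isolation whenever a NumPy float is passed where Python requires an integer. The sketch below only illustrates this class of error; it is not a diagnosis of what hdbscan does at the lines in the traceback, and the helper name `int_required_error_message` is made up for the illustration.

```python
import numpy as np

def int_required_error_message(value):
    """Return the TypeError message raised when `value` is used as an
    integer count, or None if it is accepted (hypothetical helper)."""
    try:
        range(value)  # range() requires a true integer argument
    except TypeError as exc:
        return str(exc)
    return None

# A NumPy float is rejected even when its value is a whole number.
msg = int_required_error_message(np.float64(3.0))
print(msg)

# An explicit cast to int is accepted.
print(int_required_error_message(int(np.float64(3.0))))
```

If something similar is happening here, a float value is presumably reaching an integer-only operation inside `get_clusters()`, but I have not confirmed that.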

I am using scikit-learn 1.3.0 and Python 3.11.4.
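
The hdbscan and numpy versions are likely relevant as well, since the error is raised inside hdbscan. A minimal sketch for collecting them with the standard library follows; it assumes the installed distribution names match the PyPI names listed, and `installed_versions` is a hypothetical helper, not part of enstop.

```python
from importlib import metadata
import platform

def installed_versions(packages):
    """Map each distribution name to its installed version string,
    or to "not installed" if it cannot be found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

# Distribution names assumed to match PyPI.
report = installed_versions(["enstop", "hdbscan", "numpy", "scikit-learn"])

print("Python", platform.python_version())
for name, version in report.items():
    print(name, version)
```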