How to use StreamKM++ for text clustering
miladfa7 opened this issue · 0 comments
miladfa7 commented
I ran StreamKM++ using clusopt with the following code:
from clusopt_core.cluster import Streamkm
from sklearn.datasets import make_blobs
import numpy as np
import matplotlib.pyplot as plt

k = 32
# Synthetic dataset: 64,000 points drawn around k centers
dataset, _ = make_blobs(n_samples=64000, centers=k,
                        random_state=100, cluster_std=0.1)
# Split the data into batches of 4,000 points to simulate a stream
batches = np.split(dataset, len(dataset) // 4000)

model = Streamkm(coresetsize=k * 10, length=64000, seed=1)
for batch in batches:
    model.partial_fit(batch)

# Compute the k final cluster centers
clusters, _ = model.get_final_clusters(k, seed=42)

plt.scatter(*dataset.T, marker=",", label="datapoints")
plt.scatter(*model.get_streaming_coreset_centers().T, marker=".", label="microclusters")
plt.scatter(*clusters.T, marker="x", label="macro clusters", color="black")
plt.legend()
plt.show()
How can I use StreamKM++ for text clustering? Please help.
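
One possible starting point, offered as an untested sketch rather than a confirmed clusopt recipe: since `partial_fit` in the snippet above receives numeric arrays, each document would first have to be turned into a fixed-length numeric vector. The sketch below uses a simple hashing trick for that step. The helpers `hash_vectorize`, `batches`, and `cluster_text_stream` are hypothetical names introduced here for illustration, and it is an assumption that `Streamkm` accepts float64 arrays with more than two columns and that `length` should be the total number of documents.

```python
import hashlib
import math
import re


def hash_vectorize(text, n_features=64):
    """Map a document to a fixed-length, L2-normalised bag-of-words vector."""
    vec = [0.0] * n_features
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 is stable across runs, unlike Python's built-in hash()
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % n_features
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def batches(docs, batch_size, n_features=64):
    """Yield one list of vectors per batch of documents."""
    for start in range(0, len(docs), batch_size):
        chunk = docs[start:start + batch_size]
        yield [hash_vectorize(doc, n_features) for doc in chunk]


def cluster_text_stream(docs, k, batch_size=4000, n_features=64):
    """Feed vectorised text batches to Streamkm (assumed usage, see note above)."""
    # Imported here so the helpers above work without these packages installed.
    import numpy as np
    from clusopt_core.cluster import Streamkm

    # Assumption: `length` is the total number of points in the stream.
    model = Streamkm(coresetsize=k * 10, length=len(docs), seed=1)
    for batch in batches(docs, batch_size, n_features):
        model.partial_fit(np.asarray(batch, dtype=np.float64))
    centers, _ = model.get_final_clusters(k, seed=42)
    return centers
```

Calling `cluster_text_stream(list_of_strings, k=32)` would return the final cluster centers in the hashed feature space. In practice a TF-IDF or embedding representation may separate topics better than raw hashed counts; the hashing step is used here only because it needs no fitting pass over the stream.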