giuliano-macedo/clusopt

How to use StreamKM++ for text clustering

miladfa7 opened this issue

I ran StreamKM++ using clusopt with the following code:

```python
from clusopt_core.cluster import Streamkm
from sklearn.datasets import make_blobs
import numpy as np
import matplotlib.pyplot as plt

k = 32  # number of final clusters
dataset, _ = make_blobs(n_samples=64000, centers=k,
                        random_state=100, cluster_std=0.1)

# Split the 64000 points into 16 equal batches of 4000 to simulate a stream
batches = np.split(dataset, len(dataset) // 4000)

# coresetsize: size of the coreset; length: total number of points in the stream
model = Streamkm(coresetsize=k * 10, length=64000, seed=1)

for batch in batches:
    model.partial_fit(batch)

# Compute k final cluster centers from the coreset
clusters, _ = model.get_final_clusters(k, seed=42)

plt.scatter(*dataset.T, marker=",", label="datapoints")
plt.scatter(*model.get_streaming_coreset_centers().T, marker=".", label="microclusters")
plt.scatter(*clusters.T, marker="x", label="macro clusters", color="black")

plt.legend()
plt.show()
```
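The loop above feeds the model one fixed-size batch at a time through `partial_fit`. When the data really arrives as a stream (for example, documents read line by line from a file) it cannot first be collected into one array for `np.split`, so the batches have to be built lazily. A minimal stdlib sketch of that idea follows; `iter_batches` is a hypothetical helper, not part of clusopt.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items from any iterable.

    Hypothetical helper: unlike np.split, the input never has to be held
    in memory all at once, and the last batch may be shorter.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Same arithmetic as the issue: 64000 points in batches of 4000 -> 16 batches.
sizes = [len(b) for b in iter_batches(range(64000), 4000)]
print(len(sizes), set(sizes))  # 16 {4000}
```

Each yielded list would still need to be turned into a 2-D float array (for example with `np.asarray(batch, dtype=float)`) before being passed to `model.partial_fit`, as the `make_blobs` batches are in the code above.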

How can I use StreamKM++ for text clustering? Please help.
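In the code above, `partial_fit` receives rows of numbers, so text presumably has to be turned into fixed-length numeric vectors before StreamKM++ can cluster it. One common option for streams is the hashing trick, because it does not need a vocabulary fitted in advance and every document gets the same dimension. The sketch below is a stdlib-only illustration under that assumption; `hash_vectorize` and `n_features` are hypothetical names, and scikit-learn's `HashingVectorizer` is a ready-made alternative for the same technique.

```python
import hashlib
import math
import re
from typing import List

TOKEN_RE = re.compile(r"[a-z0-9]+")


def hash_vectorize(text: str, n_features: int = 64) -> List[float]:
    """Map a document to an L2-normalised vector of length n_features.

    Hypothetical helper illustrating the hashing trick: each token is
    hashed to a column index and a +1/-1 sign, so no vocabulary is
    needed and every document has the same dimension.
    """
    vec = [0.0] * n_features
    for token in TOKEN_RE.findall(text.lower()):
        # md5 is stable across runs, unlike Python's salted built-in hash()
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "little")
        index = h % n_features                  # column for this token
        sign = -1.0 if (h >> 63) & 1 else 1.0   # top bit picks the sign
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # Unit length makes Euclidean k-means distances track cosine similarity
    return [v / norm for v in vec] if norm > 0 else vec


docs = ["the cat sat on the mat", "a dog chased the cat", ""]
vectors = [hash_vectorize(d, n_features=64) for d in docs]
print(len(vectors), len(vectors[0]))  # 3 64
```

The rows could then be stacked with `np.asarray(vectors)` and passed batch by batch to `model.partial_fit` in the same way as the blob points, with `length` presumably set to the number of documents in the stream. `n_features` and `k` would need tuning for a real corpus, and the final centers live in the hashed space, so they are not directly readable as words.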