Allow deterministic k-means/k-medoids
Allow setting the random_state parameter of scikit-learn's clustering methods to make TSA reproducible and deterministic (e.g., random_state=0).
@maximilian-hoffmann Isn't setting the seed already implemented?
if clusterMethod == "k_means":
    from sklearn.cluster import KMeans
    k_means = KMeans(n_clusters=n_clusters, max_iter=1000, n_init=n_iter, tol=1e-4)
(l. 63, tsam/tsam/periodAggregation.py)
I can't seem to find it in the source code. I also remember that the paper highlighted deterministic behavior as something unique to hierarchical clustering, though k-means et al. can be deterministic as well with a fixed seed.
Of course, you can set the seed, and the k-means algorithm will become reproducible. Still, by design it depends on a randomized placement of starting points, which is not the case for hierarchical aggregation.
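To illustrate the distinction, here is a minimal sketch on synthetic data (not tsam code). The array X, the helper names, and the cluster count are assumptions made for the example. It checks that scikit-learn's KMeans returns identical labels when random_state is fixed, and that AgglomerativeClustering returns identical labels without any seed at all.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Synthetic stand-in for the candidate periods (assumption, not tsam data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))


def kmeans_labels(seed):
    """Run k-means with a fixed seed for the random centroid initialization."""
    model = KMeans(n_clusters=4, n_init=10, max_iter=1000, tol=1e-4,
                   random_state=seed)
    return model.fit_predict(X)


def hierarchical_labels():
    """Run Ward hierarchical clustering, which has no random initialization."""
    return AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X)


# Same seed, same data: the k-means labels should match exactly.
same_seed_equal = np.array_equal(kmeans_labels(0), kmeans_labels(0))

# No seed involved: repeated runs should match as well.
hierarchical_equal = np.array_equal(hierarchical_labels(), hierarchical_labels())

print("k-means reproducible with fixed seed:", same_seed_equal)
print("hierarchical reproducible without seed:", hierarchical_equal)
```

With random_state left at None, two k-means runs may or may not agree, which is the dependence on starting points described above.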
I would be quite happy if someone implemented the seed as an argument and opened a pull request. :)
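A possible shape for such a change, as a rough sketch only: the function name aggregate_k_means and its signature are hypothetical and do not reflect tsam's actual API. It keeps the KMeans settings from the snippet quoted earlier and simply forwards an optional random_state.

```python
import numpy as np
from sklearn.cluster import KMeans


def aggregate_k_means(candidates, n_clusters=8, n_iter=100, random_state=None):
    """Hypothetical helper: k-means aggregation with an optional seed.

    random_state=None keeps the current non-deterministic behavior;
    an integer makes repeated runs on the same data reproducible.
    """
    k_means = KMeans(
        n_clusters=n_clusters,
        max_iter=1000,
        n_init=n_iter,
        tol=1e-4,
        random_state=random_state,  # the only addition to the quoted call
    )
    cluster_order = k_means.fit_predict(candidates)
    return k_means.cluster_centers_, cluster_order


# Usage on synthetic data (assumption): two seeded runs give the same result.
data = np.random.default_rng(1).normal(size=(120, 4))
centers_a, order_a = aggregate_k_means(data, n_clusters=3, n_iter=5, random_state=0)
centers_b, order_b = aggregate_k_means(data, n_clusters=3, n_iter=5, random_state=0)
print(np.array_equal(order_a, order_b))
```

Defaulting the argument to None would leave existing calls unchanged, so the seed stays opt-in.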
Since it does not seem crucial, I will close this issue. If someone wants to save the seeds and reuse them, feel free to reopen it.