Cluster Centers are not updating after assigning init
Barathwaja opened this issue · 1 comments
Describe the bug
Hi, I'm trying to set cluster_centers_ through the init argument, but after calling fit the centers are recomputed for that dataset. How can I tell whether the model really starts from the init array I passed?
To Reproduce
Code
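import numpy as np
from tslearn.clustering import TimeSeriesKMeans

# Initial center for the single cluster, shape (n_clusters, sz, d) = (1, 4, 1)
init_data = np.array([[[1040.9555],
                       [1037.463],
                       [1034.8087],
                       [1031.3035]]])
model = TimeSeriesKMeans(n_clusters=1,
                         verbose=False,
                         metric='euclidean',
                         random_state=2, init=init_data)
print(model.__dict__)
print("After FIT")
model.fit(X)  # X is the dataset being clustered (not shown in this report)
print(model.__dict__)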
Results
{'n_clusters': 1, 'max_iter': 50, 'tol': 1e-06, 'n_init': 1, 'metric': 'euclidean', 'max_iter_barycenter': 100, 'metric_params': None, 'n_jobs': None, 'dtw_inertia': False, 'verbose': False, 'random_state': 2, 'init': array([[[1040.9555],
[1037.463 ],
[1034.8087],
[1031.3035]]])}
After FIT
/Users/beast/opt/anaconda3/lib/python3.9/site-packages/tslearn/utils/utils.py:90: UserWarning: 2-Dimensional data passed. Assuming these are 8 1-dimensional timeseries
warnings.warn(
{'n_clusters': 1, 'max_iter': 50, 'tol': 1e-06, 'n_init': 1, 'metric': 'euclidean', 'max_iter_barycenter': 100, 'metric_params': None, 'n_jobs': None, 'dtw_inertia': False, 'verbose': False, 'random_state': 2, 'init': array([[[1040.9555],
[1037.463 ],
[1034.8087],
[1031.3035]]]), 'labels_': array([0, 0, 0, 0, 0, 0, 0, 0]), 'inertia_': 37706.06962299204, 'cluster_centers_': array([[[1033.4625 ],
[1007.5545 ],
[ 966.0016875],
[ 926.6316875]]]),
Hello @Barathwaja,
When you initialize the class TimeSeriesKMeans with an init input parameter equal to an ndarray, this parameter is stored and is accessible via the init attribute (in your case model.init).
When you use the fit method on a dataset, the init parameter is left unchanged.
The k-means algorithm is initialized using the init ndarray.
Then, after running the k-means algorithm, the final positions of the cluster centers are stored in the cluster_centers_ attribute.
In your case, you can access the cluster centers via model.cluster_centers_.
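To make the roles of init and cluster_centers_ concrete, here is a minimal NumPy sketch of Euclidean k-means (Lloyd's iteration). It is not tslearn's implementation; the function name lloyd_kmeans and the random data are made up for illustration. It shows that the init array is only a starting point and is never modified, and that with a single cluster the final center is simply the mean of the dataset, whatever init was. That is why an effect of init cannot be observed with n_clusters=1.

```python
import numpy as np


def lloyd_kmeans(X, init, max_iter=50, tol=1e-6):
    """Plain Euclidean k-means started from `init` (illustrative sketch).

    X has shape (n_series, sz), init has shape (n_clusters, sz).
    Returns (centers, labels).
    """
    centers = init.astype(float).copy()  # work on a copy, `init` stays untouched
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every series
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned series
        new_centers = centers.copy()
        for k in range(len(centers)):
            if np.any(labels == k):
                new_centers[k] = X[labels == k].mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, labels


# Made-up data: 8 series of length 4 (not the dataset from the report)
rng = np.random.default_rng(0)
X = rng.normal(loc=1000.0, scale=30.0, size=(8, 4))

init = np.array([[1040.9555, 1037.463, 1034.8087, 1031.3035]])
init_before = init.copy()

centers, labels = lloyd_kmeans(X, init)
other_centers, _ = lloyd_kmeans(X, np.zeros((1, 4)))  # a very different start

print(np.array_equal(init, init_before))        # init is left unchanged
print(np.allclose(centers[0], X.mean(axis=0)))  # one cluster: center is the mean
print(np.allclose(centers, other_centers))      # same result from either start
```

With two or more clusters, different init arrays can lead to different final centers, so that is the setting in which the influence of init becomes visible.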
If you want to predict the label of a new point, the attribute cluster_centers_ will be used.
If you want to fit your model on a new dataset, the attribute init will be used.
I am not sure I understand what you are trying to do.
If you want to update your init parameter using the final cluster center positions, you can use:
model.init = model.cluster_centers_
If you want to control the value of the cluster centers, you can use:
model.cluster_centers_ = cluster_centers
where cluster_centers is an ndarray of shape (n_clusters, sz, d).
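As a rough illustration of how fixed centers of shape (n_clusters, sz, d) determine labels under the Euclidean metric, here is a small NumPy sketch of nearest-center assignment. The helper assign_to_centers and the toy arrays are hypothetical and not part of tslearn; the sketch only mirrors the idea that prediction depends on cluster_centers_ and not on init.

```python
import numpy as np


def assign_to_centers(X, cluster_centers):
    """Return, for each series, the index of its nearest center (Euclidean).

    X has shape (n_ts, sz, d), cluster_centers has shape (n_clusters, sz, d).
    """
    flat_X = X.reshape(X.shape[0], -1)
    flat_centers = cluster_centers.reshape(cluster_centers.shape[0], -1)
    dists = np.linalg.norm(flat_X[:, None, :] - flat_centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Two hand-picked centers, shape (n_clusters, sz, d) = (2, 4, 1)
cluster_centers = np.array([
    [[0.0], [0.0], [0.0], [0.0]],
    [[10.0], [10.0], [10.0], [10.0]],
])

# Two new series, shape (2, 4, 1)
new_series = np.array([
    [[1.0], [0.5], [-0.5], [0.0]],
    [[9.0], [11.0], [10.0], [9.5]],
])

labels = assign_to_centers(new_series, cluster_centers)
print(labels)  # [0 1]
```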
I hope this helps!