tslearn-team/tslearn

Cluster Centers are not updating after assigning init

Barathwaja opened this issue · 1 comment

Describe the bug
Hi, I'm trying to set cluster_centers_ through the init argument, but after calling fit the centers are recomputed for that dataset. How can I tell whether the model really used my init array as its starting point?

To Reproduce
Code

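import numpy as np
from tslearn.clustering import TimeSeriesKMeans

# Initial center: one cluster, series length 4, one feature -> shape (1, 4, 1)
init_data = np.array([[[1040.9555],
                       [1037.463],
                       [1034.8087],
                       [1031.3035]]])

# X (the dataset being clustered) is not shown in this report
model = TimeSeriesKMeans(n_clusters=1,
                         verbose=False,
                         metric='euclidean',
                         random_state=2,
                         init=init_data)

print(model.__dict__)

print("After FIT")
model.fit(X)

print(model.__dict__)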

Results

{'n_clusters': 1, 'max_iter': 50, 'tol': 1e-06, 'n_init': 1, 'metric': 'euclidean', 'max_iter_barycenter': 100, 'metric_params': None, 'n_jobs': None, 'dtw_inertia': False, 'verbose': False, 'random_state': 2, 'init': array([[[1040.9555],
        [1037.463 ],
        [1034.8087],
        [1031.3035]]])}
After FIT
/Users/beast/opt/anaconda3/lib/python3.9/site-packages/tslearn/utils/utils.py:90: UserWarning: 2-Dimensional data passed. Assuming these are 8 1-dimensional timeseries
  warnings.warn(
{'n_clusters': 1, 'max_iter': 50, 'tol': 1e-06, 'n_init': 1, 'metric': 'euclidean', 'max_iter_barycenter': 100, 'metric_params': None, 'n_jobs': None, 'dtw_inertia': False, 'verbose': False, 'random_state': 2, 'init': array([[[1040.9555],
        [1037.463 ],
        [1034.8087],
        [1031.3035]]]), 'labels_': array([0, 0, 0, 0, 0, 0, 0, 0]), 'inertia_': 37706.06962299204, 'cluster_centers_': array([[[1033.4625   ],
        [1007.5545   ],
        [ 966.0016875],
        [ 926.6316875]]]),

Hello @Barathwaja,
When you initialize TimeSeriesKMeans with an ndarray as the init parameter, the array is stored and is accessible through the init attribute (in your case, model.init).
Calling fit on a dataset leaves the init parameter unchanged.
The k-means algorithm uses the init ndarray as its starting cluster centers.
Once the algorithm has run, the final positions of the cluster centers are stored in the cluster_centers_ attribute.
In your case, you can access them via model.cluster_centers_.
If you predict the label of a new point, the cluster_centers_ attribute is used.
If you fit your model on a new dataset, the init attribute is used again as the starting point.
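
To make the "init only seeds the algorithm" point concrete, here is a minimal NumPy sketch of a standard Euclidean k-means loop. It is not tslearn's actual code: the function name kmeans_euclidean and the random dataset X are made up for illustration, and it assumes tslearn follows the usual assign/update iteration. With n_clusters=1 every series is assigned to the single cluster, so the first update already moves the center to the mean of the dataset. In this sketch the final center therefore does not depend on init at all.

```python
import numpy as np


def kmeans_euclidean(X, init, max_iter=50, tol=1e-6):
    """Plain Lloyd-style k-means on flattened series (illustrative sketch only)."""
    flat = X.reshape(X.shape[0], -1)                        # (n_ts, sz * d)
    # astype() returns a copy, so the caller's init array is never modified.
    centers = init.reshape(init.shape[0], -1).astype(float)
    prev_inertia = np.inf
    for _ in range(max_iter):
        # Assignment step: squared distance from every series to every center.
        dists = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        inertia = dists.min(axis=1).sum()
        # Update step: each center becomes the mean of its assigned series.
        for k in range(centers.shape[0]):
            members = flat[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
        # Stop once the inertia no longer changes by more than tol.
        if abs(prev_inertia - inertia) < tol:
            break
        prev_inertia = inertia
    return centers.reshape(init.shape), labels


# Made-up stand-in for the dataset: 8 series of length 4 with 1 feature.
rng = np.random.default_rng(0)
X = rng.normal(1000.0, 50.0, size=(8, 4, 1))

init_data = np.array([[[1040.9555], [1037.463], [1034.8087], [1031.3035]]])
other_init = np.zeros((1, 4, 1))  # a deliberately different starting point

centers_a, labels_a = kmeans_euclidean(X, init_data)
centers_b, labels_b = kmeans_euclidean(X, other_init)

print("init unchanged:", init_data.ravel())
print("final center  :", centers_a.ravel())
print("dataset mean  :", X.mean(axis=0).ravel())
```

The effect of init only becomes visible with two or more clusters, where different starting centers can lead to different assignments and therefore different final centers.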

I am not sure I understand what you are trying to do.
If you want to update your init parameter with the final cluster center positions, you can use:
model.init = model.cluster_centers_
If you want to set the cluster centers yourself, you can use:
model.cluster_centers_ = cluster_centers
where cluster_centers is an ndarray of shape (n_clusters, sz, d).

I hope this helps!