Using TICC for online clustering

Question

Hi David,

Can we use TICC for online clustering of time series? For instance, we want to identify which state a car is in while driving, given a set of states learned with TICC in batch mode.

Thanks,

Answer 1 · 2018-04-23T22:50:41.000Z

That is definitely a potential application of TICC. That aligns nicely with #18, where the goal is to separate out "fit" and "predict", so that you can train a model on one dataset, then infer the resulting clusters on another dataset.

We have modified the existing TICC code to support "fit", but we have not yet added the "predict" capability (though we would definitely welcome any support, if you're interested in contributing!).

So, overall: TICC can definitely be used for online clustering, but the existing code base does not yet support that functionality.

Answer 2 · 2018-05-04T17:00:12.000Z

Hi! The new predict_clusters method now supports streaming settings. Hope that helps!

Answer 3 · 2018-05-07T09:13:47.000Z

Thanks David, I'll test it out. Does it need to be retrained occasionally?

Answer 4 · 2018-05-07T15:28:50.000Z

It doesn't need to be retrained, but ideally you would retrain it every once in a while if you want the most accurate estimate possible. This is because "predict_clusters" simply assigns clusters to the new points, and does not go back to update the cluster parameters, so you'd want to re-train it if you prefer to incorporate these new points into your model.
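
To make the "assign but don't update" behaviour concrete, here is a minimal sketch of what assigning new points against frozen cluster parameters could look like. It is not the repository's `predict_clusters` implementation; the function names, the plain Gaussian cost, and the `beta` switching penalty handled by dynamic programming are assumptions chosen to mirror the description in the paper.

```python
import numpy as np

def neg_log_likelihood(x, mean, theta):
    """Gaussian negative log-likelihood of x, up to an additive constant.
    theta is the cluster's inverse covariance (precision) matrix."""
    d = x - mean
    _, logdet = np.linalg.slogdet(theta)
    return 0.5 * d @ theta @ d - 0.5 * logdet

def assign_clusters(X, means, thetas, beta):
    """Assign each row of X to a cluster using FIXED parameters.
    Minimizes the summed likelihood cost plus a penalty of beta for
    every switch between consecutive rows (Viterbi-style DP)."""
    X = np.asarray(X, dtype=float)
    T, K = len(X), len(means)
    cost = np.array([[neg_log_likelihood(x, means[k], thetas[k])
                      for k in range(K)] for x in X])
    total = cost[0].copy()               # best cost of a path ending in each cluster
    back = np.zeros((T, K), dtype=int)   # back-pointers for path recovery
    switch = beta * (1 - np.eye(K))      # 0 on the diagonal, beta elsewhere
    for t in range(1, T):
        trans = total[:, None] + switch  # trans[i, j]: come from i, land in j
        back[t] = trans.argmin(axis=0)
        total = trans.min(axis=0) + cost[t]
    path = np.empty(T, dtype=int)
    path[-1] = total.argmin()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Note that `means` and `thetas` are only read, never modified, which is why the new points have no influence on the model until it is retrained.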

Answer 5 · 2018-05-07T18:27:54.000Z

Thanks David, the streaming prediction works well. Regarding retraining, my impression is that the training process is not cumulative. Is that true? If so, retraining would mean adding the new data points to the historical dataset and training from the ground up. Is it possible to make training cumulative if it isn't now?

Answer 6 · 2018-05-07T19:12:19.000Z

The training is not currently cumulative: due to the specifics of the algorithm, it is not possible to run the M-step of TICC in an "incremental" way. In particular, each new point affects the cluster's empirical covariance, but then you need to use that empirical covariance to re-solve a new Toeplitz Graphical Lasso problem every time (see section 4.2 of the original paper for details). You'd need to do a new eigendecomposition (equation 6 in the paper) every time you re-trained the cluster parameters, regardless of whether you started from scratch or solved it in a streaming setting, so unfortunately there is little benefit to adding that capability...
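
To illustrate where the cost sits: the empirical covariance on its own can be kept up to date cheaply as points arrive, as in the sketch below. The class name and the 1/n normalization are assumptions for illustration, not code from the TICC repository. What cannot be carried over is the step that follows, since the Toeplitz Graphical Lasso problem has to be re-solved against whatever covariance this produces.

```python
import numpy as np

class RunningCovariance:
    """Welford-style running mean and covariance for one cluster.
    Hypothetical helper; uses the 1/n (population) normalization."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))  # sum of outer products of deviations

    def update(self, x):
        """Fold one new point into the statistics in O(dim^2) time."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean                # deviation from the OLD mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)  # times deviation from NEW mean

    def covariance(self):
        """Empirical covariance of all points seen so far."""
        return self.scatter / self.n
```

So the streaming part only saves the covariance bookkeeping; the solve that turns the covariance into cluster parameters is not shown here and would still run in full on every retrain.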

Perhaps one way of "incrementally" running it is to only update the clusters that have new points assigned to them, but it would not be cumulative, as you'd still need to start that update from scratch.
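
A rough sketch of that idea, under the assumption that you keep each cluster's points around and have some function that solves for one cluster's parameters from its points. Both `refit_changed_clusters` and the `solve_cluster` callback are hypothetical names, not part of the TICC code base.

```python
def refit_changed_clusters(points_by_cluster, new_assignments, params, solve_cluster):
    """Re-solve parameters only for clusters that received new points.

    points_by_cluster: dict mapping cluster id -> list of points seen so far
    new_assignments:   iterable of (cluster id, point) pairs for the new data
    params:            dict mapping cluster id -> current parameters
    solve_cluster:     hypothetical callable(list of points) -> parameters;
                       it runs from scratch on the full list each time
    Returns the set of cluster ids that were refit.
    """
    changed = set()
    for cluster_id, point in new_assignments:
        points_by_cluster.setdefault(cluster_id, []).append(point)
        changed.add(cluster_id)
    for cluster_id in changed:
        # Not cumulative: the solve sees all of the cluster's points again.
        params[cluster_id] = solve_cluster(points_by_cluster[cluster_id])
    return changed
```

Clusters with no new points keep their old parameters untouched, which is where the saving comes from.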