davidhallac/TICC

Using TICC for online clustering

Closed this issue · 6 comments

Hi David,

Can we use TICC for online clustering of time series? For instance, we want to identify which state a car is in while it is being driven, given a set of states already learned with TICC in batch mode.

Thanks,

That is definitely a potential application of TICC. That aligns nicely with #18, where the goal is to separate out "fit" and "predict", so that you can train a model on one dataset, then infer the resulting clusters on another dataset.

We have modified the existing TICC code to support "fit", but we have not yet added the "predict" capability (though we would definitely welcome any support, if you're interested in contributing!).

So, overall: TICC can definitely be used for online clustering, but the existing code base does not yet support that functionality.

Hi! The new predict_clusters method now supports streaming settings. Hope that helps!

Thanks David, I'll test it out. Does it need to be retrained occasionally?

It doesn't need to be retrained, but ideally you would retrain it every once in a while if you want the most accurate estimate possible. This is because "predict_clusters" simply assigns clusters to the new points, and does not go back to update the cluster parameters, so you'd want to re-train it if you prefer to incorporate these new points into your model.
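To illustrate what "assigning clusters without updating the parameters" can look like, here is a minimal NumPy sketch. It is not the repository's `predict_clusters` implementation: the function name `assign_clusters`, the `beta` switching penalty argument, and the toy parameters are assumptions made for this example, and each row is treated as an already-prepared feature vector. The cluster means and inverse covariances stay fixed; only the labels of the new points are computed.

```python
import numpy as np

def assign_clusters(points, means, precisions, beta=0.0):
    """Label each row of `points` using fixed cluster parameters.

    The cost of a point under cluster k is its Gaussian negative
    log-likelihood (up to a constant). `beta` is a penalty paid whenever
    consecutive points change cluster. The cheapest label sequence is
    found with dynamic programming; no parameter is updated.
    """
    points = np.asarray(points, dtype=float)
    T, K = len(points), len(means)

    # Per-point, per-cluster cost: 0.5 * d^T Theta d - 0.5 * log det(Theta)
    cost = np.empty((T, K))
    for k in range(K):
        diff = points - means[k]
        _, logdet = np.linalg.slogdet(precisions[k])
        quad = np.einsum("ti,ij,tj->t", diff, precisions[k], diff)
        cost[:, k] = 0.5 * quad - 0.5 * logdet

    # Forward pass: total[j] is the best cost of any path ending in cluster j.
    total = cost[0].copy()
    back = np.zeros((T, K), dtype=int)
    switch = beta * (1 - np.eye(K))
    for t in range(1, T):
        trans = total[:, None] + switch      # trans[i, j]: in i at t-1, j at t
        back[t] = np.argmin(trans, axis=0)
        total = trans[back[t], np.arange(K)] + cost[t]

    # Backward pass: recover the cheapest path.
    path = np.empty(T, dtype=int)
    path[-1] = np.argmin(total)
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Toy parameters standing in for a previously trained model (hypothetical).
means = [np.zeros(2), np.full(2, 5.0)]
precisions = [np.eye(2), np.eye(2)]
new_points = np.array([[0.1, -0.2], [0.0, 0.3], [5.2, 4.9], [4.8, 5.1]])
labels = assign_clusters(new_points, means, precisions, beta=1.0)
```

Because `means` and `precisions` are never modified, new points are labeled but do not influence the model, which is why occasional retraining is still useful.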

Thanks David, the streaming prediction works well. Regarding the retraining issue, my impression is that the training process is not cumulative. Is that true? If so, retraining would mean adding the new data points to the historical dataset and training from the ground up. Is it possible to make training cumulative if it isn't now?

The training is not currently cumulative: due to the specifics of the algorithm, it is not possible to run the M-step of TICC in an "incremental" way. In particular, each new point affects the cluster's empirical covariance, but then you need to use that empirical covariance to re-solve a new Toeplitz Graphical Lasso problem every time (see section 4.2 of the original paper for details). You'd need to do a new eigendecomposition (equation 6 in the paper) every time you re-trained the cluster parameters, regardless of whether you started from scratch or solved it in a streaming setting, so unfortunately there is little benefit to adding that capability...
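For context, the empirical covariance on its own is cheap to keep up to date. The sketch below is an illustration only and not part of the TICC code base: the class name `RunningCovariance` is hypothetical. It maintains running sums so each new point costs one outer product. What it cannot do is avoid the re-solve described above, since the updated covariance still has to be fed into a fresh optimization.

```python
import numpy as np

class RunningCovariance:
    """Keep an empirical (biased, 1/n) covariance up to date point by point."""

    def __init__(self, dim):
        self.n = 0
        self.sum = np.zeros(dim)             # running sum of x
        self.outer = np.zeros((dim, dim))    # running sum of x x^T

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.sum += x
        self.outer += np.outer(x, x)

    def covariance(self):
        mean = self.sum / self.n
        # E[x x^T] - mean mean^T
        return self.outer / self.n - np.outer(mean, mean)

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))

running = RunningCovariance(dim=4)
for row in data:
    running.update(row)

# Matches the batch estimate computed from all points at once.
batch = np.cov(data, rowvar=False, bias=True)
matches = np.allclose(running.covariance(), batch)
```

The running-sum form is simple but can lose precision when the mean is large relative to the spread; a Welford-style update is a common alternative.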

Perhaps one way of "incrementally" running it is to only update the clusters that have new points assigned to them, but it would not be cumulative, as you'd still need to start each of those updates from scratch.
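A rough sketch of that bookkeeping, under stated assumptions: `refit_touched_clusters` and `placeholder_fit` are hypothetical names, and `placeholder_fit` is a ridge-regularized matrix inverse used purely as a stand-in for TICC's Toeplitz Graphical Lasso solve. The point is only that untouched clusters keep their parameters while touched ones are re-estimated from all of their points.

```python
import numpy as np

def placeholder_fit(points, ridge=1e-3):
    # Stand-in for the real per-cluster solve (NOT the Toeplitz graphical
    # lasso): invert a ridge-regularized empirical covariance.
    cov = np.cov(points, rowvar=False, bias=True)
    return np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))

def refit_touched_clusters(data, labels, params, touched, fit_cluster):
    """Re-estimate, from scratch, only the clusters listed in `touched`."""
    updated = dict(params)                   # untouched entries are reused
    for k in touched:
        updated[k] = fit_cluster(data[labels == k])
    return updated

rng = np.random.default_rng(0)
old_data = rng.normal(size=(40, 3))
old_labels = np.repeat([0, 1], 20)
old_params = {k: placeholder_fit(old_data[old_labels == k]) for k in (0, 1)}

# Pretend a streaming prediction step assigned every new point to cluster 0.
new_data = rng.normal(size=(5, 3))
new_labels = np.zeros(5, dtype=int)

data = np.vstack([old_data, new_data])
labels = np.concatenate([old_labels, new_labels])
touched = set(np.unique(new_labels).tolist())
new_params = refit_touched_clusters(data, labels, old_params, touched, placeholder_fit)
```

Each touched cluster is still solved over its full set of points, so this saves work only for the clusters that received nothing new.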