Bayesian online changepoint detection for multivariate data
sokolov-alex opened this issue · 16 comments
Is it possible to make Ryan Adams' algorithm work on multivariate data too?
As far as I can see, the change should be relatively easy, but a bit time-consuming. It only requires updating the Student t distribution to handle multivariate data correctly. That distribution is not yet in scipy, AFAIK, so it would take some time to get it in there.
Would you feel confident enough to translate e.g. the wiki article on multivariate Student t distributions into a scipy.stats.multivariate_t?
I have found a multivariate Student t distribution implementation, but I haven't yet figured out how to change the update_theta function to handle multivariate data.
You would want to mimic numpy.random.multivariate_normal as that is the main call from scipy.stats.multivariate_normal (the rest is type and size checking). If we just want a local version, that should be fine.
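For a local version, sampling could be sketched roughly as below. This is only an illustration, not scipy's implementation: the function name `multivariate_t_rvs` and its signature are made up here, and it relies on the standard construction of a multivariate t draw as a zero-mean multivariate normal draw divided by the square root of an independent chi-square variable over its degrees of freedom.

```python
import numpy as np


def multivariate_t_rvs(loc, shape, df, size=1, rng=None):
    """Draw `size` samples from a multivariate t (hypothetical local helper).

    loc   : (d,) location vector
    shape : (d, d) positive-definite shape matrix
    df    : degrees of freedom (> 0)
    """
    rng = np.random.default_rng() if rng is None else rng
    loc = np.asarray(loc, dtype=float)
    d = loc.shape[0]
    # Zero-mean normal draws with covariance equal to the shape matrix.
    z = rng.multivariate_normal(np.zeros(d), shape, size=size)
    # One chi-square draw per sample rescales the normal draw.
    u = rng.chisquare(df, size=size)
    return loc + z / np.sqrt(u / df)[:, None]
```

The result has shape `(size, d)`, which follows the convention of numpy.random.multivariate_normal for an integer `size`.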
Hi @sokolov-alex: Did you find a way to update the theta function?
Depending on your goal, it might be good enough to implement a multivariate t that assumes the input variables are independent. Though this will not capture a change in covariance structure, it might still be useful depending on what you are looking for. On top of that, it is rather simple to implement.
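A minimal sketch of that independent variant, assuming a separate Normal-Gamma prior per dimension so that the predictive for each dimension is a univariate Student t and the joint predictive is their product. The class name `IndependentStudentT` and the default hyperparameters are hypothetical, not part of the library; each row of the parameter arrays corresponds to one run-length hypothesis.

```python
import numpy as np
from scipy import stats


class IndependentStudentT:
    """Per-dimension Normal-Gamma model (hypothetical sketch)."""

    def __init__(self, dims, alpha=0.1, beta=0.1, kappa=1.0, mu=0.0):
        # Prior parameters, one row (run length 0) and one column per dimension.
        self.alpha0 = self.alpha = np.full((1, dims), float(alpha))
        self.beta0 = self.beta = np.full((1, dims), float(beta))
        self.kappa0 = self.kappa = np.full((1, dims), float(kappa))
        self.mu0 = self.mu = np.full((1, dims), float(mu))

    def pdf(self, x):
        """Predictive density of x (shape (dims,)) under each run length."""
        scale = np.sqrt(self.beta * (self.kappa + 1) / (self.alpha * self.kappa))
        logp = stats.t.logpdf(x, df=2 * self.alpha, loc=self.mu, scale=scale)
        # Independence across dimensions: sum log densities along the columns.
        return np.exp(logp.sum(axis=1))

    def update_theta(self, x):
        """Conjugate update for every run length, then prepend the prior row."""
        mu_new = (self.kappa * self.mu + x) / (self.kappa + 1)
        beta_new = self.beta + self.kappa * (x - self.mu) ** 2 / (2 * (self.kappa + 1))
        self.mu = np.vstack([self.mu0, mu_new])
        self.alpha = np.vstack([self.alpha0, self.alpha + 0.5])
        self.beta = np.vstack([self.beta0, beta_new])
        self.kappa = np.vstack([self.kappa0, self.kappa + 1])
```

Because the dimensions are treated separately, a change that only affects the correlation between series leaves every per-dimension density unchanged and would go unnoticed by this model.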
Hello friends, I see this paper referenced: "Modeling Changing Dependency Structure in Multivariate Time Series".
So have you already added this functionality to detect changes in multivariate time series?
It would be great to have it.
Let's say I have 9 time series and a small change happens in each of them. The change is too small to detect by analyzing each series individually, but might some aggregation help?
Or say 5 of them changed and 4 did not; would the change still be detected with some probability?
Hi all!
Are there any changes?
Good question
I note that, as of this month, scipy now includes an implementation of the PDF for the multivariate t-distribution: https://docs.scipy.org/doc/scipy/reference/release.1.6.0.html#scipy-stats-improvements.
Does that make this easy to implement?
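To give a rough idea of what the full-covariance version could look like with scipy.stats.multivariate_t (SciPy 1.6.0 or later), here is a sketch for a single run length. It assumes a Normal-Inverse-Wishart prior with parameters `mu`, `kappa`, `nu` and `psi`, whose posterior predictive is a multivariate t. The function names `niw_predictive` and `niw_update` are hypothetical and this is not necessarily what the library does.

```python
import numpy as np
from scipy import stats


def niw_predictive(mu, kappa, nu, psi):
    """Posterior predictive under a Normal-Inverse-Wishart model.

    Requires nu > d - 1 so that the degrees of freedom are positive.
    """
    d = mu.shape[0]
    df = nu - d + 1
    shape = psi * (kappa + 1) / (kappa * df)
    return stats.multivariate_t(loc=mu, shape=shape, df=df)


def niw_update(x, mu, kappa, nu, psi):
    """Conjugate update of the parameters after observing one vector x."""
    diff = x - mu
    mu_new = (kappa * mu + x) / (kappa + 1)
    psi_new = psi + (kappa / (kappa + 1)) * np.outer(diff, diff)
    return mu_new, kappa + 1, nu + 1, psi_new


# Illustrative prior for 2-dimensional data (values chosen arbitrarily).
mu, kappa, nu, psi = np.zeros(2), 1.0, 2.0, np.eye(2)
x = np.array([1.0, 1.0])
density = niw_predictive(mu, kappa, nu, psi).pdf(x)
mu, kappa, nu, psi = niw_update(x, mu, kappa, nu, psi)
```

In the changepoint algorithm, these parameters would have to be tracked for every run length, in the same way the univariate update_theta stacks its parameter arrays.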
I've also found an R implementation of this algorithm that works on multivariate data, which I think could be used as a meaningful reference. In particular, their "update theta" function is here: <deleted as from GPLv3 licensed code>
I reckon I can give this a shot. Is it okay to reference a book or paper for the maths without breaking the MIT license?
Draft PR in #26. Happy if anyone with a better grasp of the stats runs an eye over my code.
Can we now close this issue as #26 has been merged?