Porting this to River
MaxHalford opened this issue · 14 comments
Hello there! I hope you're doing well.
I recently saw this repo pop up and I like it very much. There are some things that we don't yet have in River. In particular, I'm thinking of the OnlineEmpiricalCovariance class.
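For reference, the basic idea behind an online empirical covariance can be sketched with Welford's algorithm on NumPy arrays. This is a minimal illustration, not the code from this repo or from River; the class name `OnlineCovariance` is hypothetical.

```python
import numpy as np


class OnlineCovariance:
    """Running mean and sample covariance via Welford's algorithm (sketch)."""

    def __init__(self, dim: int):
        self.n = 0
        self.mean = np.zeros(dim)
        # Co-moment (scatter) matrix: running sum of outer products of deviations.
        self.comoment = np.zeros((dim, dim))

    def update(self, x) -> None:
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta_old = x - self.mean        # deviation from the previous mean
        self.mean += delta_old / self.n  # updated mean
        delta_new = x - self.mean        # deviation from the updated mean
        self.comoment += np.outer(delta_old, delta_new)

    def covariance(self, ddof: int = 1) -> np.ndarray:
        return self.comoment / (self.n - ddof)
```

Feeding rows one at a time should reproduce `np.cov(data, rowvar=False)` up to floating-point error.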
Would it be OK if we ported some of this into River? I'd rather ask like a gentleman than savagely copy the code.
Kind regards.
Hi @MaxHalford
Savagely copying is fine - though we may want to cross-fertilize our unit tests! Aside: the only reason I'm avoiding classes is due to quirks of my deployment - a desire to avoid object serialization issues.
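Avoiding classes can still give a fully online estimator: keep the state in a plain dict of numbers and lists and write the update as a pure function, so the state serializes to JSON without pickling any objects. A sketch under that assumption; the `cov_init`/`cov_update`/`cov_get` names are hypothetical and are not the actual skater signature used in precise.

```python
import json


def cov_init(dim):
    # Plain dict of lists: trivially JSON-serializable.
    return {"n": 0, "mean": [0.0] * dim, "comoment": [[0.0] * dim for _ in range(dim)]}


def cov_update(state, x):
    """Return a new state with observation x folded in (Welford update)."""
    n = state["n"] + 1
    delta_old = [xi - mi for xi, mi in zip(x, state["mean"])]
    new_mean = [mi + d / n for mi, d in zip(state["mean"], delta_old)]
    delta_new = [xi - mi for xi, mi in zip(x, new_mean)]
    comoment = [
        [state["comoment"][i][j] + delta_old[i] * delta_new[j] for j in range(len(x))]
        for i in range(len(x))
    ]
    return {"n": n, "mean": new_mean, "comoment": comoment}


def cov_get(state, ddof=1):
    denom = state["n"] - ddof
    return [[c / denom for c in row] for row in state["comoment"]]


state = cov_init(2)
for x in [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0)]:
    # Round-trip through JSON on every step to show nothing is lost.
    state = json.loads(json.dumps(cov_update(state, x)))
print(cov_get(state))
```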
With a little help, this old dog could probably be taught how to make useful PRs to River. I was thinking the same thing about some of the online time-series stuff I'm doing, although I couldn't quite grok how to do k-step-ahead things. If I understand correctly, k=1 step ahead is pretty straightforward in River and that's the focus of "precise", so it seems like a good idea to join forces there.
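On the k-step-ahead question: in an online loop the main difficulty is bookkeeping, since a forecast issued at time t for t+k can only be scored k observations later. A minimal sketch that keeps pending forecasts in a queue, assuming a toy forecaster with hypothetical `learn` and `forecast(horizon)` methods (this is not River's forecasting API):

```python
from collections import deque


def evaluate_k_step(series, k, forecaster):
    """Score k-step-ahead forecasts online; each forecast waits in a queue
    until the value it targets actually arrives (sketch)."""
    pending = deque()  # (target_index, predicted_value), oldest first
    abs_errors = []
    for t, y in enumerate(series):
        # 1. Resolve the forecast that was aimed at time t, if any.
        while pending and pending[0][0] == t:
            _, y_hat = pending.popleft()
            abs_errors.append(abs(y - y_hat))
        # 2. Learn from the newly observed value.
        forecaster.learn(y)
        # 3. Issue a forecast for time t + k.
        pending.append((t + k, forecaster.forecast(k)))
    return sum(abs_errors) / len(abs_errors) if abs_errors else float("nan")


class LastValue:
    """Hypothetical naive forecaster: predicts the last observed value."""

    def __init__(self):
        self.last = 0.0

    def learn(self, y):
        self.last = y

    def forecast(self, horizon):
        return self.last
```

With `range(10)` as the series, the last-value forecaster is off by exactly k at every step, so k=2 gives a mean absolute error of 2.0 and k=1 gives 1.0.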
Remarks:
- Some of the portfolio stuff might help with River's online ensembling/stacking/mixtures of experts, though first I'm just establishing the baselines with the traditional setup.
- Some methods like online Ledoit-Wolf here are speculative. Up to you if you want to see how the Elo ratings pan out.
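To make the shrinkage idea concrete, here is a sketch of shrinking a covariance estimate toward the scaled-identity target that Ledoit-Wolf uses. The intensity `delta` is fixed by hand here (an assumption); estimating the optimal intensity from the data, which is what Ledoit-Wolf actually adds, is left out, and this is not the online Ledoit-Wolf method in precise.

```python
import numpy as np


def shrink_to_identity(cov: np.ndarray, delta: float) -> np.ndarray:
    """Convex combination of a covariance estimate and a scaled identity.

    `delta` in [0, 1] is a fixed, user-chosen intensity (hypothetical
    parameter), not the data-driven Ledoit-Wolf optimum."""
    p = cov.shape[0]
    mu = np.trace(cov) / p  # average variance sets the scale of the target
    return (1.0 - delta) * cov + delta * mu * np.eye(p)
```

Because the target has the same trace as the input, shrinkage keeps the total variance unchanged while pulling the eigenvalues toward their mean, which also makes a singular sample covariance invertible for any `delta` > 0 as long as the trace is positive.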
Peter
ps: don't forget to enter M6. I'd like to see some open-source devs win prizes!
> With a little help, this old dog could probably be taught how to make useful PRs to River. I was thinking the same thing about some of the online time-series stuff I'm doing, although I couldn't quite grok how to do k-step-ahead things. If I understand correctly, k=1 step ahead is pretty straightforward in River and that's the focus of "precise", so it seems like a good idea to join forces there.
It would be great if we could work something out. You definitely seem to have strong coding abilities. The only thing is that River operates on dicts, not NumPy arrays. We do use NumPy arrays, but only for mini-batch updates; for instance, see StandardScaler.
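On mini-batches: running statistics can absorb a whole NumPy batch at once with the pairwise combination formula of Chan et al., merging the batch's own mean and co-moment into the running ones. A sketch assuming rows are observations and columns follow a fixed feature order; `merge_batch` is a hypothetical helper, not how River's `StandardScaler` is implemented.

```python
import numpy as np


def merge_batch(n_a, mean_a, m2_a, batch):
    """Fold a mini-batch (rows = observations) into running statistics
    using the pairwise combination formula of Chan et al. (sketch)."""
    batch = np.asarray(batch, dtype=float)
    n_b = batch.shape[0]
    mean_b = batch.mean(axis=0)
    centered = batch - mean_b
    m2_b = centered.T @ centered  # co-moment of the batch alone
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * (n_b / n)
    # Cross term corrects for the two parts having different means.
    m2 = m2_a + m2_b + np.outer(delta, delta) * (n_a * n_b / n)
    return n, mean, m2


rng = np.random.default_rng(1)
data = rng.normal(size=(300, 4))
n, mean, m2 = 0, np.zeros(4), np.zeros((4, 4))
for batch in np.array_split(data, 7):
    n, mean, m2 = merge_batch(n, mean, m2, batch)
sample_cov = m2 / (n - 1)
```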
I'd say that including these methods in River, and taking part in the project, would probably help them reach a wider audience. For instance, I know a few teams that would enjoy having an online covariance matrix for anomaly detection purposes.
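For the anomaly-detection use case, a common recipe is to score each new observation by its squared Mahalanobis distance from the running mean, using the running covariance. A sketch assuming the mean and covariance come from an online estimator like the ones discussed in this thread and that the covariance is invertible; `mahalanobis_score` is a hypothetical helper and choosing the alert threshold is left open.

```python
import numpy as np


def mahalanobis_score(x, mean, cov):
    """Squared Mahalanobis distance of x from a running mean/covariance."""
    diff = np.asarray(x, dtype=float) - mean
    # Solve cov @ z = diff instead of forming the explicit inverse.
    return float(diff @ np.linalg.solve(cov, diff))
```

Large scores flag points that are unusual given the correlations seen so far, not just points that are far away in each coordinate separately.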
> Some methods like online Ledoit-Wolf here are speculative. Up to you if you want to see how the Elo ratings pan out.
It's good you point that out. We do try to focus on established methods, a bit like scikit-learn. We also have a river-extra repository for more "experimental" stuff.
> ps: don't forget to enter M6. I'd like to see some open-source devs win prizes!
Yep it's on my list ;)
Makes sense. Perhaps if you write the basic running empirical covariance calculation, it will be simple for me to follow your pattern and PR a few others as they stabilize.
Will do 👌
Ok I'm done, here it is. Let me know if you have any questions!
Nota bene: looking into microprediction.com has been on my todo list for far too long. It will get done at some point :)
Keep up the great work 🤝
Question for you, because I'm blind and can't find it: do you have online formulas for the precision matrix? That would enable many other algorithms, in particular Bayesian methods.
Hi @MaxHalford I somehow missed this thread, probably under 50000 system alerts killing my inbox.
Massively delayed answer to your question about precision - I don't yet have precision skaters but I think the method used by Lee and Zhong might be of interest: https://github.com/microprediction/precise/blob/main/precise/skaters/covariance/ewalzfactory.py
Re River: I think my first PR would be something like an exponentially weighted sample covariance. Does that make sense?
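One common recursion for an exponentially weighted covariance is the multivariate form of the incremental EW variance update: with d = x - mean, set mean += alpha * d and S = (1 - alpha) * (S + alpha * d d^T). A NumPy sketch; seeding the mean with the first observation and the `EWCovariance`/`alpha` names are assumptions, and River's own implementation may differ.

```python
import numpy as np


class EWCovariance:
    """Exponentially weighted mean and covariance (sketch).

    alpha in (0, 1] is the weight given to the newest observation."""

    def __init__(self, dim, alpha):
        self.alpha = alpha
        self.mean = np.zeros(dim)
        self.cov = np.zeros((dim, dim))
        self.initialized = False

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if not self.initialized:
            # Seed the mean with the first observation; covariance stays zero.
            self.mean = x.copy()
            self.initialized = True
            return
        d = x - self.mean  # deviation from the previous mean
        self.mean += self.alpha * d
        self.cov = (1 - self.alpha) * (self.cov + self.alpha * np.outer(d, d))


ew = EWCovariance(dim=2, alpha=0.5)
for x in [[1.0, 2.0], [3.0, 1.0], [2.0, 5.0]]:
    ew.update(x)
```

In one dimension this reduces to the familiar var = (1 - alpha) * (var + alpha * d**2) update for an exponentially weighted variance.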
> Hi @MaxHalford I somehow missed this thread, probably under 50000 system alerts killing my inbox.
Don't apologize!
> Massively delayed answer to your question about precision - I don't yet have precision skaters but I think the method used by Lee and Zhong might be of interest: https://github.com/microprediction/precise/blob/main/precise/skaters/covariance/ewalzfactory.py
Thanks, I'll take a look. My current thinking is that the Sherman-Morrison formula can be used.
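On Sherman-Morrison: in Welford's recursion x - mean_n = ((n-1)/n)(x - mean_{n-1}), so the co-moment update M_n = M_{n-1} + ((n-1)/n) d d^T (with d = x - mean_{n-1}) is a symmetric rank-one term, and its inverse can be updated in O(p^2) per observation. A sketch under one extra assumption: the co-moment starts at eps * I so the inverse exists from the first step, which means the result is a slightly regularized precision matrix. The class name and `eps` are hypothetical, and this is not necessarily how the precision matrix linked in this thread works.

```python
import numpy as np


class OnlinePrecision:
    """Track the inverse of the co-moment matrix with Sherman-Morrison
    rank-one updates instead of a full inversion per step (sketch)."""

    def __init__(self, dim, eps=1e-3):
        self.n = 0
        self.mean = np.zeros(dim)
        # Inverse of the assumed starting co-moment M_0 = eps * I.
        self.inv_m = np.eye(dim) / eps

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        # M_n = M_{n-1} + u u^T with u = sqrt((n-1)/n) * d.
        u = np.sqrt((self.n - 1) / self.n) * d
        a_inv_u = self.inv_m @ u
        # Sherman-Morrison for a symmetric rank-one update.
        self.inv_m -= np.outer(a_inv_u, a_inv_u) / (1.0 + u @ a_inv_u)

    def precision(self, ddof=1):
        # (M / (n - ddof))^{-1} = (n - ddof) * M^{-1}
        return (self.n - ddof) * self.inv_m
```

With n observations this equals the inverse of the sample covariance plus (eps / (n - 1)) * I, so the regularization fades as data accumulates.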
> Re River: I think my first PR would be something like an exponentially weighted sample covariance. Does that make sense?
Sure, that would be most appreciated! We have added an online covariance matrix, which you can see here. Under the hood it simply orchestrates a bunch of Cov objects. My instinct would be to do the same with exponentially weighted covariances, but we don't have those yet! We only have exponentially weighted variances, see here.
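Following that orchestration idea, an exponentially weighted covariance matrix can be assembled from one scalar EW covariance per feature pair, taking dicts in and exposing entries by feature names. A sketch only: `EWCov` and `EWCovMatrix` are hypothetical names, not River classes, and the recursion is the same incremental EW update used for variances.

```python
import itertools


class EWCov:
    """Exponentially weighted covariance between two scalar streams (sketch)."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.mean_x = self.mean_y = None
        self.cov = 0.0

    def update(self, x, y):
        if self.mean_x is None:
            self.mean_x, self.mean_y = x, y  # seed with the first pair
            return
        dx, dy = x - self.mean_x, y - self.mean_y
        self.mean_x += self.alpha * dx
        self.mean_y += self.alpha * dy
        self.cov = (1 - self.alpha) * (self.cov + self.alpha * dx * dy)


class EWCovMatrix:
    """Dict-in matrix that just orchestrates one EWCov per feature pair."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.pairs = {}  # (feature_i, feature_j) -> EWCov, with i <= j

    def update(self, x: dict):
        for i, j in itertools.combinations_with_replacement(sorted(x), 2):
            if (i, j) not in self.pairs:
                self.pairs[(i, j)] = EWCov(self.alpha)
            self.pairs[(i, j)].update(x[i], x[j])

    def __getitem__(self, key):
        i, j = sorted(key)
        return self.pairs[(i, j)].cov


cov = EWCovMatrix(alpha=0.5)
for x in [{"a": 1, "b": 2}, {"a": 3, "b": 1}, {"a": 2, "b": 5}]:
    cov.update(x)
print(cov["a", "b"], cov["b", "a"])
```

A side effect of tracking each pair separately is that a feature which shows up late simply starts its own pairs, at the cost of storing p(p+1)/2 small states instead of one dense matrix.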
Bubbling this up. Note to self.
I've implemented the precision matrix, see here :)
Very nice!