Post-hoc ema?
elyxlz opened this issue · 11 comments
From Karras' recent paper:
https://arxiv.org/abs/2312.02696
Allows you to set the EMA parameters after training is completed.
nice, love his papers. I'll give it a read later
what's the tldr for the new EMA technique? I don't understand your description
He makes two contributions actually,
The simpler one is just modifying the decay so that beta is a function of the training step, which makes the averaging profile scale automatically with training time.
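To make the step-dependent decay concrete, here is a minimal sketch on a plain list of scalars. It assumes the power-function form beta(t) = (1 - 1/t)^(gamma + 1) as I read it from the linked paper; the names `power_beta` and `power_ema` are made up for illustration and are not from any library.

```python
def power_beta(step: int, gamma: float) -> float:
    """Decay for a 1-indexed training step, assuming beta(t) = (1 - 1/t) ** (gamma + 1)."""
    return (1.0 - 1.0 / step) ** (gamma + 1.0)


def power_ema(values, gamma: float) -> float:
    """Run the step-dependent EMA over a sequence of scalars (hypothetical helper)."""
    ema = 0.0
    for step, value in enumerate(values, start=1):
        beta = power_beta(step, gamma)  # beta is 0 at step 1, so the EMA starts at the first value
        ema = beta * ema + (1.0 - beta) * value
    return ema
```

With `gamma = 0` this reduces to a plain running mean of everything seen so far; larger `gamma` puts more weight on recent steps, and the width of the profile grows with the step count instead of being fixed.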
Then there is the post-hoc one. The idea is that you can pick gamma or beta after training is done without having to rerun training. In the paper he shows that performance is surprisingly sensitive to these parameters and that there isn't a universal heuristic for picking them.
As for how it works, AFAICT you keep two running EMAs with different profiles at the same time and periodically snapshot their parameters (e.g. every 5k steps). Then at the end of training you pick the profile you want and run a least-squares fit over the snapshots from your two EMA runs to find the weights with which to interpolate them to achieve the desired profile.
I still haven't fully understood it, though
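A rough numerical sketch of the least-squares step described above, under my own assumptions: each stored snapshot is treated as an averaging profile proportional to t^gamma up to its snapshot step, the profiles are discretized on a time grid, and `np.linalg.lstsq` finds the combination closest to the target profile. The helper names, the grid size, and the gamma values in the usage note are all hypothetical, and this is not meant to reproduce the paper's or any repo's actual implementation.

```python
import numpy as np


def power_profile(t_grid, t_end, gamma):
    """Discretized profile proportional to t**gamma on [0, t_end], zero afterwards, summing to 1."""
    p = np.where(t_grid <= t_end, t_grid ** gamma, 0.0)
    return p / p.sum()


def posthoc_weights(snapshot_steps, gammas, target_gamma, total_steps, grid_size=2000):
    """Least-squares weights over all (gamma, snapshot) pairs approximating the target profile."""
    t_grid = np.linspace(1.0, total_steps, grid_size)
    # One column per stored snapshot: every tracked gamma at every snapshot step.
    columns = [power_profile(t_grid, t_end, g) for g in gammas for t_end in snapshot_steps]
    A = np.stack(columns, axis=1)
    target = power_profile(t_grid, total_steps, target_gamma)
    weights, *_ = np.linalg.lstsq(A, target, rcond=None)
    return weights.reshape(len(gammas), len(snapshot_steps))
```

For example, `posthoc_weights([250, 500, 750, 1000], [5.0, 10.0], 7.0, 1000)` returns a 2x4 array of weights. The synthesized parameters would then be the weighted sum of the corresponding stored snapshot tensors. If the target matches one of the tracked profiles exactly, the fit should put essentially all of its weight on that profile's final snapshot.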
It seems like someone recently managed to make an implementation:
https://github.com/cloneofsimo/karras-power-ema-tutorial
@elyxlz i see
yea, the first can def be added quite easily to the repo
the second one will require more thought. need to also read the paper first to see if the improvements are significant enough to warrant building it out
oh man, this paper is real good. so many juicy findings, and successful use of cosine sim attention!
I think I'll build out this post hoc EMA
Note that Karras states he plans to release both the implementation and trained models
@elyxlz oh, this makes it easy 🤣
completely skipped over the reference implementation 🤣. Sounds awesome btw, can't wait!
@elyxlz how does the following API look? https://github.com/lucidrains/ema-pytorch/pull/17/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R54
ok it is done
it looks awesome! excited to try it out