lucidrains/ema-pytorch

Post-hoc EMA?

elyxlz opened this issue · 11 comments

From Karras' recent paper:
https://arxiv.org/abs/2312.02696

Allows you to set the EMA parameters after training is completed.

nice, love his papers. I'll give it a read later

what's the tl;dr for the new EMA technique? I don't understand your description

He actually makes two contributions.
The simpler one modifies the decay so that beta is a function of the training step; this makes the averaging profile scale automatically with training time.
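A minimal sketch of that step-dependent decay, assuming the power function form `beta_t = (1 - 1/t)^(gamma + 1)` from the paper (the default `gamma` below is one of the values the paper uses; the update helper is just for illustration, not the repo's API):

```python
# Power function EMA decay from Karras et al. (arXiv 2312.02696):
# beta depends on the training step t instead of being a fixed constant.
# `gamma` controls the width of the averaging profile.

def power_function_beta(t: int, gamma: float = 6.94) -> float:
    # beta_t = (1 - 1/t)^(gamma + 1); at t = 1 this is 0, so the EMA starts
    # as a copy of the online weights, and the averaging window widens
    # automatically as training goes on
    return (1.0 - 1.0 / t) ** (gamma + 1.0)

def ema_update(ema_params, online_params, t, gamma = 6.94):
    # hypothetical helper: one EMA step over flat lists of parameter values
    beta = power_function_beta(t, gamma)
    return [beta * e + (1.0 - beta) * p for e, p in zip(ema_params, online_params)]
```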

Then there is the post-hoc one. The idea is that you can pick gamma (or beta) after training is done, without having to rerun training. In the paper he shows that performance is surprisingly sensitive to these parameters, and that there's no universal heuristic for picking them.
As for how it works, AFAICT you keep two running EMAs with different profiles at the same time and periodically snapshot their parameters (e.g. every 5k steps). Then at the end of training you pick the profile you want and run a least-squares fit to find the weights with which to combine your snapshots so that the mixture approximates the desired profile.
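A rough numerical sketch of that fit (a simplification of the paper's method: each snapshot's averaging profile is taken as roughly proportional to `t**gamma` up to the snapshot step, the profiles are discretized on a grid, and the mixing weights come from an ordinary least-squares solve; the paper instead computes the inner products in closed form, and the function names and snapshot schedule here are made up for illustration):

```python
import numpy as np

def profile(gamma: float, t_snapshot: int, grid: np.ndarray) -> np.ndarray:
    # normalized power function averaging profile, zero after the snapshot step
    p = np.where(grid <= t_snapshot, grid ** gamma, 0.0)
    return p / p.sum()

def posthoc_weights(snapshots, target_gamma, total_steps, n_grid = 4096):
    # snapshots: list of (gamma_i, t_i) pairs for the stored EMA checkpoints
    grid = np.linspace(1, total_steps, n_grid)
    A = np.stack([profile(g, t, grid) for g, t in snapshots], axis = 1)
    b = profile(target_gamma, total_steps, grid)
    # least-squares fit: find weights so the mixture of snapshot profiles
    # approximates the desired target profile
    weights, *_ = np.linalg.lstsq(A, b, rcond = None)
    return weights  # combine the snapshot parameters with these weights

# e.g. two EMA profiles, each snapshotted a few times during training
snapshots = [(16.97, t) for t in (2500, 5000, 7500, 10000)] \
          + [(6.94, t) for t in (2500, 5000, 7500, 10000)]
w = posthoc_weights(snapshots, target_gamma = 10.0, total_steps = 10000)
```

The synthesized model would then just be the weighted sum of the snapshot parameters using `w`.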

I still haven't understood it perfectly, however

It seems like someone recently managed to make an implementation:
https://github.com/cloneofsimo/karras-power-ema-tutorial

@elyxlz i see

yea, the first can def be added quite easily to the repo

the second one will require more thought. need to also read the paper first to see if the improvements are significant enough to warrant building it out

oh man, this paper is real good. so many juicy findings, and successful use of cosine sim attention!

I think I'll build out this post hoc EMA

Note that Karras states he plans to release both the implementation and trained models:


@elyxlz oh, this makes it easy 🤣

@elyxlz hey, going to knock out the post-hoc ema wrapper this morning

also have the magnitude preserving unet finished over here, just fyi

I completely skipped over the reference implementation 🤣. Sounds awesome btw, can't wait!

ok it is done

it looks awesome! excited to try it out