lucidrains/ema-pytorch

Post-hoc EMA?

elyxlz opened this issue · 11 comments

From Karras' recent paper:
https://arxiv.org/abs/2312.02696

Allows you to set the EMA parameters after training is completed.

nice, love his papers. I'll give it a read later

what's the tl;dr for the new EMA technique? I don't understand your description

He actually makes two contributions.
The simpler one modifies the decay so that beta is a function of the training step; this makes the averaging profile scale automatically with training time.
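A minimal sketch of that step-dependent decay, assuming the power function form `beta_t = (1 - 1/t)^(gamma + 1)` from the paper (the default `gamma` below is one of the values the paper uses; the update helper is just for illustration, not the repo's API):

```python
# Power function EMA decay from Karras et al. (arXiv 2312.02696):
# beta depends on the training step t instead of being a fixed constant.
# `gamma` controls the width of the averaging profile.

def power_function_beta(t: int, gamma: float = 6.94) -> float:
    # beta_t = (1 - 1/t)^(gamma + 1); at t = 1 this is 0, so the EMA starts
    # as a copy of the online weights, and the averaging window widens
    # automatically as training goes on
    return (1.0 - 1.0 / t) ** (gamma + 1.0)

def ema_update(ema_params, online_params, t, gamma = 6.94):
    # hypothetical helper: one EMA step over flat lists of parameter values
    beta = power_function_beta(t, gamma)
    return [beta * e + (1.0 - beta) * p for e, p in zip(ema_params, online_params)]
```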

Then there is the post-hoc one. The idea is that you can pick gamma (or beta) after training is done, without having to rerun training. In the paper he shows that performance is surprisingly sensitive to these parameters, and that there's no universal heuristic for picking them.
As for how it works, AFAICT you keep two running EMAs with different profiles at the same time and periodically snapshot their parameters (e.g. every 5k steps). Then at the end of training you pick the profile you want and run a least-squares fit to find the weights with which to combine your snapshots so that the mixture approximates the desired profile.
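A rough numerical sketch of that fit (a simplification of the paper's method: each snapshot's averaging profile is taken as roughly proportional to `t**gamma` up to the snapshot step, the profiles are discretized on a grid, and the mixing weights come from an ordinary least-squares solve; the paper instead computes the inner products in closed form, and the function names and snapshot schedule here are made up for illustration):

```python
import numpy as np

def profile(gamma: float, t_snapshot: int, grid: np.ndarray) -> np.ndarray:
    # normalized power function averaging profile, zero after the snapshot step
    p = np.where(grid <= t_snapshot, grid ** gamma, 0.0)
    return p / p.sum()

def posthoc_weights(snapshots, target_gamma, total_steps, n_grid = 4096):
    # snapshots: list of (gamma_i, t_i) pairs for the stored EMA checkpoints
    grid = np.linspace(1, total_steps, n_grid)
    A = np.stack([profile(g, t, grid) for g, t in snapshots], axis = 1)
    b = profile(target_gamma, total_steps, grid)
    # least-squares fit: find weights so the mixture of snapshot profiles
    # approximates the desired target profile
    weights, *_ = np.linalg.lstsq(A, b, rcond = None)
    return weights  # combine the snapshot parameters with these weights

# e.g. two EMA profiles, each snapshotted a few times during training
snapshots = [(16.97, t) for t in (2500, 5000, 7500, 10000)] \
          + [(6.94, t) for t in (2500, 5000, 7500, 10000)]
w = posthoc_weights(snapshots, target_gamma = 10.0, total_steps = 10000)
```

The synthesized model would then just be the weighted sum of the snapshot parameters using `w`.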

I still haven't understood it perfectly, however

It seems like someone recently managed to make an implementation:
https://github.com/cloneofsimo/karras-power-ema-tutorial

@elyxlz i see

yea, the first can def be added quite easily to the repo

the second one will require more thought. need to also read the paper first to see if the improvements are significant enough to warrant building it out

oh man, this paper is real good. so many juicy findings, and successful use of cosine sim attention!

I think I'll build out this post hoc EMA

Note that Karras states he plans to release both the implementation and trained models:


@elyxlz oh, this makes it easy 🤣

@elyxlz hey, going to knock out the post-hoc ema wrapper this morning

also have the magnitude preserving unet finished over here, just fyi

I completely skipped over the reference implementation 🤣. Sounds awesome btw, can't wait!

ok it is done

it looks awesome! excited to try it out