lucidrains/ema-pytorch

Saving EMA also saves online model?

Netruk44 opened this issue · 4 comments

It looks like if you call state_dict on the EMA model to save it, you get both the online_model as well as the ema_model weights. When saving to disk, this effectively doubles the size of the EMA file. Does the online_model need to be saved alongside the moving average weights?
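As a rough illustration of the size concern, here is a minimal sketch of filtering the online weights out of a state dict before saving. It assumes the keys are prefixed with the attribute names mentioned above (`online_model.` and `ema_model.`). The helper `drop_online_weights` and the sample keys are hypothetical, not part of the library, and a plain dict stands in for the real `state_dict()` result.

```python
def drop_online_weights(state, online_prefix="online_model."):
    """Return a copy of `state` without the entries stored under `online_prefix`.

    Assumption: online-model entries share the key prefix `online_model.`.
    """
    return {k: v for k, v in state.items() if not k.startswith(online_prefix)}


# Hypothetical stand-in for the EMA wrapper's state dict.
# Real values would be tensors; lists keep the sketch dependency-free.
full_state = {
    "online_model.weight": [1.0, 2.0],
    "ema_model.weight": [0.9, 1.9],
    "step": 100,
}

# Only the averaged weights and bookkeeping entries remain.
ema_only = drop_online_weights(full_state)
```

The filtered dict could then be written to disk in place of the full one, which would avoid storing the live weights a second time.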

In addition, when reconstructing the EMA class and loading it with load_state_dict, the online_model weights (online_model points to a separate model outside of the EMA class) will be overwritten with what's saved in the EMA. This is probably fine if you always save and load both the EMA and the live weights at the same time, but it may cause issues if somebody saves and loads the EMA class separately from the live model.
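One way to sidestep the overwrite, sketched under the same assumption about key prefixes, is to pull out only the averaged weights and strip the `ema_model.` prefix, so they can be loaded into a standalone copy of the model rather than through the EMA wrapper. The helper `extract_ema_weights` and the sample keys are hypothetical.

```python
def extract_ema_weights(state, ema_prefix="ema_model."):
    """Return only the entries under `ema_prefix`, with the prefix removed.

    Assumption: averaged weights are stored under keys starting with `ema_model.`.
    """
    return {
        k[len(ema_prefix):]: v
        for k, v in state.items()
        if k.startswith(ema_prefix)
    }


# Hypothetical saved state containing both sets of weights.
saved_state = {
    "online_model.weight": [1.0, 2.0],
    "online_model.bias": [0.0],
    "ema_model.weight": [0.9, 1.9],
    "ema_model.bias": [0.1],
}

# Keys now match what a bare model would expect ("weight", "bias").
ema_weights = extract_ema_weights(saved_state)
```

With real tensors, the result could be passed to a fresh model's `load_state_dict`, leaving the live model that is being trained untouched.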

(For context, I'm looking at code using this repository from your muse-maskgit-pytorch repository. Maybe usage of EMA elsewhere expects the behavior I described?)

@Netruk44 oh hey Daniel! this is a very good point

do you want to try setting this to False and see if that solves your issue?

Thanks for addressing it! Unfortunately I'm not sure I can test this change for a while. I don't really have a use for this library currently, I only stumbled on it when I was checking out the muse-maskgit-pytorch repository, which was (and I assume still is) in early development and wasn't quite working yet. I haven't really looked at it since my training attempt, which is when I noticed and created this issue.

I can circle back around to training another muse model eventually so I can test this change out, but it probably won't be for a few weeks at least. I'm in the middle of training a different model, so my one and only GPU is occupied 😛.

Looking at the change, it seems like it'll do the trick. We could probably just call this issue closed, I can open another issue if I find another problem.

haha sounds good, yea just reopen once you end up trying it!

@Netruk44 huggingface has expressed interest in training Muse using my repository. you should get in touch with them if you are interested as well and need compute to run the experiments