Should EMA also be applied to BN running mean and running var?
zhaohui-yang opened this issue · 0 comments
zhaohui-yang commented
Thank you for generously sharing this work. In the author's implementation, EMA is applied to all trainable parameters, but some statistics, such as the BN running mean and variance, are not trainable parameters. When the EMA weights are copied back into the model, these statistics still correspond to the original parameters, not to the EMA-smoothed parameters. Has the author experimented with applying EMA to every entry in `state_dict()`? What is the experimental effect of smoothing all variables in the dict, not just the trainable parameters?
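To make the question concrete, here is a minimal sketch of the two variants I have in mind. It is not the author's code: plain Python dicts of floats stand in for `state_dict()`, and the names `ema_update`, `params`, `buffers`, and the decay of 0.5 are all hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def ema_update(shadow, current, decay=0.999):
    """In-place EMA: shadow = decay * shadow + (1 - decay) * current."""
    for name, value in current.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


# Hypothetical flattened model state (floats stand in for tensors).
params = {"conv.weight": 1.0}                               # trainable
buffers = {"bn.running_mean": 0.0, "bn.running_var": 1.0}   # BN statistics

# Variant A: shadow copy of trainable parameters only.
ema_params = dict(params)
# Variant B: shadow copy of every entry (parameters and buffers).
ema_state = {**params, **buffers}

# Simulate one training step that changes both weights and BN statistics.
params["conv.weight"] = 2.0
buffers["bn.running_mean"] = 4.0
buffers["bn.running_var"] = 3.0

ema_update(ema_params, params, decay=0.5)
ema_update(ema_state, {**params, **buffers}, decay=0.5)

# "Copy back": what the model would hold for evaluation in each variant.
restored_a = {**buffers, **ema_params}  # EMA weights, raw BN statistics
restored_b = dict(ema_state)            # EMA weights, EMA BN statistics

print(restored_a)
print(restored_b)
```

In variant A the restored model pairs smoothed weights with BN statistics collected under the unsmoothed weights, which is the mismatch I am asking about. With real PyTorch modules, integer buffers such as `num_batches_tracked` would presumably need to be copied rather than averaged.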