eps for GroupNorm
Asthestarsfalll opened this issue · 5 comments
Great work!
The parameter 'eps' in GroupNorm is initialized to 1e-5 by default.
However, GroupNorm in TensorFlow differs slightly: it is initialized with 1e-6.
This may not affect training results, but could you change it (for every GroupNorm in the code) so that the two are aligned?
I want to convert the trained model from torch or tf to MegEngine, and the smaller the error, the better.
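To make the concern concrete, here is a minimal pure-Python sketch of group normalization (no affine parameters) evaluated with both eps values. The helper names `group_norm` and `max_abs_diff` and the input values are made up for illustration and are not taken from this repo or from any framework.

```python
import math


def group_norm(x, num_groups, eps):
    """Group-normalize x, a list of channels where each channel is a list of floats."""
    channels = len(x)
    assert channels % num_groups == 0
    per_group = channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        # Biased variance, as normalization layers use at inference time.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for ch in group:
            out.append([(v - mean) * inv_std for v in ch])
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Hypothetical activations: 4 channels in 2 groups.
# The second group has a variance close to the eps values being compared.
x = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 0.0, -1.0, 5.0],
    [0.010, 0.011, 0.009, 0.010],
    [0.010, 0.012, 0.008, 0.010],
]

y_eps_1e5 = group_norm(x, num_groups=2, eps=1e-5)
y_eps_1e6 = group_norm(x, num_groups=2, eps=1e-6)

group1_diff = max_abs_diff(y_eps_1e5[:2], y_eps_1e6[:2])
group2_diff = max_abs_diff(y_eps_1e5[2:], y_eps_1e6[2:])
print("group with large variance:", group1_diff)
print("group with small variance:", group2_diff)
```

In this toy input the group with large variance is almost unaffected by the choice of eps, while the group whose variance is comparable to eps changes noticeably. Whether real DDPM activations ever fall into the second regime is not something this sketch can tell.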
Thanks for your interest! The DDPM model was initially based on a PyTorch implementation, and I'm glad to hear that you are willing to convert the original pre-trained model to MegEngine. Here is some information that might be helpful:
- I only checked that the forward process is consistent with the PyTorch version I referenced; I'm not sure whether all details of the original TensorFlow version are implemented.
- Other converted ckpts: https://github.com/pesser/pytorch_diffusion
In my opinion, conversion scripts are also important for helping users understand where converted pre-trained models come from. So I suggest you upload them to this repo, which could encourage more users to join us.
Btw, I'm not sure yet how to develop this library in the future; I hope it will help more people understand the implementation of diffusion models. (OpenAI's improved/guided codebase is great, but it lacks readability.)
While developing this repo, I wrote some notes in Chinese to help myself understand diffusion models better. Here is a post: https://meg.chai.ac.cn/ddpm-megengine/ You are welcome to read it and give me some advice.
I'm willing to upload my conversion code, but the model doesn't work well after converting.
The error between the MegEngine and PyTorch implementations is high for the same input.
This is because the padding of the convolution in DownSample is different: the PyTorch implementation uses asymmetric padding.
After I modified the MegEngine implementation, the result:
import megengine.functional as F
import megengine.module as M
import megengine.module.init as init


class DownSample(M.Module):
    """A downsampling layer with an optional convolution.

    Args:
        in_ch: channels in the inputs and outputs.
        with_conv: if ``True``, apply convolution to do downsampling; otherwise use pooling.
    """

    def __init__(self, in_ch, with_conv=True):
        super().__init__()
        self.with_conv = with_conv
        if with_conv:
            # No padding inside the conv: forward() pads asymmetrically instead.
            self.main = M.Conv2d(in_ch, in_ch, 3, stride=2)
        else:
            self.main = M.AvgPool2d(2, stride=2)

    def _initialize(self):
        for module in self.modules():
            if isinstance(module, M.Conv2d):
                init.xavier_uniform_(module.weight)
                init.zeros_(module.bias)

    def forward(self, x, temb):  # unused temb param kept here just for convenience
        if self.with_conv:
            # Pad one element at the end of each of the last two (spatial) dims only;
            # the leading dims (batch, channel) are left unpadded.
            x = F.nn.pad(x, [*[(0, 0) for _ in range(x.ndim - 2)], (0, 1), (0, 1)])
        return self.main(x)
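Why the padding side matters can be seen in one dimension. The sketch below lists which input indices each output position of a kernel-3, stride-2 convolution reads under symmetric padding (1, 1) versus the asymmetric padding (0, 1) used above. It assumes the earlier MegEngine layer used symmetric padding of 1, which this thread does not state; `conv_windows` is a hypothetical helper, not a MegEngine or PyTorch function.

```python
def conv_windows(length, kernel, stride, pad_left, pad_right):
    """For each output position, list the input indices its window covers.

    Indices below 0 or at/above `length` refer to zero padding.
    """
    padded = length + pad_left + pad_right
    n_out = (padded - kernel) // stride + 1
    windows = []
    for o in range(n_out):
        start = o * stride - pad_left
        windows.append(list(range(start, start + kernel)))
    return windows


length = 8  # hypothetical spatial size
symmetric = conv_windows(length, kernel=3, stride=2, pad_left=1, pad_right=1)
asymmetric = conv_windows(length, kernel=3, stride=2, pad_left=0, pad_right=1)

print("symmetric :", symmetric)
print("asymmetric:", asymmetric)
```

Both variants produce the same number of outputs, so the shapes match and nothing fails loudly, but every window is shifted by one input position. Identical weights therefore see different inputs, which would be consistent with the large error reported for the converted model.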
Btw, I'm also a beginner in DDPM, and your blog helps me a lot!
Got it. I'm not available at the moment; I will check the padding mode and #6 after my day off.
The initial eps value has been updated, and I will close this issue now so that the same topic is tracked in one issue.
Feel free to reopen it if you have any questions or suggestions.