MegEngine/MegDiffusion

About padding in Downsample

Closed this issue · 3 comments

I'm willing to upload my conversion code, but it doesn't work well after converting: the error between the MegEngine and PyTorch implementations is large for the same input. The reason is that the convolution padding in Downsample differs; the PyTorch implementation uses asymmetric padding. After I modified the MegEngine implementation, the result:

import megengine.functional as F
import megengine.module as M
from megengine.module import init


class DownSample(M.Module):
    """A downsampling layer with an optional convolution.

    Args:
        in_ch: channels in the inputs and outputs.
        with_conv: if ``True``, apply convolution to do downsampling; otherwise use pooling.
    """

    def __init__(self, in_ch, with_conv=True):
        super().__init__()
        self.with_conv = with_conv
        if with_conv:
            self.main = M.Conv2d(in_ch, in_ch, 3, stride=2)
        else:
            self.main = M.AvgPool2d(2, stride=2)

    def _initialize(self):
        for module in self.modules():
            if isinstance(module, M.Conv2d):
                init.xavier_uniform_(module.weight)
                init.zeros_(module.bias)

    def forward(self, x, temb):  # unused temb param kept just for convenience (consistent signature)
        if self.with_conv:
            # asymmetric (0, 1) padding on the last two (spatial) dims, like TF "SAME"
            x = F.nn.pad(x, [(0, 0)] * (x.ndim - 2) + [(0, 1), (0, 1)])
        return self.main(x)

[image: result after the modification]
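For reference, here is a minimal shape check (just a sketch, assuming the modified DownSample above and a hypothetical 32x32 input) showing that the asymmetric (0, 1) padding followed by a kernel-3, stride-2 convolution reproduces TensorFlow's "SAME" output size:

import numpy as np
import megengine as mge

down = DownSample(in_ch=64)
x = mge.tensor(np.random.randn(1, 64, 32, 32).astype("float32"))
y = down(x, temb=None)
print(y.shape)  # (1, 64, 16, 16), i.e. ceil(32 / 2), matching TF "SAME" with stride 2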

Btw, I'm also a beginner in DDPM; your blog helps me a lot!

Originally posted by @Asthestarsfalll in #5 (comment)

@Asthestarsfalll pesser's repo shows how to do the conversion from TF to PyTorch in convert.py, but its inference steps are not verified. Actually, that model is the source code I referred to, and asymmetric padding is not used there. I also did not find any asymmetric padding logic in Ho's (TensorFlow) implementation.

It's so confusing. Can you understand the author's reason for doing this?

You are right. In TensorFlow, Conv2d's "SAME" padding behaves differently: when the total padding is odd, the extra pixel goes to the bottom/right. I will fix it soon.
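For context, a small sketch of the "SAME" padding rule (plain Python arithmetic, not any framework API) shows where the asymmetric (0, 1) padding comes from for kernel 3 and stride 2:

def tf_same_padding(in_size, kernel, stride):
    """(pad_before, pad_after) as computed by TF "SAME" padding for one spatial dim."""
    out_size = -(-in_size // stride)  # ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    before = total // 2
    after = total - before  # the extra pixel goes to the bottom/right
    return before, after

print(tf_same_padding(32, kernel=3, stride=2))  # (0, 1)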

Fixed in ddpm model.