MegEngine/MegDiffusion

About padding in Downsample

Closed this issue · 3 comments

I'm willing to upload my conversion code, but it doesn't work well after converting: the error between the MegEngine and PyTorch implementations is large for the same input. The reason is that the convolution padding in Downsample differs; the PyTorch implementation uses asymmetric padding. After I modified the MegEngine implementation, the result:

import megengine.functional as F
import megengine.module as M
from megengine.module import init


class DownSample(M.Module):
    """A downsampling layer with an optional convolution.

    Args:
        in_ch: channels in the inputs and outputs.
        with_conv: if ``True``, apply convolution to do downsampling; otherwise use pooling.
    """

    def __init__(self, in_ch, with_conv=True):
        super().__init__()
        self.with_conv = with_conv
        if with_conv:
            self.main = M.Conv2d(in_ch, in_ch, 3, stride=2)
        else:
            self.main = M.AvgPool2d(2, stride=2)

    def _initialize(self):
        for module in self.modules():
            if isinstance(module, M.Conv2d):
                init.xavier_uniform_(module.weight)
                init.zeros_(module.bias)

    def forward(self, x, temb):  # unused temb param kept just for convenience (consistent signature)
        if self.with_conv:
            # asymmetric (0, 1) padding on the last two (spatial) dims, like TF "SAME"
            x = F.nn.pad(x, [(0, 0)] * (x.ndim - 2) + [(0, 1), (0, 1)])
        return self.main(x)

[image: result after the modification]
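For reference, here is a minimal shape check (just a sketch, assuming the modified DownSample above and a hypothetical 32x32 input) showing that the asymmetric (0, 1) padding followed by a kernel-3, stride-2 convolution reproduces TensorFlow's "SAME" output size:

import numpy as np
import megengine as mge

down = DownSample(in_ch=64)
x = mge.tensor(np.random.randn(1, 64, 32, 32).astype("float32"))
y = down(x, temb=None)
print(y.shape)  # (1, 64, 16, 16), i.e. ceil(32 / 2), matching TF "SAME" with stride 2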

Btw, I'm also a beginner in DDPM; your blog helps me a lot!

Originally posted by @Asthestarsfalll in #5 (comment)

@Asthestarsfalll pesser's repo shows how to do the conversion from TF to PyTorch in convert.py, but its inference steps are not verified. Actually, that model is the source code I referred to, and asymmetric padding is not used there. I also did not find any asymmetric padding logic in Ho's (TensorFlow) implementation.

It's so confusing. Can you understand the author's reason for doing this?

You are right. In TensorFlow, Conv2d's "SAME" padding behaves differently: when the total padding is odd, the extra pixel goes to the bottom/right. I will fix it soon.
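For context, a small sketch of the "SAME" padding rule (plain Python arithmetic, not any framework API) shows where the asymmetric (0, 1) padding comes from for kernel 3 and stride 2:

def tf_same_padding(in_size, kernel, stride):
    """(pad_before, pad_after) as computed by TF "SAME" padding for one spatial dim."""
    out_size = -(-in_size // stride)  # ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    before = total // 2
    after = total - before  # the extra pixel goes to the bottom/right
    return before, after

print(tf_same_padding(32, kernel=3, stride=2))  # (0, 1)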

Fixed in ddpm model.