facebookresearch/MaskFormer

Question about Pixel decoder last Conv2d layer

YaGami01 opened this issue · 0 comments

Great work!
Your paper says, in Section 4.1 (Implementation details, Pixel decoder): "Finally, we apply a single 1 × 1 convolution layer to get the per-pixel embeddings."
However, the code in pixel_decoder.py is:

self.mask_features = Conv2d(
    conv_dim,
    mask_dim,
    kernel_size=3,
    stride=1,
    padding=1,
)

So the final Conv2d also has a 3 × 3 kernel, not the 1 × 1 kernel described in the paper. Am I missing something?
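
To make the difference concrete, here is a minimal sketch in plain Python (no framework needed) comparing the two configurations using the standard convolution formulas. The values conv_dim = 256, mask_dim = 256, the 32 × 32 feature map, and the presence of a bias term are assumptions for illustration only, and the helper names are hypothetical, not taken from the repository.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a 2D convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters in a square-kernel 2D convolution."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Assumed values for illustration; not read from any config file.
conv_dim, mask_dim, feature_size = 256, 256, 32

# Configuration as written in pixel_decoder.py: 3x3 kernel, padding 1.
size_3x3 = conv2d_output_size(feature_size, kernel_size=3, stride=1, padding=1)
params_3x3 = conv2d_param_count(conv_dim, mask_dim, kernel_size=3)

# Configuration as described in the paper: 1x1 kernel, no padding needed.
size_1x1 = conv2d_output_size(feature_size, kernel_size=1, stride=1, padding=0)
params_1x1 = conv2d_param_count(conv_dim, mask_dim, kernel_size=1)

print(size_3x3, params_3x3)
print(size_1x1, params_1x1)
```

Both settings keep the spatial resolution unchanged, so the output shapes match; under these assumptions the 3 × 3 layer has roughly nine times as many weights and mixes information from neighbouring pixels, while the 1 × 1 layer is a purely per-pixel projection.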
Thanks!