Beckschen/TransUNet

"ZeroDivisionError: integer division or modulo by zero" when vit_patches_size=8

alqurri77 opened this issue · 3 comments

Hi,

I tried setting vit_patches_size to 8 instead of 16, but got the error below (note that the image size is 224):

Input In [25], in Transformer.__init__(self, config, img_size, vis)
    257 def __init__(self, config, img_size, vis):
    258     super(Transformer, self).__init__()
--> 259     self.embeddings = Embeddings(config, img_size=img_size)
    260     self.encoder = Encoder(config, vis)

Input In [25], in Embeddings.__init__(self, config, img_size, in_channels)
    140     patch_size = (img_size[0] // b_size // grid_size[0], img_size[1] // b_size // grid_size[1])
    141     patch_size_real = (patch_size[0] * b_size, patch_size[1] * b_size)
--> 142     n_patches = (img_size[0] // patch_size_real[0]) * (img_size[1] // patch_size_real[1])  
    143     self.hybrid = True
    144 else:

ZeroDivisionError: integer division or modulo by zero
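For reference, tracing the values in Embeddings.__init__ makes the failure visible. A minimal sketch of the arithmetic, assuming b_size is the hybrid ResNet backbone's downsampling factor (16 in the stock config) and that train.py sets config.patches.grid to img_size // vit_patches_size:

img_size = 224
vit_patches_size = 8   # changed from the default 16
b_size = 16            # assumed: downsampling factor of the hybrid backbone

# grid derived from the CLI flag: 224 // 8 = 28
grid_size = (img_size // vit_patches_size, img_size // vit_patches_size)

# Embeddings.__init__ derives the per-token patch size from the grid:
# 224 // 16 // 28 = 14 // 28 = 0
patch_size = (img_size // b_size // grid_size[0],
              img_size // b_size // grid_size[1])
patch_size_real = (patch_size[0] * b_size, patch_size[1] * b_size)  # (0, 0)

# dividing by patch_size_real[0] == 0 raises ZeroDivisionError
n_patches = (img_size // patch_size_real[0]) * (img_size // patch_size_real[1])

The backbone reduces 224 to a 14x14 feature map, so the grid can be at most 14x14; any vit_patches_size below 16 pushes the grid above 14 and drives the integer division to zero.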
hczyni commented

This error comes up with any value other than 16; very strange.


May I ask: with image_size 224 and patch_size 16, does it run properly for you? I get the following error on my side:

RuntimeError: Calculated padded input size per channel: (14 x 14). Kernel size: (16 x 16). Kernel size can't be greater than actual input size

I checked and found that after the convolutional encoder, the feature map is (1024, 14, 14), ignoring the batch dimension. A 14x14 feature map cannot support the ViT computation with a patch_size of 16. Am I doing something wrong?
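The shapes bear that out: in the hybrid model the patch embedding operates on the backbone's 14x14 output, not on the 224x224 image, so with the stock 14x14 grid the embedding kernel is 1x1 rather than 16x16. A quick standalone check (a sketch, not the repo's code; the (1, 1024, 14, 14) shape is taken from the feature map described above, and 768 is ViT-B's hidden size):

import torch
import torch.nn as nn

feat = torch.randn(1, 1024, 14, 14)  # encoder output, batch size 1

# A 16x16 kernel cannot slide over a 14x14 map:
bad = nn.Conv2d(1024, 768, kernel_size=16, stride=16)
# bad(feat)  # RuntimeError: Kernel size can't be greater than actual input size

# With grid 14x14, patch_size = 224 // 16 // 14 = 1, so the embedding
# is a 1x1 conv and yields 14 * 14 = 196 tokens:
good = nn.Conv2d(1024, 768, kernel_size=1, stride=1)
tokens = good(feat).flatten(2).transpose(1, 2)
print(tokens.shape)  # torch.Size([1, 196, 768])

So applying a 16x16 patch embedding directly to the feature map reproduces that RuntimeError; patch_size=16 refers to the raw image, and the hybrid Embeddings turns it into a 1x1 kernel on the feature map.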
