"ZeroDivisionError: integer division or modulo by zero" when vit_patches_size=8
alqurri77 opened this issue · 3 comments
Hi;
I tried setting vit_patches_size to 8 instead of 16, but got the error below (note the image size is 224):
Input In [25], in Transformer.__init__(self, config, img_size, vis)
257 def __init__(self, config, img_size, vis):
258 super(Transformer, self).__init__()
--> 259 self.embeddings = Embeddings(config, img_size=img_size)
260 self.encoder = Encoder(config, vis)
Input In [25], in Embeddings.__init__(self, config, img_size, in_channels)
140 patch_size = (img_size[0] // b_size // grid_size[0], img_size[1] // b_size // grid_size[1])
141 patch_size_real = (patch_size[0] * b_size, patch_size[1] * b_size)
--> 142 n_patches = (img_size[0] // patch_size_real[0]) * (img_size[1] // patch_size_real[1])
143 self.hybrid = True
144 else:
ZeroDivisionError: integer division or modulo by zero
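The division by zero can be reproduced with the arithmetic alone. A minimal sketch, assuming (as the traceback suggests) that the hybrid ResNet backbone downsamples by a factor of b_size = 16 and that the config's grid size is derived as img_size // vit_patches_size — both are assumptions about this repo's config, not confirmed here:

```python
img_size = 224
b_size = 16  # assumed total downsampling factor of the hybrid ResNet backbone

for vit_patches_size in (16, 8):
    grid = img_size // vit_patches_size   # 14 when 16, 28 when 8
    patch = img_size // b_size // grid    # 224//16//14 = 1, but 224//16//28 = 0
    print(vit_patches_size, grid, patch)
```

With vit_patches_size=8 the grid (28) is larger than the 14x14 backbone output, so patch becomes 0, patch_size_real becomes 0, and img_size // patch_size_real raises the ZeroDivisionError.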
Any value other than 16 triggers this error, which is strange.
May I ask: with image_size 224 and patch_size 16, does it run properly for you? I get the following error on my side:
RuntimeError: Calculated padded input size per channel: (14 x 14). Kernel size: (16 x 16). Kernel size can't be greater than actual input size
I checked and found that after the convolutional processing in the encoder, the feature map (ignoring the batch dimension) has shape (1024, 14, 14). A 14x14 feature map cannot support a ViT patch embedding with patch_size 16. Am I doing something wrong?
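The 14x14 size is consistent with a backbone whose total stride is 16 (224 / 16 = 14), so a patch-embedding convolution with kernel 16 cannot fit. A sketch of the size check, assuming the backbone stride of 16 (the actual stride in this repo is not confirmed here):

```python
img_size = 224
backbone_stride = 16                  # assumed total downsampling of the R50 backbone
feat = img_size // backbone_stride    # 14x14 feature map fed to the patch embedding

kernel = 16  # applying patch_size 16 directly to this feature map
# kernel > feat is exactly the condition behind
# "Kernel size can't be greater than actual input size"
print(feat, kernel, kernel > feat)
```

In the hybrid configuration the patch-embedding kernel has to divide the feature map (e.g. a 14x14 map with grid 14 implies a 1x1 kernel), which is why the model only runs when the grid derived from img_size // vit_patches_size matches the backbone output.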
Any value other than 16 triggers this error, which is strange.
Buddy, does it run properly for you with image size 224 and patch_size 16?