Unable to train T2T-ViT for 384 x 384 image

Question

SK124 opened this issue 4 years ago · 1 comments

Hi! Can you suggest which part of the code should be modified to prevent the following error? Also, can I train on images with my own input dimensions, such as 448 or 608?

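import torch
from models.t2t_vit import *

model = T2t_vit_14()  # built with the default input size
inp = torch.rand(2, 3, 384, 384)  # batch of two 384 x 384 RGB images
out = model(inp)  # raises the RuntimeError below
out.shape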

RuntimeError Traceback (most recent call last)
in ()
1 inp=torch.rand(2,3,384,384)
----> 2 out=model(inp)
3 out.shape

2 frames
/content/T2T-ViT/models/t2t_vit.py in forward_features(self, x)
159 cls_tokens = self.cls_token.expand(B, -1, -1)
160 x = torch.cat((cls_tokens, x), dim=1)
--> 161 x = x + self.pos_embed
162 x = self.pos_drop(x)
163

RuntimeError: The size of tensor a (577) must match the size of tensor b (197) at non-singleton dimension 1

Answer 1 · 2021-05-16T07:20:10.000Z

Hi,

If you want to train our model with a different image size, such as 384x384, pass img_size when constructing the model:

from models.t2t_vit import *
model = T2t_vit_14(img_size=384)
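
The token counts in the error message suggest why `img_size` matters. Below is a minimal sketch, assuming the tokens-to-token module shrinks each spatial side by a total factor of 16 and that one class token is prepended. Both assumptions are inferred from the 197 and 577 in the traceback rather than read from the repository, and `expected_tokens` is a hypothetical helper, not part of T2T-ViT:

```python
def expected_tokens(img_size: int, total_stride: int = 16) -> int:
    """Estimate the sequence length seen by the positional embedding.

    Assumption: the image side is reduced by `total_stride` overall,
    and a single class token is concatenated in front of the patches.
    """
    side = img_size // total_stride   # tokens along one spatial side
    return side * side + 1            # patch tokens + 1 class token


# 224 reproduces "tensor b (197)", 384 reproduces "tensor a (577)".
for size in (224, 384, 448, 608):
    print(size, expected_tokens(size))
# 224 197
# 384 577
# 448 785
# 608 1445
```

Under these assumptions, a model built without `img_size` holds a 197-token positional embedding, which cannot be added to the 577 tokens produced by a 384x384 input. Since 448 and 608 are also multiples of 16, passing `img_size=448` or `img_size=608` should size the embedding in the same way.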
