Unable to train T2T-ViT for 384 x 384 image
SK124 opened this issue · 1 comment
Hi! Can you suggest which part of the code should be modified to prevent the following error? Also, can I train on images with my own input dimensions, such as 448 or 608?
from models.t2t_vit import *
model = T2t_vit_14()
RuntimeError Traceback (most recent call last)
in ()
1 inp=torch.rand(2,3,384,384)
----> 2 out=model(inp)
3 out.shape
2 frames
/content/T2T-ViT/models/t2t_vit.py in forward_features(self, x)
159 cls_tokens = self.cls_token.expand(B, -1, -1)
160 x = torch.cat((cls_tokens, x), dim=1)
--> 161 x = x + self.pos_embed
162 x = self.pos_drop(x)
RuntimeError: The size of tensor a (577) must match the size of tensor b (197) at non-singleton dimension 1
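The model defaults to 224x224 input, so its positional embedding holds 197 tokens (14 x 14 patches plus the class token), while a 384x384 input produces 577 (24 x 24 plus the class token). If you want to train our model with another image size, such as 384x384, pass the size when constructing the model: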
from models.t2t_vit import *
model = T2t_vit_14(img_size=384)
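
For other sizes such as 448 or 608, the same `img_size` argument should apply. The token counts in the traceback can be checked with a bit of arithmetic. The sketch below is not part of the repository: `num_tokens` is a hypothetical helper, and the total downsampling factor of 16 is an assumption inferred from the 197 and 577 in the error message. Rejecting sizes that are not multiples of 16 is a conservative choice made for this sketch, not a documented requirement of the model.

```python
def num_tokens(img_size: int, total_stride: int = 16, has_cls_token: bool = True) -> int:
    """Sequence length the transformer sees for a square input.

    Hypothetical helper: total_stride=16 is an assumption inferred from
    the traceback (224 -> 197 tokens, 384 -> 577 tokens).
    """
    if img_size % total_stride != 0:
        # Conservative choice for this sketch: only accept exact multiples.
        raise ValueError(f"img_size {img_size} is not a multiple of {total_stride}")
    side = img_size // total_stride  # patches along one side of the image
    return side * side + (1 if has_cls_token else 0)


for size in (224, 384, 448, 608):
    print(size, num_tokens(size))
```

The value for the size passed as `img_size` has to match the number of tokens produced by the actual input; otherwise the addition `x + self.pos_embed` fails as shown in the traceback.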