NVlabs/FasterViT

different resolution between train and inference

d-zhou12 opened this issue · 4 comments

Thanks for your nice work. Is it possible to train FasterViT at a low resolution and run inference at a higher resolution? I tested create_model with the any_res FasterViT at resolution 256x256; running inference at a different resolution, 512x512, fails with this error: RuntimeError: shape '[-1, 2, 2, 4, 4, 512]' is invalid for input of size 73728
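For reference, the arithmetic behind this error message can be checked directly. This is a minimal sketch that uses only the numbers quoted in the error. Reading the last dimension (512) as the channel width is an assumption, not something confirmed from the FasterViT code.

```python
import math

# Shape and element count copied from the error message above.
target_shape = [-1, 2, 2, 4, 4, 512]
numel = 73728

# Product of the fixed (non -1) dimensions of the requested view.
fixed = math.prod(d for d in target_shape if d != -1)

# A view/reshape can only infer the -1 dimension when numel is an
# exact multiple of the product of the fixed dimensions.
remainder = numel % fixed
print(fixed, remainder)

# Assumption: 512 is the channel width, so this is the number of tokens
# in the tensor being reshaped.
tokens = numel // 512
print(tokens)
```

A non-zero remainder means the tensor cannot be split evenly into the requested windows, which is consistent with a mismatch between the feature-map size and the window layout.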

Hi @d-zhou12, what window sizes are being used? The height and width in both cases should ideally be divisible by the window size.
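One way to check the divisibility condition ahead of time is sketched below. The helper `check_window_divisibility` and the per-stage strides in `ASSUMED_STRIDES` are hypothetical and only illustrate the idea; they are not taken from the FasterViT code, so substitute the actual downsampling factors of the model being used.

```python
def check_window_divisibility(height, width, window_sizes, strides):
    """Report, per stage, whether the feature map is divisible by the window size.

    Returns a list of (stage, feat_h, feat_w, window, ok) tuples.
    """
    report = []
    for stage, (window, stride) in enumerate(zip(window_sizes, strides)):
        # Feature-map size at this stage, given the assumed total stride.
        feat_h, feat_w = height // stride, width // stride
        ok = feat_h % window == 0 and feat_w % window == 0
        report.append((stage, feat_h, feat_w, window, ok))
    return report


# Assumption: cumulative downsampling factors per stage (hypothetical values).
ASSUMED_STRIDES = (4, 8, 16, 32)

for res in (256, 512):
    for row in check_window_divisibility(res, res, (8, 8, 8, 8), ASSUMED_STRIDES):
        print(res, row)
```

Any stage reported with `ok == False` is a candidate for the reshape failure. A stage passing this check does not by itself guarantee that a model built for one resolution will accept another.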

I use the default window sizes [7, 7, 12, 6] from readme.md. I also tried [8, 8, 8, 8], but it still gives an error for 256x256 / 512x512.

Hi @d-zhou12, could you please provide the log (perhaps for the second case, where you use the same window size) and also confirm your timm and torchvision versions?

Thanks

Closing this issue for now until logs can be provided.