any_res is not truly any resolution
becauseofAI opened this issue · 5 comments
Resolution=[576, 960] works, but [1152, 1920], [1088, 1920], [1024, 1024], [384, 384], etc. do not. This makes multi-scale training difficult for us.
Hi @becauseofAI, could you please share more details? faster_vit_any_res can be initialized with any resolution, including those you have mentioned. In fact, we just added a 384x384 model.
@ahatamiz I think @becauseofAI is working on a detection model. It seems that the any_res model can be initialized with any resolution, but after initialization the input image size must stay fixed. This makes it impossible to train a detection model with multi-scale inputs.
Can you clarify?
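To make the distinction concrete, here is a toy sketch in plain Python. It is not FasterViT code: `ToyFixedResModel`, its `pos_table` field, and the `patch=32` factor are all hypothetical, chosen only to mimic a model whose position table is sized at construction time. Such a model accepts any resolution at initialization, but only that one resolution afterwards.

```python
class ToyFixedResModel:
    """Toy stand-in (hypothetical) for a backbone that fixes its input size at init."""

    def __init__(self, resolution, patch=32):
        # `patch` is an assumed downsampling factor, chosen only for illustration.
        self.resolution = tuple(resolution)
        height, width = self.resolution
        # One position entry per patch; the table length is frozen here.
        self.pos_table = [0.0] * ((height // patch) * (width // patch))

    def forward(self, image_size):
        # Reject any input whose size differs from the one used at construction.
        if tuple(image_size) != self.resolution:
            raise ValueError(
                f"built for {self.resolution}, got {tuple(image_size)}"
            )
        return len(self.pos_table)


# Any resolution is accepted at construction time...
model = ToyFixedResModel((576, 960))
tokens = model.forward((576, 960))

# ...but a different size afterwards is rejected, which blocks multi-scale training.
try:
    model.forward((1152, 1920))
    accepted = True
except ValueError:
    accepted = False
```

Whether the real model fails for this reason is not stated in the thread; the sketch only illustrates the "any resolution at init, fixed size afterwards" behavior described above.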
Thank you @chuong98 for your comment. I understand.
The main (initial) motivation of faster_vit_any_res is still classification, by enabling various input sizes, window sizes, etc. Detection models (with multi-scale training) follow the same model but need some modifications to make it work. We will publish the code for downstream tasks (e.g. detection) soon.
Again, thank you @becauseofAI and @chuong98 for your interest in our work.
> @ahatamiz I think @becauseofAI is working on a detection model. It seems that the any_res model can be initialized with any resolution, but after initialization the input image size must stay fixed. This makes it impossible to train a detection model with multi-scale inputs. Can you clarify?
That's exactly what it means. Thank you for your further explanation. @chuong98
Hi @becauseofAI
We have released the object detection repository for FasterViT with DINO. You can use multi-scale training as well. The FasterViT backbone allows any resolution for training.
We will release the pretrained checkpoints very soon.
I hope you find this release useful!
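For readers unfamiliar with multi-scale training, the idea can be sketched in plain Python as picking a different input size at each iteration. This is a minimal illustration, not code from the FasterViT detection repository: the `TRAIN_SCALES` list and the `sample_train_size` helper are hypothetical, and the scales simply reuse resolutions mentioned earlier in this thread.

```python
import random

# Illustrative scale list (assumption), reusing resolutions mentioned in this thread.
TRAIN_SCALES = [(576, 960), (1088, 1920), (1152, 1920)]


def sample_train_size(rng, scales=TRAIN_SCALES):
    """Pick one (height, width) pair for the next training iteration."""
    return rng.choice(scales)


rng = random.Random(0)  # seeded so the run is reproducible
sizes = [sample_train_size(rng) for _ in range(5)]
for step, (height, width) in enumerate(sizes):
    # A real loop would resize the batch to (height, width) before the forward pass.
    print(f"step {step}: train at {height}x{width}")
```

A backbone that accepts any resolution at forward time is what makes this loop possible; a model tied to one input size would fail as soon as the sampled size changed.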