NVlabs/FasterViT

any_res is not truly any resolution

becauseofAI opened this issue · 5 comments

Resolution=[576, 960] works, but [1152, 1920], [1088, 1920], [1024, 1024], [384, 384], etc. all fail. This makes it difficult for us to do multi-scale training.

Hi @becauseofAI, could you please share more details? faster_vit_any_res can be initialized with any resolution, including the ones you mentioned. In fact, we just added a 384x384 model.

@ahatamiz I think @becauseofAI is working on a detection model. It seems that the any_res model can be initialized with any resolution, but after initialization the input image size must stay fixed. This makes it impossible to train a detection model with multi-scale inputs.
Can you clarify?
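
To make the distinction concrete, here is a minimal toy sketch of "any resolution at initialization, fixed resolution afterwards". The class `ToyFixedResBackbone`, its `patch` parameter, and the position table are hypothetical and are not FasterViT code. The sketch assumes the constraint comes from per-token buffers being sized at construction time, which is one common cause but may not be exactly what happens inside faster_vit_any_res.

```python
class ToyFixedResBackbone:
    """Hypothetical stand-in for a backbone whose input size is frozen at init.

    This is NOT FasterViT code; it only illustrates the behaviour described above.
    """

    def __init__(self, resolution, patch=32):
        h, w = resolution
        if h % patch or w % patch:
            raise ValueError(f"resolution {resolution} is not divisible by {patch}")
        self.resolution = (h, w)
        self.grid = (h // patch, w // patch)
        # One position entry per token. The table length is fixed here,
        # so it only matches inputs of exactly this resolution.
        self.pos_table = [0.0] * (self.grid[0] * self.grid[1])

    def forward(self, image_hw):
        if tuple(image_hw) != self.resolution:
            raise ValueError(
                f"built for {self.resolution}, got {tuple(image_hw)}"
            )
        # Return the token count in place of real features.
        return len(self.pos_table)


# Any (divisible) resolution is accepted at construction time...
model = ToyFixedResBackbone((576, 960))
tokens = model.forward((576, 960))  # 18 * 30 = 540 tokens

# ...but a different size at call time is rejected.
try:
    model.forward((384, 384))
except ValueError as err:
    mismatch = str(err)
```

Under this assumption, multi-scale training would need either one model instance per scale or buffers that are recomputed for each input size.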

Thank you @chuong98 for your comment. I understand.

The main (initial) motivation for faster_vit_any_res is still classification: it enables various input sizes, window sizes, etc. Detection models (with multi-scale training) use the same model but need some modifications to work. We will publish the code for downstream tasks (e.g. detection) soon.

Again, thank you @becauseofAI and @chuong98 for your interest in our work.

That's exactly what I meant. Thank you for the further explanation, @chuong98.

Hi @becauseofAI

We have released the object detection repository for FasterViT with DINO. You can use multi-scale training as well. The FasterViT backbone allows any resolution for training.
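
For readers unfamiliar with multi-scale training, the sketch below shows how a per-iteration input size is commonly sampled in detection pipelines. It is not taken from the FasterViT detection repository. The function `sample_train_size`, the short-side list, the `max_side` cap, and the `divisor` of 32 are all illustrative assumptions.

```python
import random


def sample_train_size(orig_hw, short_sides=(480, 512, 544, 576, 608, 640),
                      max_side=1333, divisor=32, rng=random):
    """Pick a random training size for one image (hypothetical helper).

    Returns the resized (h, w), which keeps the aspect ratio, and the
    padded (h, w), rounded up to a multiple of `divisor`.
    """
    h, w = orig_hw
    short = rng.choice(short_sides)
    scale = short / min(h, w)
    # Cap the longer side so very elongated images do not blow up memory.
    if max(h, w) * scale > max_side:
        scale = max_side / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Ceiling division: pad up to the next multiple of `divisor`.
    pad_h = -(-new_h // divisor) * divisor
    pad_w = -(-new_w // divisor) * divisor
    return (new_h, new_w), (pad_h, pad_w)


# Each call may produce a different size for the same source image.
rng = random.Random(0)
sizes = [sample_train_size((1080, 1920), rng=rng) for _ in range(4)]
```

Because the padded size changes from one iteration to the next, the backbone has to accept inputs whose resolution differs from the one it was built with, which is the capability this issue asked for.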

We will release the pretrained checkpoints very soon.

I hope you find this release useful!