TokenInitializer for any resolution

Question

TokenInitializer for any resolution

becauseofAI opened this issue 2 years ago · 2 comments

When we create a model of 224x224, we can no longer input a image of 1024x1024 or 600x800.
How does TokenInitializer of carrier token support any resolution for multi-scale training tasks, such as detection, segmentation?

Answer 1 · 2023-08-14T16:45:51.000Z

Hi @becauseofAI , we can still initialize a model trained on 224x224 for a larger resolution like 1024x1024 as done in faster_vit_any_res. Currently, our pipi package allows for this to bring ImageNet-1K pretrained weights for any resolution. In addition, the user has the option of running HAT or sticking to previous configurations for 224x224 model by leveraging hat flag.

From a broader perspective, the idea is to incorporate carrier tokens whenever input_size>window_size. Here input_size is the stage-wise resolution. However, this can be bypassed by the aforementioned flag.

I will add code for downstream applications such as detection soon.

Thanks again for your interest in our work !

Answer 2 · 2023-10-14T20:35:24.000Z

Hi @becauseofAI

TokenInitalizer indeed allows for any resolution images (i.e. multiscale training) for downstream tasks such as detection and segmentation in the newly released detection repository.

I hope you find this to be useful !

P.S: Looking forward to see if you have successfully employed FasterViT in your applications of interest and achieved any promising results.