TokenInitializer for any resolution
becauseofAI opened this issue · 2 comments
When we create a model of 224x224, we can no longer input a image of 1024x1024 or 600x800.
How does TokenInitializer of carrier token support any resolution for multi-scale training tasks, such as detection, segmentation?
Hi @becauseofAI , we can still initialize a model trained on 224x224 for a larger resolution like 1024x1024 as done in faster_vit_any_res. Currently, our pipi package allows for this to bring ImageNet-1K pretrained weights for any resolution. In addition, the user has the option of running HAT or sticking to previous configurations for 224x224 model by leveraging hat flag.
From a broader perspective, the idea is to incorporate carrier tokens whenever input_size>window_size
. Here input_size is the stage-wise resolution. However, this can be bypassed by the aforementioned flag.
I will add code for downstream applications such as detection soon.
Thanks again for your interest in our work !
Hi @becauseofAI
TokenInitalizer indeed allows for any resolution images (i.e. multiscale training) for downstream tasks such as detection and segmentation in the newly released detection repository.
I hope you find this to be useful !
P.S: Looking forward to see if you have successfully employed FasterViT in your applications of interest and achieved any promising results.