Detection and segmentation models
pianogGG opened this issue · 8 comments
Hi, I saw in the catalog that you plan to train models for detection and segmentation. I was wondering:
- Do you plan to modify the network structure for the detection task?
- When do you plan to release the related models?
- I want to replace the Swin-T camera backbone in BEVFusion with FasterViT and use the existing ImageNet-1K pre-trained models. Do you think this would be a good solution?
Hi @pianogGG, yes, we will release the detection code with a slightly modified architecture. That said, the any-resolution FasterViT model can already be used for this purpose at this stage.
There is currently no ETA for the release of these models, but hopefully that will happen soon.
And it is certainly a great idea to replace Swin-T in BEVFusion with its FasterViT counterpart, given FasterViT's clear advantage in both accuracy and throughput. There is some domain gap, but many papers have shown that ImageNet pre-trained models are still useful in such settings.
Hi @ahatamiz, regarding "yes we will release the detection code with a slightly modified architecture": did you mean the FasterViT backbone, the neck, or the head?
The FasterViT backbone I mentioned refers to the part circled in the first box.
Thanks for the question, @pianogGG. More precisely, the change is at the neck: we remove the final classification head and extract only the intermediate feature maps.
Hi @ahatamiz, thanks a lot. So, regarding "yes we will release the detection code with a slightly modified architecture", which part did you change?
Hi @pianogGG, the detection/segmentation code is now almost the same as the FasterViT_any_res model, which supports non-square images (as often seen in COCO, ADE20K, etc.). We only have to remove the linear head (as mentioned above) and extract the output of each stage.
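To make the "output of each stage" concrete, here is a minimal shape-only sketch of what a detection neck would receive for a non-square input once the linear head is gone. It is not the repository's code: the helper `stage_output_shapes`, the stem stride of 4, the 2x downsampling between stages, and the channel widths (64, 128, 256, 512) are all assumptions chosen for illustration, and rounding up assumes stride-2 convolutions with kernel 3 and padding 1.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)

# Assumed values for illustration only; check the actual model config.
ASSUMED_STEM_STRIDE = 4
ASSUMED_STAGE_CHANNELS = (64, 128, 256, 512)


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def stage_output_shapes(
    height: int,
    width: int,
    stem_stride: int = ASSUMED_STEM_STRIDE,
    stage_channels: Tuple[int, ...] = ASSUMED_STAGE_CHANNELS,
) -> List[Shape]:
    """Return the (C, H, W) shape of every stage output.

    A classification model would pool only the last entry and feed it
    to a linear head. For detection or segmentation the head is dropped
    and the whole list is handed to the neck.
    """
    shapes: List[Shape] = []
    stride = stem_stride
    for channels in stage_channels:
        shapes.append((channels, ceil_div(height, stride), ceil_div(width, stride)))
        stride *= 2  # assumed 2x downsampling before the next stage
    return shapes


if __name__ == "__main__":
    # A non-square input of the kind common in COCO-style pipelines.
    for shape in stage_output_shapes(800, 1333):
        print(shape)
```

Under these assumptions, an 800 x 1333 image yields four feature maps at strides 4, 8, 16 and 32, none of them square, which is why the any-resolution variant is the natural starting point.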
Will close this issue for now.
Thanks for your inquiry, @pianogGG! Please see our newly released object detection repository and the FasterViT backbone!
We will add pretrained checkpoints very soon!