Fine-tuning DINOv2 for semantic segmentation on MS COCO.
- dinov2-vitb/14 as backbone
- Linear layer + conv layer
- Variation of linear tuning (refer to Section 7.4 of the DINOv2 paper)
Trained on 2x RTX 3090 with a batch size of 8.
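
The "linear layer + conv layer" head on top of the ViT-B/14 backbone could look roughly like the sketch below. This is a minimal illustration, not the repository's actual code: the class name `LinearConvHead`, the `hidden_dim` of 256, the 3x3 kernel, and the bilinear upsampling are all assumptions. The only facts taken from the setup above are the patch size of 14 and the ViT-B embedding width of 768. Random tensors stand in for the backbone's patch tokens so the snippet runs without downloading weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearConvHead(nn.Module):
    """Hypothetical head: a per-patch linear layer followed by a conv layer."""

    def __init__(self, num_classes, embed_dim=768, hidden_dim=256, patch_size=14):
        super().__init__()
        self.patch_size = patch_size
        # Applied independently to every patch token.
        self.linear = nn.Linear(embed_dim, hidden_dim)
        # Mixes information between neighbouring patches (assumed 3x3 kernel).
        self.conv = nn.Conv2d(hidden_dim, num_classes, kernel_size=3, padding=1)

    def forward(self, patch_tokens, image_hw):
        # patch_tokens: (B, N, C) with N = (H / patch_size) * (W / patch_size)
        h, w = image_hw
        gh, gw = h // self.patch_size, w // self.patch_size
        b, n, _ = patch_tokens.shape
        if n != gh * gw:
            raise ValueError(f"expected {gh * gw} patch tokens, got {n}")
        x = self.linear(patch_tokens)                 # (B, N, hidden_dim)
        x = x.transpose(1, 2).reshape(b, -1, gh, gw)  # (B, hidden_dim, gh, gw)
        x = self.conv(x)                              # (B, num_classes, gh, gw)
        # Upsample patch-level logits back to pixel resolution.
        return F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    # Stand-in for backbone output: a 224x224 image gives a 16x16 patch grid.
    tokens = torch.randn(2, 16 * 16, 768)
    head = LinearConvHead(num_classes=81)  # class count is a placeholder
    logits = head(tokens, (224, 224))
    print(logits.shape)
```

With the real backbone, the tokens would presumably come from something like `torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")` and the `"x_norm_patchtokens"` entry of `forward_features`, with the backbone kept frozen for the linear-tuning variant. Input height and width need to be multiples of 14.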
Prediction vs. ground truth (GT)