naver/croco

DINOv2 with CroCo

Hi, thanks for sharing your great work.
Is there a checkpoint available that uses DINO as the backbone for cross-view completion and was later fine-tuned for a monocular downstream task?
If not, do you think it is possible, and what would be the best procedure to do so?

Thanks.

Hi,

Since the CroCo encoder is a standard ViT, it would indeed be possible to train a cross-view completion model with a frozen, adapted, or fine-tuned DINOv2 encoder.
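
For readers who want to try this, here is a rough sketch of what such a setup could look like in PyTorch. It is not the CroCo training code. `TinyViTEncoder` is a hypothetical, randomly initialised stand-in so that the example runs offline; in practice one would presumably load a pretrained model instead (for example via `torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")`) and wrap it so that it returns patch tokens for a chosen subset of patches. The class names, the `mode` flag, the 0.9 mask ratio, and the small decoder are all assumptions of this sketch, and only the frozen and fine-tuned variants are shown.

```python
import torch
import torch.nn as nn


class TinyViTEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained ViT encoder such as DINOv2."""

    def __init__(self, patch=14, dim=64, depth=2, heads=4, img_size=56):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Parameter(torch.zeros(1, (img_size // patch) ** 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, dropout=0.0, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, img, keep_idx=None):
        # (B, 3, H, W) -> (B, N, D) patch tokens with positional embeddings
        x = self.patch_embed(img).flatten(2).transpose(1, 2) + self.pos_embed
        if keep_idx is not None:
            # keep only the visible patches of the masked view
            x = torch.gather(x, 1, keep_idx.unsqueeze(-1).expand(-1, -1, x.shape[-1]))
        return self.blocks(x)


class CrossViewCompletion(nn.Module):
    """Reconstruct masked patches of view 1 using tokens from view 2."""

    def __init__(self, encoder, dim=64, patch=14, num_patches=16,
                 mode="frozen", mask_ratio=0.9):
        super().__init__()
        self.encoder, self.patch, self.mask_ratio = encoder, patch, mask_ratio
        if mode == "frozen":  # "finetune" leaves the encoder trainable
            for p in self.encoder.parameters():
                p.requires_grad = False
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.dec_pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerDecoderLayer(dim, 4, dim * 4, dropout=0.0, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, 2)
        self.head = nn.Linear(dim, patch * patch * 3)

    def patchify(self, img):
        # (B, C, H, W) -> (B, N, patch*patch*C), row-major patch order
        B, C, H, W = img.shape
        p = self.patch
        x = img.reshape(B, C, H // p, p, W // p, p)
        return x.permute(0, 2, 4, 3, 5, 1).reshape(B, (H // p) * (W // p), p * p * C)

    def forward(self, img1, img2):
        B, N, D = img1.shape[0], self.dec_pos.shape[1], self.dec_pos.shape[2]
        n_keep = max(1, int(N * (1 - self.mask_ratio)))
        keep_idx = torch.rand(B, N, device=img1.device).argsort(dim=1)[:, :n_keep]
        mask = torch.ones(B, N, device=img1.device)
        mask.scatter_(1, keep_idx, 0.0)  # 1.0 = masked, 0.0 = visible

        visible = self.encoder(img1, keep_idx)  # (B, n_keep, D)
        reference = self.encoder(img2)          # (B, N, D)

        # place visible tokens back at their positions, mask tokens elsewhere
        tokens = self.mask_token.expand(B, N, -1).scatter(
            1, keep_idx.unsqueeze(-1).expand(-1, -1, D), visible)
        # decoder self-attends over view 1 and cross-attends to view 2
        pred = self.head(self.decoder(tokens + self.dec_pos, reference))

        per_patch = ((pred - self.patchify(img1)) ** 2).mean(-1)  # (B, N)
        return (per_patch * mask).sum() / mask.sum()  # loss on masked patches only


model = CrossViewCompletion(TinyViTEncoder(), mode="frozen")
loss = model(torch.randn(2, 3, 56, 56), torch.randn(2, 3, 56, 56))
loss.backward()  # gradients reach the decoder and head, not the frozen encoder
```

Switching `mode` to `"finetune"` would leave the encoder weights trainable; an adapter-based variant would need extra modules that are not shown here.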

We actually tried this at some point (without keeping the experiments) and, if I remember correctly, we saw gains on monocular tasks, but performance on binocular geometric tasks was worse.

Best,
Philippe

I see. Could you share the code for pretraining DINO for cross-view completion (or the checkpoint)? I am trying to use DINO for another monocular downstream task, and I think cross-view completion would improve the results.