DINOv2 with CroCo

Question

Closed this issue a month ago · 2 comments

Hi, thanks for sharing your great work.
Is there a checkpoint available that uses DINOv2 as the backbone for cross-view completion and is then fine-tuned for a monocular downstream task?
If not, do you think it is possible, and what would be the best procedure for doing so?

Thanks.

Answer 1 · 2024-05-28T12:53:22.000Z

Hi,

Since the CroCo encoder is a standard ViT, it would indeed be possible to train a cross-view completion model with a frozen, adapted, or fine-tuned DINOv2 encoder.
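As a rough illustration of those three options, here is a minimal PyTorch sketch. It is not the CroCo authors' code: `configure_encoder`, `make_stand_in_encoder`, and `count_trainable` are hypothetical helpers, a tiny `nn.TransformerEncoder` stands in for the pretrained ViT, and reading "adapted" as "frozen backbone plus a small trainable linear layer" is an assumption.

```python
import torch
from torch import nn


def configure_encoder(encoder: nn.Module, mode: str, embed_dim: int) -> nn.Module:
    """Set up an encoder for one of three training regimes (hypothetical helper).

    Treating "adapted" as a frozen backbone followed by a trainable linear
    layer is an assumption, not something stated in the thread.
    """
    if mode not in {"frozen", "adapted", "finetuned"}:
        raise ValueError(f"unknown mode: {mode}")
    # Only full finetuning lets gradients update the backbone weights.
    for param in encoder.parameters():
        param.requires_grad = mode == "finetuned"
    if mode == "adapted":
        # The newly created layer is trainable by default.
        return nn.Sequential(encoder, nn.Linear(embed_dim, embed_dim))
    return encoder


def make_stand_in_encoder(embed_dim: int = 32) -> nn.Module:
    """Tiny transformer standing in for a pretrained ViT encoder."""
    layer = nn.TransformerEncoderLayer(
        d_model=embed_dim, nhead=4, dim_feedforward=64, batch_first=True
    )
    return nn.TransformerEncoder(layer, num_layers=2)


def count_trainable(module: nn.Module) -> int:
    """Number of parameters that would receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


if __name__ == "__main__":
    tokens = torch.randn(2, 10, 32)  # (batch, patch tokens, embedding dim)
    for mode in ("frozen", "adapted", "finetuned"):
        model = configure_encoder(make_stand_in_encoder(), mode, embed_dim=32)
        out = model(tokens)
        print(mode, count_trainable(model), tuple(out.shape))
```

In practice the stand-in would be replaced by a pretrained ViT, and only the parameters with `requires_grad=True` would be passed to the optimizer. How the encoder outputs are wired into the cross-view completion decoder is not covered by this sketch.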

We actually tried this at some point (without keeping the experiments). If I remember correctly, we saw gains on monocular tasks, but performance on binocular geometric tasks was worse.

Best,
Philippe

Answer 2 · 2024-05-28T21:36:59.000Z

I see. Would it be possible to share the code for pretraining DINOv2 with cross-view completion (or the checkpoint)? I am trying to use DINOv2 for another monocular downstream task, and I think cross-view completion would improve the results.