Supervised pre-training doesn't seem to help much for extracting salient representations?
sayakpaul opened this issue · 0 comments
In the DINO blog post, the authors show the following:
This is what they say in the video caption:
The original video is shown on the left. In the middle is a segmentation example generated by a supervised model, and on the right is one generated by DINO. (All examples are licensed from Stock.)
We see that the attention maps generated with the supervised pre-trained model are much less salient than the ones from the DINO model.
This seems to be verified:
Here's the Colab Notebook that verifies it. Be aware that the notebook is not formatted.
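
For anyone who wants to see what is being compared without opening the notebook, here is a minimal sketch of how a per-head attention map is typically obtained from a ViT: take the CLS token's query, score it against every token's key, apply softmax, drop the CLS entry, and reshape the rest to the patch grid. This is not the notebook's code. The function name `cls_attention_map`, its arguments, and the toy numbers are made up for illustration, and it assumes the CLS token sits at index 0.

```python
import math


def cls_attention_map(q_cls, keys, grid_hw):
    """Return one head's CLS-to-patch attention as an h x w grid.

    q_cls   : query vector of the CLS token (list of floats)
    keys    : key vectors for all tokens; index 0 is assumed to be CLS
    grid_hw : (h, w) number of patches along each image axis
    """
    h, w = grid_hw
    if len(keys) != h * w + 1:
        raise ValueError("expected one CLS key plus h*w patch keys")

    # Scaled dot-product scores of the CLS query against every key.
    scale = 1.0 / math.sqrt(len(q_cls))
    scores = [scale * sum(q * k for q, k in zip(q_cls, key)) for key in keys]

    # Numerically stable softmax over all tokens (CLS included).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]

    # Drop the CLS-to-CLS weight and reshape the patch weights to the grid.
    patches = attn[1:]
    return [patches[r * w:(r + 1) * w] for r in range(h)]


# Toy input (hypothetical values): a 2x2 patch grid where only the
# first patch has a key aligned with the CLS query.
q = [1.0, 0.0]
keys = [[0.0, 0.0], [4.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
amap = cls_attention_map(q, keys, (2, 2))
```

In this toy case most of the weight lands on the first patch. In the sense used above, a "salient" map is one where the weight concentrates on the object's patches instead of being spread evenly over the grid.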