facebookresearch/dino

A question about DINOHead

wenhaoli-xmu opened this issue · 2 comments

Background

The paper mentions that the model performs better as the number of MLP layers in DINOHead increases.

The l2 normalization is used to keep training stable as the number of MLP layers increases.

Questions

  1. Question 1: What, then, is the linear projection that follows the l2 normalization for?

  2. Question 2: Why does this linear projection use weight normalization?


ttkxyy commented

Did you figure it out? I have the same question.

I may be wrong, but I will try to answer the questions.
Q1:
It makes the output K-dimensional. The original work uses a K-dimensional embedding to compute the loss.

Q2:
Not sure, but according to ref. 61 of the paper, it may simply be to make the network train... faster?
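
To make the two answers concrete, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere) of the last two steps of the head: l2 normalization followed by a bias-free, weight-normalized linear projection to K outputs. It is not the repository's code. The toy sizes, the helper names `l2_normalize` and `weight_normed_linear`, and the choice to fix the weight-norm gain at 1 are all assumptions made for illustration. Under those assumptions, each of the K outputs is simply the cosine similarity between the normalized feature and one weight row, so every logit stays in [-1, 1] no matter how large the MLP output is.

```python
import math
import random


def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def weight_normed_linear(x, v_rows, gains):
    """Bias-free linear layer with weight normalization.

    Each effective weight row is w_k = g_k * v_k / ||v_k||, i.e. the
    direction (v_k) and the magnitude (g_k) are parameterized separately.
    """
    out = []
    for v, g in zip(v_rows, gains):
        w = [g * u for u in l2_normalize(v)]
        out.append(sum(wi * xi for wi, xi in zip(w, x)))
    return out


random.seed(0)
bottleneck_dim, K = 8, 5  # hypothetical toy sizes, not the paper's values

# Stand-in for the MLP output; deliberately large in scale.
z = [random.gauss(0, 3) for _ in range(bottleneck_dim)]
# Unnormalized direction parameters of the last layer, one row per output.
V = [[random.gauss(0, 3) for _ in range(bottleneck_dim)] for _ in range(K)]
# Assumption for this sketch: the gain is fixed at 1 for every row.
gains = [1.0] * K

logits = weight_normed_linear(l2_normalize(z), V, gains)

print(len(logits))                            # one logit per output dimension
print(all(-1.0 <= s <= 1.0 for s in logits))  # bounded, since each is a cosine
```

Read this way, the projection is what produces the K-dimensional output used by the loss (Q1), and normalizing both the input and the weight rows keeps the scale of those outputs bounded regardless of the depth of the MLP in front. Whether that boundedness, rather than faster training, is the authors' actual motivation for the weight normalization (Q2) is not something this sketch can confirm.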