shiming-chen/MSDN

A question about ablation studies in Table 2.


Hi,

I have a question about the ablation studies in Table 2.

How can a single subnet, e.g. MSDN (A->V) w/ L_distill, be trained with the semantic distillation loss, given that this loss measures the difference between the outputs of the two subnets?

Looking forward to your reply soon.

Hi, @bad-meets-joke

In Table 2, MSDN (A->V) w/ L_distill means that the full MSDN (both subnets) is trained with the L_distill loss, but only the features from the A->V branch are used for classification. Please refer to the related descriptions in the paper.
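
To make the distinction concrete, here is a minimal sketch of the idea, not the repository's actual code. It assumes each branch produces a list of class logits, uses a Jensen-Shannon divergence as a stand-in for L_distill (the exact form in the paper may differ), and fuses the branches with a plain sum. The names `distill_loss`, `predict`, and `mode` are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    # KL divergence KL(p || q) for two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_loss(logits_a2v, logits_v2a):
    # Training time: the loss needs the outputs of BOTH branches.
    # Assumption: Jensen-Shannon divergence as a stand-in for L_distill.
    p, q = softmax(logits_a2v), softmax(logits_v2a)
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def predict(logits_a2v, logits_v2a, mode="both"):
    # Test time: choose which branch's scores are used for classification.
    if mode == "a2v":
        scores = logits_a2v
    elif mode == "v2a":
        scores = logits_v2a
    else:
        # Assumption: a plain sum as the fusion rule.
        scores = [a + b for a, b in zip(logits_a2v, logits_v2a)]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch, an "MSDN (A->V) w/ L_distill" row would correspond to training with `distill_loss` on both branches and then calling `predict(..., mode="a2v")` at test time.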

Best!

Thanks for your reply. Now I get it.