OpenNLPLab/AVSBench

Question on loss AVM-AV

Closed this issue · 4 comments

Hi! The AVM-AV loss formula in the paper is hard for me to understand, so I checked the source code. However, it seems to me that the code is quite different from what is presented in the paper. Could you please clarify this?

Hi, thanks for this issue. I think the implementation of Loss(AVM-AV) is similar to what is described in the paper. First, some 'if' statements are used to get the masked visual feature, and then the KL divergence is computed. Could you please elaborate on where you think the code differs from the paper?

Thank you for your quick response! I am mainly confused by the average pooling operation and the position of the summation in formula (3). To my understanding, it should be the sum of KL(Mi · Zi, Ai). Please correct me if I'm wrong!

Sorry for the mistake; you are right. Thank you very much for pointing this out.

The average pooling operation is applied to (Mi · Zi), and the summation is over the KL divergences of the different stages. Therefore, equation (3) in the paper should be: L_{AVM} = \sum_{i=1}^{n} KL(avg(M_i \cdot Z_i), A_i).
This corrected form matches the code implementation.
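
For readers following along, here is a minimal pure-Python sketch of the corrected equation (3). It is not the repository's code. The function names (`masked_avg_pool`, `avm_loss`, and so on) are made up for illustration, the softmax step is an assumption added so that both KL arguments are valid probability distributions, and the first argument of KL is assumed to be the pooled visual feature. Shapes are assumed to be H x W for each mask Mi, C x H x W for each feature map Zi, and length C for each audio feature Ai.

```python
import math


def softmax(v):
    """Turn a vector into a probability distribution (assumed normalization)."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def masked_avg_pool(mask, feat):
    """avg(M * Z): mask is H x W, feat is C x H x W; returns a length-C vector."""
    h, w = len(mask), len(mask[0])
    return [
        sum(mask[y][x] * channel[y][x] for y in range(h) for x in range(w)) / (h * w)
        for channel in feat
    ]


def avm_loss(masks, feats, audios):
    """Sum over stages i of KL(avg(M_i * Z_i), A_i)."""
    total = 0.0
    for mask, feat, audio in zip(masks, feats, audios):
        pooled = masked_avg_pool(mask, feat)  # average pooling comes first
        total += kl_divergence(softmax(pooled), softmax(audio))  # then KL per stage
    return total


if __name__ == "__main__":
    # One stage with C = 2 channels on a 2 x 2 map; values are arbitrary.
    mask = [[1.0, 0.0], [1.0, 1.0]]
    feat = [[[0.5, 0.1], [0.3, 0.2]], [[1.0, 0.4], [0.8, 0.6]]]
    audio = [0.2, 0.9]
    print(avm_loss([mask], [feat], [audio]))
```

The point of the sketch is the order of operations: pooling happens inside the KL term, and the summation runs over the per-stage KL values rather than over the features.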

I have labeled this issue, and we will upload an updated version of the arXiv paper. I hope this clears up your confusion, and thanks again for the question.

This makes much more sense. Thank you for the clarification!