shiming-chen/MSDN

Some mismatches between the code and the paper, and some errors in the code/paper.

bad-meets-joke opened this issue · 0 comments

Hello,

Thank you very much for releasing the code. However, I have found several obvious mismatches between the code and the paper, as well as some errors. I list them below; to make the points concrete, I add a few short sketches after the list.

  • A = F.softmax(A,dim = -1) # compute an attention map for each attribute
    the softmax here runs along dim=-1, which is the visual-feature (region) axis, not the axis required by Eq. 1 in the paper (the first sketch below illustrates the difference between the two axes)
  • A = F.softmax(A,dim = 1) # compute an attention map for each attribute
    the softmax here runs along dim=1, which is the attribute axis, not the axis required by Eq. 4 in the paper (same sketch below)
  • S_p = torch.einsum('bir,bri->bi',A,S) # compute attribute scores from attribute attention maps
    and
    S_pp = torch.einsum('ki,bi->bik',self.att,S_p) # compute the final prediction as the product of semantic scores, attribute scores, and attention over attribute scores
    are completely different from Eq. 6 and the subsequent PHI(x_i) (the second sketch below spells out what these two einsum calls actually compute).
  • if not self.is_conservative:
    self.is_conservative is set to True, which means the denominator of the first term in Eq. 7 is summed over C rather than over C^s (third sketch below)
  • def compute_loss_Self_Calibrate(self,in_package):
    The sum over C^u in the second term of Eq. 7 should be placed after the log symbol, i.e. inside the log (fourth sketch below)
  • def compute_contrastive_loss(self, in_package1, in_package2):
    lines 238-242 do not calculate the L2 distance correctly, lines 245-249 convert the predictions to probabilities with F.softmax(), and the JSD equation in the paper is wrong (the last sketch below shows the JSD form I would expect).
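
To make the first two points concrete, here is a minimal sketch of how the two softmax choices normalize over different axes. The shapes are placeholders; I am only assuming that `A` is laid out as (batch, n_attributes, n_regions), which is what the einsum pattern `'bir,bri->bi'` suggests.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes, for illustration only.
batch, n_attr, n_regions = 2, 312, 49
A = torch.randn(batch, n_attr, n_regions)  # attention logits, one row per attribute

# dim=-1: each attribute's attention is normalized over the visual regions,
# i.e. the weights sum to 1 along the region axis.
A_over_regions = F.softmax(A, dim=-1)
print(A_over_regions.sum(dim=-1))  # all ones, shape (batch, n_attr)

# dim=1: the normalization runs over the attributes instead,
# i.e. for every (sample, region) pair the attribute weights sum to 1.
A_over_attrs = F.softmax(A, dim=1)
print(A_over_attrs.sum(dim=1))     # all ones, shape (batch, n_regions)
```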
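
For the third point, the loop version below is only my reading of what the two released einsum lines compute, written out so it can be compared with Eq. 6 term by term. All sizes and the class-attribute matrix `att` are placeholders.

```python
import torch

batch, n_attr, n_regions, n_cls = 2, 312, 49, 200
A = torch.randn(batch, n_attr, n_regions)   # attribute attention maps
S = torch.randn(batch, n_regions, n_attr)   # per-region attribute scores
att = torch.randn(n_cls, n_attr)            # stands in for self.att

# S_p = torch.einsum('bir,bri->bi', A, S)
S_p = torch.zeros(batch, n_attr)
for b in range(batch):
    for i in range(n_attr):
        for r in range(n_regions):
            S_p[b, i] += A[b, i, r] * S[b, r, i]

# S_pp = torch.einsum('ki,bi->bik', att, S_p)
S_pp = torch.zeros(batch, n_attr, n_cls)
for b in range(batch):
    for i in range(n_attr):
        for k in range(n_cls):
            S_pp[b, i, k] = att[k, i] * S_p[b, i]

assert torch.allclose(S_p, torch.einsum('bir,bri->bi', A, S), atol=1e-5)
assert torch.allclose(S_pp, torch.einsum('ki,bi->bik', att, S_p), atol=1e-5)
```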
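
For the fourth point, the difference is only in which classes appear in the softmax denominator. A minimal sketch, where the split between seen and unseen classes is made up for illustration:

```python
import torch
import torch.nn.functional as F

n_cls, n_seen = 200, 150
logits = torch.randn(4, n_cls)              # scores over all classes C
labels = torch.randint(0, n_seen, (4,))     # training labels come from seen classes
seen_mask = torch.zeros(n_cls, dtype=torch.bool)
seen_mask[:n_seen] = True                   # hypothetical: first 150 classes are seen

# Denominator summed over all of C (what happens when is_conservative stays True):
loss_over_C = F.cross_entropy(logits, labels)

# Denominator summed over C^s only (how I read the first term of Eq. 7):
logits_seen = logits.masked_fill(~seen_mask, float('-inf'))
loss_over_Cs = F.cross_entropy(logits_seen, labels)
```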
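
For the fifth point, the two placements of the sum over C^u give different losses; this is only a sketch of the two readings, with a made-up unseen_mask:

```python
import torch
import torch.nn.functional as F

n_cls, n_unseen = 200, 50
logits = torch.randn(4, n_cls)
unseen_mask = torch.zeros(n_cls, dtype=torch.bool)
unseen_mask[-n_unseen:] = True              # hypothetical: last 50 classes are unseen

prob = F.softmax(logits, dim=-1)

# Sum over C^u placed after (inside) the log: log( sum_{c in C^u} p_c )
loss_log_of_sum = -torch.log(prob[:, unseen_mask].sum(dim=-1)).mean()

# Sum over C^u placed before (outside) the log: sum_{c in C^u} log p_c
loss_sum_of_log = -torch.log(prob[:, unseen_mask]).sum(dim=-1).mean()

print(loss_log_of_sum, loss_sum_of_log)     # generally not equal
```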
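
Finally, for the contrastive-loss point, this is the JSD form I would expect, with the two KL terms taken against the mixture M = (P + Q) / 2. The js_divergence helper is my own reference sketch, not the released implementation:

```python
import torch
import torch.nn.functional as F

def js_divergence(logits_p, logits_q):
    """Jensen-Shannon divergence between two predicted distributions."""
    p = F.softmax(logits_p, dim=-1)
    q = F.softmax(logits_q, dim=-1)
    m = 0.5 * (p + q)
    # 0.5 * ( KL(P || M) + KL(Q || M) ), each averaged over the batch
    kl_pm = F.kl_div(m.log(), p, reduction='batchmean')
    kl_qm = F.kl_div(m.log(), q, reduction='batchmean')
    return 0.5 * (kl_pm + kl_qm)

jsd = js_divergence(torch.randn(4, 200), torch.randn(4, 200))
```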

Could you please explain these mismatches and correct the erroneous code and equations?

Thanks.