shiming-chen/MSDN

Some mismatches between the code and the paper, and some errors in the code/paper.

bad-meets-joke opened this issue · 0 comments

Hello,

Thank you very much for releasing the code. However, I have found several obvious mismatches between the code and the paper, as well as some errors. I list them below; to make the points concrete, I add a few short sketches after the list.

  • A = F.softmax(A,dim = -1) # compute an attention map for each attribute
    the softmax here runs along dim=-1, which is the visual-feature (region) axis, not the axis required by Eq. 1 in the paper (the first sketch below illustrates the difference between the two axes)
  • A = F.softmax(A,dim = 1) # compute an attention map for each attribute
    the softmax here runs along dim=1, which is the attribute axis, not the axis required by Eq. 4 in the paper (same sketch below)
  • S_p = torch.einsum('bir,bri->bi',A,S) # compute attribute scores from attribute attention maps
    and
    S_pp = torch.einsum('ki,bi->bik',self.att,S_p) # compute the final prediction as the product of semantic scores, attribute scores, and attention over attribute scores
    are completely different from Eq. 6 and the subsequent PHI(x_i) (the second sketch below spells out what these two einsum calls actually compute).
  • if not self.is_conservative:
    self.is_conservative is set to True, which means the denominator of the first term in Eq. 7 is summed over C rather than over C^s (third sketch below)
  • def compute_loss_Self_Calibrate(self,in_package):
    The sum over C^u in the second term of Eq. 7 should be placed after the log symbol, i.e. inside the log (fourth sketch below)
  • def compute_contrastive_loss(self, in_package1, in_package2):
    lines 238-242 do not calculate the L2 distance correctly, lines 245-249 convert the predictions to probabilities with F.softmax(), and the JSD equation in the paper is wrong (the last sketch below shows the JSD form I would expect).
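
To make the first two points concrete, here is a minimal sketch of how the two softmax choices normalize over different axes. The shapes are placeholders; I am only assuming that `A` is laid out as (batch, n_attributes, n_regions), which is what the einsum pattern `'bir,bri->bi'` suggests.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes, for illustration only.
batch, n_attr, n_regions = 2, 312, 49
A = torch.randn(batch, n_attr, n_regions)  # attention logits, one row per attribute

# dim=-1: each attribute's attention is normalized over the visual regions,
# i.e. the weights sum to 1 along the region axis.
A_over_regions = F.softmax(A, dim=-1)
print(A_over_regions.sum(dim=-1))  # all ones, shape (batch, n_attr)

# dim=1: the normalization runs over the attributes instead,
# i.e. for every (sample, region) pair the attribute weights sum to 1.
A_over_attrs = F.softmax(A, dim=1)
print(A_over_attrs.sum(dim=1))     # all ones, shape (batch, n_regions)
```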
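
For the third point, the loop version below is only my reading of what the two released einsum lines compute, written out so it can be compared with Eq. 6 term by term. All sizes and the class-attribute matrix `att` are placeholders.

```python
import torch

batch, n_attr, n_regions, n_cls = 2, 312, 49, 200
A = torch.randn(batch, n_attr, n_regions)   # attribute attention maps
S = torch.randn(batch, n_regions, n_attr)   # per-region attribute scores
att = torch.randn(n_cls, n_attr)            # stands in for self.att

# S_p = torch.einsum('bir,bri->bi', A, S)
S_p = torch.zeros(batch, n_attr)
for b in range(batch):
    for i in range(n_attr):
        for r in range(n_regions):
            S_p[b, i] += A[b, i, r] * S[b, r, i]

# S_pp = torch.einsum('ki,bi->bik', att, S_p)
S_pp = torch.zeros(batch, n_attr, n_cls)
for b in range(batch):
    for i in range(n_attr):
        for k in range(n_cls):
            S_pp[b, i, k] = att[k, i] * S_p[b, i]

assert torch.allclose(S_p, torch.einsum('bir,bri->bi', A, S), atol=1e-5)
assert torch.allclose(S_pp, torch.einsum('ki,bi->bik', att, S_p), atol=1e-5)
```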
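
For the fourth point, the difference is only in which classes appear in the softmax denominator. A minimal sketch, where the split between seen and unseen classes is made up for illustration:

```python
import torch
import torch.nn.functional as F

n_cls, n_seen = 200, 150
logits = torch.randn(4, n_cls)              # scores over all classes C
labels = torch.randint(0, n_seen, (4,))     # training labels come from seen classes
seen_mask = torch.zeros(n_cls, dtype=torch.bool)
seen_mask[:n_seen] = True                   # hypothetical: first 150 classes are seen

# Denominator summed over all of C (what happens when is_conservative stays True):
loss_over_C = F.cross_entropy(logits, labels)

# Denominator summed over C^s only (how I read the first term of Eq. 7):
logits_seen = logits.masked_fill(~seen_mask, float('-inf'))
loss_over_Cs = F.cross_entropy(logits_seen, labels)
```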
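
For the fifth point, the two placements of the sum over C^u give different losses; this is only a sketch of the two readings, with a made-up unseen_mask:

```python
import torch
import torch.nn.functional as F

n_cls, n_unseen = 200, 50
logits = torch.randn(4, n_cls)
unseen_mask = torch.zeros(n_cls, dtype=torch.bool)
unseen_mask[-n_unseen:] = True              # hypothetical: last 50 classes are unseen

prob = F.softmax(logits, dim=-1)

# Sum over C^u placed after (inside) the log: log( sum_{c in C^u} p_c )
loss_log_of_sum = -torch.log(prob[:, unseen_mask].sum(dim=-1)).mean()

# Sum over C^u placed before (outside) the log: sum_{c in C^u} log p_c
loss_sum_of_log = -torch.log(prob[:, unseen_mask]).sum(dim=-1).mean()

print(loss_log_of_sum, loss_sum_of_log)     # generally not equal
```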
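
Finally, for the contrastive-loss point, this is the JSD form I would expect, with the two KL terms taken against the mixture M = (P + Q) / 2. The js_divergence helper is my own reference sketch, not the released implementation:

```python
import torch
import torch.nn.functional as F

def js_divergence(logits_p, logits_q):
    """Jensen-Shannon divergence between two predicted distributions."""
    p = F.softmax(logits_p, dim=-1)
    q = F.softmax(logits_q, dim=-1)
    m = 0.5 * (p + q)
    # 0.5 * ( KL(P || M) + KL(Q || M) ), each averaged over the batch
    kl_pm = F.kl_div(m.log(), p, reduction='batchmean')
    kl_qm = F.kl_div(m.log(), q, reduction='batchmean')
    return 0.5 * (kl_pm + kl_qm)

jsd = js_divergence(torch.randn(4, 200), torch.randn(4, 200))
```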

Could you please explain these mismatches and correct the erroneous code and equations?

Thanks.