I wonder if the way I used InfoNCE was wrong ( ´•̥̥̥ω•̥̥̥` )
evelynlee999 opened this issue · 7 comments
Hi,
I'm trying to improve the model by combining an InfoNCE loss with the AAM loss. My InfoNCE code, which is based on your code, is as follows:
```python
def contrastive(self, embeddings_z: t.Tensor, embeddings: t.Tensor, logits: t.Tensor):
    logits1 = logits
    high = embeddings.shape[0]
    idx = random.randint(0, int(high) - 1)
    query = embeddings[idx:idx + 1]
    positive_key = embeddings_z[idx:idx + 1]
    # negative_keys = embeddings_z
    negative_keys = t.cat((embeddings_z[:idx], embeddings_z[idx + 1:]))

    query = F.normalize(query, dim=-1)
    positive_key = F.normalize(positive_key, dim=-1)
    negative_keys = F.normalize(negative_keys, dim=-1)

    # Cosine between positive pairs
    positive_logit = t.sum(query * positive_key, dim=1, keepdim=True)
    negative_logits = query @ self.transpose(negative_keys)

    logits = t.cat([positive_logit, negative_logits], dim=1)
    labels = t.zeros(len(logits), dtype=t.long, device=query.device)
    loss = F.cross_entropy(logits / self.temperature, labels, reduction=self.reduction)

    with t.no_grad():
        # put predictions into [0, 1] range for later calculation of accuracy
        prediction = F.softmax(logits1, dim=1).detach()
    return loss, prediction
```
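If I understand it correctly, for the sampled index this computes the standard single-query InfoNCE objective, with $q$, $k^{+}$ and $k^{-}_{i}$ the normalized query, positive key and negative keys, and $\tau$ the temperature:

$$\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp(q \cdot k^{+} / \tau)}{\exp(q \cdot k^{+} / \tau) + \sum_{i} \exp(q \cdot k^{-}_{i} / \tau)}$$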
The joint AAM and InfoNCE loss is as follows:
```python
self.c_contrastive = nn.Parameter(torch.rand(1))

loss = self.c_aam * aam_loss + self.c_contrastive * contrastive_loss
```
A smaller loss should mean better performance. But when I ran the code, c_contrastive always became negative, which would mean the bigger the loss, the better the performance. So I wonder whether my InfoNCE code is wrong.
I've been stuck on this for a long time. Soooo looking forward to your reply :)
Are you sure that you want self.c_contrastive to be a Parameter? It will also get the gradient of the loss, unless you set requires_grad=False in the constructor. Minimizing the loss will minimize self.c_contrastive, causing it to become negative.
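If you still want the weight to be learnable, one workaround (a sketch, not from this repo) is to keep the raw parameter unconstrained but pass it through a positivity-preserving transform such as softplus before using it, so the optimizer can shrink the weight but never flip its sign:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointLoss(nn.Module):
    """Hypothetical joint-loss wrapper; aam_loss and contrastive_loss are
    scalars computed elsewhere and passed in."""

    def __init__(self):
        super().__init__()
        # raw, unconstrained learnable weights
        self.c_aam = nn.Parameter(torch.zeros(1))
        self.c_contrastive = nn.Parameter(torch.zeros(1))

    def forward(self, aam_loss, contrastive_loss):
        # softplus keeps both weights strictly positive, so minimizing the
        # total loss cannot turn a weight negative to "reward" a larger loss
        w_aam = F.softplus(self.c_aam)
        w_contrastive = F.softplus(self.c_contrastive)
        return w_aam * aam_loss + w_contrastive * contrastive_loss


# usage with dummy scalar losses
criterion = JointLoss()
total_loss = criterion(torch.tensor(2.3), torch.tensor(1.7))
```

Even then, with both losses positive the optimizer will still tend to push the learnable weights toward zero, so a fixed hyperparameter is often the simpler and more predictable choice.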
Yeah, I want it to be a weight on the InfoNCE loss that can participate in training to get the best result. I'll give it a try. Thanks a lot.
For a joint loss, I set the weights of the InfoNCE loss (and any other losses I use) as hyperparameters rather than parameters that get optimized. In my opinion, treating them as hyperparameters is a good approach.
That idea sounds workable; you can try it and see how it performs. I feel the design of the loss really has to be tied closely to the task, i.e. what role the InfoNCE loss actually plays in your problem. For example, mine is a few-shot classification problem where I want contrastive learning to give a richer latent representation, so my loss is two InfoNCE terms plus one classification cross-entropy: total_loss = ce_loss + α·infonce1 + β·infonce2, where α and β are both hyperparameters that just need to be tuned. In my experience, when multiple losses are optimized jointly, the choice of the weight hyperparameters has a large impact on model performance.
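As a concrete illustration of that recipe (a sketch with made-up numbers, assuming ce_loss, infonce1 and infonce2 have already been computed for the batch):

```python
import torch

# placeholders standing in for the per-batch losses computed by the model
ce_loss = torch.tensor(1.2)
infonce1 = torch.tensor(0.8)
infonce2 = torch.tensor(0.5)

# alpha and beta are fixed hyperparameters chosen by a sweep, not learned
alpha, beta = 0.5, 0.1  # illustrative values only

total_loss = ce_loss + alpha * infonce1 + beta * infonce2
```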