I wonder if the way I used InfoNCE was wrong ( ´•̥̥̥ω•̥̥̥` )
evelynlee999 opened this issue · 7 comments
Hi,
I'm trying to improve the model by combining an InfoNCE loss with the AAM loss. My InfoNCE code, which is based on your code, is as follows:
```python
def contrastive(self, embeddings_z: t.Tensor, embeddings: t.Tensor, logits: t.Tensor):
    logits1 = logits
    high = embeddings.shape[0]
    idx = random.randint(0, int(high) - 1)
    query = embeddings[idx:idx + 1]
    positive_key = embeddings_z[idx:idx + 1]
    # negative_keys = embeddings_z
    negative_keys = t.cat((embeddings_z[:idx], embeddings_z[idx + 1:]))

    query = F.normalize(query, dim=-1)
    positive_key = F.normalize(positive_key, dim=-1)
    negative_keys = F.normalize(negative_keys, dim=-1)

    # Cosine between positive pairs
    positive_logit = t.sum(query * positive_key, dim=1, keepdim=True)
    negative_logits = query @ self.transpose(negative_keys)

    logits = t.cat([positive_logit, negative_logits], dim=1)
    labels = t.zeros(len(logits), dtype=t.long, device=query.device)
    loss = F.cross_entropy(logits / self.temperature, labels, reduction=self.reduction)

    with t.no_grad():
        # put predictions into [0, 1] range for later calculation of accuracy
        prediction = F.softmax(logits1, dim=1).detach()
    return loss, prediction
```
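If I understand it correctly, for the sampled index this computes the standard single-query InfoNCE objective, with $q$, $k^{+}$ and $k^{-}_{i}$ the normalized query, positive key and negative keys, and $\tau$ the temperature:

$$\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp(q \cdot k^{+} / \tau)}{\exp(q \cdot k^{+} / \tau) + \sum_{i} \exp(q \cdot k^{-}_{i} / \tau)}$$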
The joint AAM and InfoNCE loss is as follows:
```python
self.c_contrastive = nn.Parameter(torch.rand(1))

loss = self.c_aam * aam_loss + self.c_contrastive * contrastive_loss
```
A smaller loss should mean better performance. But when I ran the code, c_contrastive always became negative, which would mean the bigger the loss, the better the performance. So I wonder whether my InfoNCE code is wrong.
I've been stuck on this for a long time. Soooo looking forward to your reply :)
Are you sure that you want self.c_contrastive to be a Parameter? It will also get the gradient of the loss, unless you set requires_grad=False in the constructor. Minimizing the loss will minimize self.c_contrastive, causing it to become negative.
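If you still want the weight to be learnable, one workaround (a sketch, not from this repo) is to keep the raw parameter unconstrained but pass it through a positivity-preserving transform such as softplus before using it, so the optimizer can shrink the weight but never flip its sign:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointLoss(nn.Module):
    """Hypothetical joint-loss wrapper; aam_loss and contrastive_loss are
    scalars computed elsewhere and passed in."""

    def __init__(self):
        super().__init__()
        # raw, unconstrained learnable weights
        self.c_aam = nn.Parameter(torch.zeros(1))
        self.c_contrastive = nn.Parameter(torch.zeros(1))

    def forward(self, aam_loss, contrastive_loss):
        # softplus keeps both weights strictly positive, so minimizing the
        # total loss cannot turn a weight negative to "reward" a larger loss
        w_aam = F.softplus(self.c_aam)
        w_contrastive = F.softplus(self.c_contrastive)
        return w_aam * aam_loss + w_contrastive * contrastive_loss


# usage with dummy scalar losses
criterion = JointLoss()
total_loss = criterion(torch.tensor(2.3), torch.tensor(1.7))
```

Even then, with both losses positive the optimizer will still tend to push the learnable weights toward zero, so a fixed hyperparameter is often the simpler and more predictable choice.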
Yeah, I want it to be a weight on the InfoNCE loss that can participate in training to get the best result. I'll give it a try. Thanks a lot.
For a joint loss, I set the weights of the InfoNCE loss (and any other losses I use) as hyperparameters rather than parameters that get optimized. In my opinion, treating them as hyperparameters is a good approach.
That idea sounds workable; you can try it and see how it performs. I feel the design of the loss really has to be tied closely to the task, i.e. what role the InfoNCE loss actually plays in your problem. For example, mine is a few-shot classification problem where I want contrastive learning to give a richer latent representation, so my loss is two InfoNCE terms plus one classification cross-entropy: total_loss = ce_loss + α·infonce1 + β·infonce2, where α and β are both hyperparameters that just need to be tuned. In my experience, when multiple losses are optimized jointly, the choice of the weight hyperparameters has a large impact on model performance.
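As a concrete illustration of that recipe (a sketch with made-up numbers, assuming ce_loss, infonce1 and infonce2 have already been computed for the batch):

```python
import torch

# placeholders standing in for the per-batch losses computed by the model
ce_loss = torch.tensor(1.2)
infonce1 = torch.tensor(0.8)
infonce2 = torch.tensor(0.5)

# alpha and beta are fixed hyperparameters chosen by a sweep, not learned
alpha, beta = 0.5, 0.1  # illustrative values only

total_loss = ce_loss + alpha * infonce1 + beta * infonce2
```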