RElbers/info-nce-pytorch

Why do you use cross entropy?


Is it the same as the following InfoNCE loss?

$$\mathcal{L}_{\text{InfoNCE}} = -\log \frac{\exp(\mathrm{sim}(q, k^{+})/\tau)}{\sum_{i=0}^{K} \exp(\mathrm{sim}(q, k_i)/\tau)}$$

Yes, it's the same (see https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html). The cross-entropy function is usually used for classification problems, where you want your model to predict the correct class out of K classes for each sample.
The InfoNCE loss can also be interpreted as a kind of classification, where each query has K-1 negative classes and 1 positive class, and the loss pushes the model to "classify" the query as the positive.
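A minimal sketch of that interpretation (illustrative only, not the exact implementation in this repo; tensor names and the shared-negatives setup are just assumptions for the example):

```python
import torch
import torch.nn.functional as F

def info_nce_via_cross_entropy(query, positive, negatives, temperature=0.1):
    # query:     (N, D) anchor embeddings
    # positive:  (N, D) one positive embedding per anchor
    # negatives: (M, D) pool of negative embeddings shared by all anchors
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_logit = torch.sum(query * positive, dim=-1, keepdim=True)  # (N, 1)
    neg_logits = query @ negatives.T                               # (N, M)

    # Stack the positive into column 0, so the "correct class" is index 0 for every row.
    logits = torch.cat([pos_logit, neg_logits], dim=1) / temperature
    labels = torch.zeros(len(query), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)
```

With the positive in column 0, `F.cross_entropy` computes exactly the negative log-softmax of the positive similarity against all similarities, which is the InfoNCE objective.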

Ok got it.

Just one more thing: I guess this loss is the same as the NT-Xent loss, or MultipleNegativesRankingLoss in Sentence Transformers, right?

I'm not familiar with MultipleNegativesRankingLoss, but it looks the same. I often see NT-Xent and InfoNCE used interchangeably, but I believe the technical difference is that NT-Xent excludes the positive sample from the denominator, whereas InfoNCE includes it. The papers vary in how they obtain the embeddings and which (combinations of) samples go into the loss, but functionally it comes down to a softmax over embedding similarities.
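For what it's worth, here is that distinction written out as a rough sketch (the tensor names are made up for illustration, and this follows the description above rather than any particular paper's exact formulation):

```python
import torch

def contrastive_losses(sim_pos, sim_negs, temperature=0.1):
    # sim_pos:  (N,)   similarity between each anchor and its positive
    # sim_negs: (N, M) similarities between each anchor and M negatives
    pos = torch.exp(sim_pos / temperature)
    negs = torch.exp(sim_negs / temperature).sum(dim=1)

    info_nce = -torch.log(pos / (pos + negs))   # positive included in the denominator
    nt_xent_like = -torch.log(pos / negs)       # positive excluded, per the description above
    return info_nce.mean(), nt_xent_like.mean()
```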