CrimyTheBold/tripletloss

Probs


Hi,

Many thanks for sharing this code!
I've just started working on a problem related to triplet loss, and your repository is very useful to my project. I'm just a bit confused: why does -compute_dist become the estimate of the probability? Do you just measure the difference between two embeddings using the compute_dist function?

    # compute the probability of being the right decision: it should be 1 for the right class, 0 for all other classes
    probs[k] = -compute_dist(embeddings[i,:], embeddings[j,:])
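
For reference, here is a minimal sketch of what a compute_dist helper could look like, assuming a plain Euclidean (L2) distance between two embedding vectors; the repository's actual implementation may differ:

    import numpy as np

    def compute_dist(a, b):
        # Euclidean (L2) distance between two 1-D embedding vectors
        return np.linalg.norm(a - b)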

Hi,
Here's the main idea: to evaluate how well our solution works, I decided to treat it as if it were a binary model. If I give it 2 pictures, it should output whether they are the same: yes or no, 1 or 0. For this evaluation, we use the distance produced by our model. Here's the intuition:

  • If the distance is high => different pictures (then label 0); if the distance is low => same picture (then label 1).

  • If you take the opposite of the distance (with a minus sign), you have (-high distance) => label 0 and (-low distance) => label 1.

  • If the distance increases, (-distance) decreases. If the distance decreases, (-distance) increases towards... something. Let's call that something a 1. Or 100%.

These (negated) distances can play the role of the "probability of being the same picture". It's not strictly that probability, but it's related to it. It doesn't matter that the computed distance doesn't run exactly from 0% to 100%, because the next step is to use the ROC curve to help us decide the value of the boundary (i.e. to decide whether a distance value means a "not same" or "same" decision).
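
To make that concrete, here is a hedged sketch of the evaluation loop (the embeddings, the labels, and the pairing loop are illustrative stand-ins, not the repository's exact code): the negated pairwise distances are used as scores and fed to scikit-learn's roc_curve to choose the decision boundary.

    import numpy as np
    from sklearn.metrics import roc_curve

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(10, 64))                           # stand-in embeddings
    embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)  # L2-normalize
    labels = rng.integers(0, 2, size=10)                             # stand-in class labels

    probs, y_true = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            dist = np.linalg.norm(embeddings[i] - embeddings[j])
            probs.append(-dist)                          # higher score => more likely "same"
            y_true.append(int(labels[i] == labels[j]))   # 1 for same class, 0 otherwise

    # The ROC curve sweeps candidate thresholds; pick the operating point you want
    fpr, tpr, thresholds = roc_curve(y_true, probs)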
The comment "it should be 1..." refers to Y (the true labels).
Is that more clear?

Many thanks! I get what you mean. So it's not a probability, but we can use it as a stand-in for a probability to determine the threshold with the ROC curve.

I found out that the L2 distance between two embeddings is always between 0 and 1, since you have L2-normalized the output embeddings in the network (the length of each vector is 1). So, instead of using -distance, I think (1 - distance) can be used if we want to be 'closer' to representing a probability. E.g. instead of having -0.7 to represent the value between two different pictures, we get 1 - 0.7 = 0.3: a (low) probability that the two pictures are similar.
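
One note on that idea: since (1 - distance) is just (-distance) shifted by a constant 1, it is a monotone transformation of the same score, so the ROC curve and the AUC are unchanged; only the threshold values shift. A quick check with made-up distances and labels:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    dists  = np.array([0.1, 0.7, 0.3, 0.9, 0.2])  # made-up pairwise distances
    y_true = np.array([1,   0,   1,   0,   1])    # made-up "same picture" labels

    # Both scorings rank the pairs identically, so the AUC is identical;
    # only the threshold axis of the ROC curve is shifted by 1.
    print(roc_auc_score(y_true, -dists))     # 1.0 for this toy data
    print(roc_auc_score(y_true, 1 - dists))  # same value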