fartashf/vsepp

The question about loss function

forence opened this issue · 1 comment

I studied your paper and code. As I understand it, each matched caption-image pair in a mini-batch is a positive sample, and the other (mini-batch size - 1) caption-image pairs are negative samples. However, if a mini-batch happens to contain several captions that belong to the same image, your code still treats those pairs as negatives, even though they are actually positives. Does this affect the hard negative mining in the contrastive loss?
Looking forward to your reply!
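To make the concern concrete, here is a minimal pure-Python sketch of a VSE++-style hinge-based triplet loss over a mini-batch similarity matrix. It is assumed, not confirmed, to mirror the repository's loss: `scores[i][j]` is the similarity of image `i` and caption `j`, the diagonal holds the matched pairs, both retrieval directions are summed, and `max_violation=True` keeps only the hardest negative. The `image_ids` argument and its masking are a hypothetical extension added for illustration. Per the question above, the original code does not do this.

```python
from typing import List, Optional, Sequence


def contrastive_loss(
    scores: List[List[float]],
    margin: float = 0.2,
    max_violation: bool = True,
    image_ids: Optional[Sequence[int]] = None,
) -> float:
    """Hinge-based triplet loss; the diagonal of `scores` holds the positives.

    `image_ids` (hypothetical) gives the source image of each batch element;
    when set, pairs whose caption actually describes the row's image are
    excluded from the negatives instead of being counted as false negatives.
    """
    n = len(scores)

    def is_negative(i: int, j: int) -> bool:
        if i == j:
            return False  # the matched (positive) pair
        if image_ids is not None and image_ids[i] == image_ids[j]:
            return False  # caption j also describes image i
        return True

    # Caption retrieval: for image i, penalise captions j within `margin` of s(i, i).
    cost_s = [[max(0.0, margin + scores[i][j] - scores[i][i]) if is_negative(i, j) else 0.0
               for j in range(n)] for i in range(n)]
    # Image retrieval: for caption j, penalise images i within `margin` of s(j, j).
    cost_im = [[max(0.0, margin + scores[i][j] - scores[j][j]) if is_negative(i, j) else 0.0
                for j in range(n)] for i in range(n)]

    if max_violation:
        # Hardest negative only: row max for each image, column max for each caption.
        return (sum(max(row) for row in cost_s)
                + sum(max(cost_im[i][j] for i in range(n)) for j in range(n)))
    # Otherwise sum the violations over all negatives.
    return sum(map(sum, cost_s)) + sum(map(sum, cost_im))


if __name__ == "__main__":
    # Batch elements 0 and 1 are two captions of the same image (rows 0 and 1 match).
    scores = [[0.9, 0.8, 0.1],
              [0.9, 0.8, 0.1],
              [0.1, 0.2, 0.9]]
    print(round(contrastive_loss(scores), 4))                       # 0.8
    print(round(contrastive_loss(scores, image_ids=[7, 7, 3]), 4))  # 0.0
```

In this toy batch, the two captions of the same image become each other's hardest negatives. That accounts for the entire loss of 0.8. When those pairs are masked out, the loss drops to 0.0, which is the effect on hard negative mining that the question asks about.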

Please see issue #6 for an old explanation about this. Let me know if you still have questions.