google-deepmind/detcon

loss on the same object in different images

hongsukchoi opened this issue · 1 comments

I read the paper and have a question. I am new to JAX and cannot figure it out from this repo.

In the paper, the contrastive learning pulls together pooled feature vectors from the same mask (across views) and pushes apart features from different masks and different images.

Then how about the loss on the same object in different images?
For example, there can be different elephants in different images. Do you pull them together or push them apart?

Since the method is purely unsupervised, we don't know whether objects in other images belong to the same class (e.g. whether they are also elephants), so we treat them all as negatives.
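To make that concrete, here is a minimal NumPy sketch of this kind of mask-level contrastive loss (not the repo's actual JAX implementation; shapes and the function name `detcon_style_loss` are illustrative). For each mask-pooled feature in view 1, the only positive is the same mask of the same image in view 2; every other (image, mask) pair in the batch — including masks on same-class objects in other images — lands in the denominator as a negative:

```python
import numpy as np

def detcon_style_loss(z1, z2, temperature=0.1):
    """Simplified mask-level contrastive (InfoNCE-style) loss.

    z1, z2: arrays of shape (B, M, D) -- features pooled over M masks
    in each of B images, from two augmented views.

    The positive for mask m of image b in view 1 is mask m of image b
    in view 2. All other (image, mask) pairs are negatives, even if
    they contain the same object class in a different image.
    """
    B, M, D = z1.shape
    # l2-normalize so dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=-1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=-1, keepdims=True)
    a = z1.reshape(B * M, D)
    b = z2.reshape(B * M, D)
    logits = a @ b.T / temperature  # (B*M, B*M) similarity matrix
    # Positives sit on the diagonal; everything off-diagonal is a negative.
    idx = np.arange(B * M)
    # Numerically stable cross-entropy: -log softmax at the positive index.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[idx, idx].mean()
```

So if two images each contain an elephant, the two elephant masks end up as negatives for each other; nothing in the loss knows they share a class.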