About the InfoNCE loss and cluster_result

Hi,

I notice that the labels created for the InfoNCE loss are always a zero vector (PCL/pcl/builder.py, line 163 in 964da1f):

labels = torch.zeros(logits.shape[0], dtype=torch.long).cuda()

I think this is wrong, since the loss would then always be zero. Did I misunderstand the code?

When the cluster_result dictionary is created, I found that only the eval dataset is taken into account (PCL/main_pcl.py, line 299 in 964da1f):

cluster_result['im2cluster'].append(torch.zeros(len(eval_dataset),dtype=torch.long).cuda())

So what is the motivation behind this? I think we should run it on the training set.

Same question.

For the first question, you can refer to the MoCo v1 code, which uses cross-entropy directly for InfoNCE. The labels are class indices, not target values: the positive logit sits in column 0 of logits, so an all-zero label vector marks column 0 as the correct class for every sample, and the loss is not zero. As for the second question, they use the eval_dataset as negative prototypes, and in this line of code:

output, target, output_proto, target_proto = model(im_q = images[0], im_k = images[1],
                                                   cluster_result = cluster_result, index = index)

the index passed in comes from the train_loader, so the computation is still based on the train_dataset.
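
If it helps, here is a small sketch of both points, run on CPU with random tensors. It is not the PCL code itself: the sizes N, K, the temperature T, and the toy im2cluster tensor are made-up values for illustration. The second part assumes the eval and train datasets enumerate the same images in the same order, so that a sample index from the training batch addresses the right entry.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative values only: batch size, number of negatives, temperature.
N, K, T = 4, 8, 0.2

l_pos = torch.randn(N, 1)  # similarity of each query with its positive key
l_neg = torch.randn(N, K)  # similarities with K negative keys

# The positive logit goes in column 0, as in MoCo-style code.
logits = torch.cat([l_pos, l_neg], dim=1) / T

# All-zero labels are class indices: "the correct class is column 0".
labels = torch.zeros(logits.shape[0], dtype=torch.long)

loss = F.cross_entropy(logits, labels)

# The same quantity written out as InfoNCE: -log(exp(pos) / sum(exp(all))).
manual = -(logits[:, 0] - torch.logsumexp(logits, dim=1)).mean()

print(loss.item(), manual.item())  # the two values agree and are above zero

# Hypothetical stand-in for one entry of cluster_result['im2cluster']:
# one cluster id per dataset sample, addressed by sample index.
im2cluster = torch.tensor([2, 0, 1, 1, 0, 2])

# Sample indices that arrive together with a training batch.
index = torch.tensor([4, 1, 5])

# Cluster id of each sample in the batch, looked up by its dataset index.
pos_proto_id = im2cluster[index]
```

So the zero vector only says where the positive sits among the logits, and the cluster assignments are looked up per training sample through index.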