salesforce/PCL

About the InfoNCE loss and cluster_result

lsyysl9711 opened this issue · 2 comments

Hi,

  1. I notice that the labels created for the InfoNCE loss are always a zero vector:

    labels = torch.zeros(logits.shape[0], dtype=torch.long).cuda()

    I think this is wrong, since the loss would then always be zero. Did I misunderstand the code?

  2. When creating the cluster_result dictionary, I found that only the eval dataset is taken into account:

    PCL/main_pcl.py

    Line 299 in 964da1f

    cluster_result['im2cluster'].append(torch.zeros(len(eval_dataset),dtype=torch.long).cuda())

    What is the motivation behind this? I think it should be run on the training set.

Same question.

For the first question, see the MoCo v1 code, which implements InfoNCE directly as a cross-entropy loss. For the second question, they use the eval_dataset as negative prototypes, and in this call:

output, target, output_proto, target_proto = model(im_q = images[0], im_k = images[1],
                                                   cluster_result = cluster_result, index = index)

the index that is passed in comes from the train_loader, so the computation is still based on the train_dataset.
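
To make the first point concrete, here is a minimal sketch of why all-zero labels do not give a zero loss. It assumes the MoCo-style layout, where the positive logit is placed in column 0 and the negatives follow it. The batch size, number of negatives, and temperature are illustrative values, not taken from the PCL code, and the random tensors stand in for real similarities. It runs on CPU, so the .cuda() call is dropped.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not values from the PCL repo).
batch_size, num_negatives, temperature = 4, 8, 0.2

# Stand-ins for real similarities: one positive per query, then negatives.
l_pos = torch.randn(batch_size, 1)
l_neg = torch.randn(batch_size, num_negatives)

# Column 0 holds the positive logit, columns 1.. hold the negatives.
logits = torch.cat([l_pos, l_neg], dim=1) / temperature

# These are class indices, not one-hot targets or target values:
# 0 means "the correct class for every row is column 0".
labels = torch.zeros(logits.shape[0], dtype=torch.long)

loss = F.cross_entropy(logits, labels)

# The same quantity by hand: -log softmax at column 0, averaged over rows.
manual = -F.log_softmax(logits, dim=1)[:, 0].mean()

print(loss.item(), manual.item())
```

The two printed values should match, and the loss is positive whenever the positive logit does not completely dominate its row. A zero label here only says which column is the target, so the loss shrinks as the positive similarity grows relative to the negatives, which is the InfoNCE objective.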