sunxm2357/DualCoOp

Questions about data preprocessing

Closed this issue · 2 comments

Thank you for your great work.

While reading the code, I found something that confused me in the data preprocessing:

# In coco_detection.py, line 105......
output = torch.zeros((3, len(self.classnames)), dtype=torch.long)
for obj in target:
    if obj['area'] < 32 * 32:
        output[0][self.cat2cat[obj['category_id']]] = 1
    elif obj['area'] < 96 * 96:
        output[1][self.cat2cat[obj['category_id']]] = 1
    else:
        output[2][self.cat2cat[obj['category_id']]] = 1
target = output
if self.mask is not None:
    masked = -torch.ones((3, len(self.classnames)), dtype=torch.long)
    target = self.mask[index] * target + (1 - self.mask[index]) * masked
# ......
# In trainers.py, line 109
target = target.max(dim=1)[0]
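For reference, here is a toy sketch (with 5 hypothetical classes and a hand-made mask) of what this encoding and the max-merge produce; in trainers.py the extra batch dimension makes the merge `dim=1` rather than `dim=0`:

```python
import torch

num_classes = 5
# Toy target: rows correspond to the (small, medium, large) area bins.
output = torch.zeros((3, num_classes), dtype=torch.long)
output[0][1] = 1   # a small object of class 1
output[2][3] = 1   # a large object of class 3

# A toy mask hiding class 4 in all three rows,
# mimicking mask * target + (1 - mask) * (-1).
mask = torch.ones((3, num_classes), dtype=torch.long)
mask[:, 4] = 0
masked = -torch.ones((3, num_classes), dtype=torch.long)
target = mask * output + (1 - mask) * masked

# Merge the size bins: per-class max over the 3 rows.
merged = target.max(dim=0)[0]
print(merged.tolist())  # [0, 1, 0, 1, -1]
```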

Why split the annotations into three parts, mask them out, and then merge them? It seems odd, and it also appears to make the proportion of known labels larger than the configured setting: a label is dropped only if the masks for all three parts are 0. For example, with the proportion set to 0.5 following the README, 87.5% (1 - 0.5^3) of labels would be retained instead of 50%.
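The arithmetic behind this concern can be checked with a quick simulation, assuming (hypothetically) that each of the three size bins drew its own independent mask; the sample size `n` and seed here are illustrative:

```python
import torch

torch.manual_seed(0)
p = 0.5             # proportion of labels to keep
n = 1_000_000       # number of (sample, class) positions to simulate

# If each of the 3 size bins got its own independent Bernoulli(p) mask,
# a label survives the merge unless all three masks drop it.
masks = (torch.rand(3, n) < p).long()
survived = masks.max(dim=0)[0].float().mean().item()
print(survived)  # ~0.875, i.e. 1 - (1 - p) ** 3
```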

It would be better if the authors replaced this code with a simpler version and reproduced the results.

Hi, thanks for your interest in our paper!

For the COCO data preprocessing, I follow the exact setting of the ASL paper's implementation. Please check https://github.com/Alibaba-MIIL/ASL/blob/8c9e0bd8d5d450cf19093363fc08aa7244ad4408/src/helper_functions/helper_functions.py#L118

Note that in line 93, the mask is stacked three times, so the same positions are masked out in all three parts. As a result, 50% of labels are retained when the proportion is set to 0.5 following the README.

mask = torch.stack([mask, mask, mask], dim=1)
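A quick simulation sketch of why a shared mask keeps the retained fraction at exactly p; the `n` and seed here are illustrative:

```python
import torch

torch.manual_seed(0)
p = 0.5
n = 1_000_000

# One Bernoulli(p) mask, stacked three times as in the ASL helper:
mask = (torch.rand(n) < p).long()
mask3 = torch.stack([mask, mask, mask], dim=0)

# The same positions are masked in every size bin, so after the
# per-position max the retained fraction is still p.
survived = mask3.max(dim=0)[0].float().mean().item()
print(survived)  # ~0.5
```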

You can also empirically verify the correctness of our implementation by running the following piece of code in coco_detection.py:

if __name__ == '__main__':
    root_path = '/PATH/TO/MSCOCO/'
    data_split = 'train2014'

    for pp in range(1, 10):
        dataset = CocoDetection(root_path, data_split, label_mask=None, partial=pp / 10.)
        train_loader = torch.utils.data.DataLoader(dataset, batch_size=10,
                                                   shuffle=False,
                                                   num_workers=4, pin_memory=True)
        total_label = 0.
        masked_label = 0.

        for i, (images, target) in enumerate(train_loader):
            # Merge the 3 size bins, then count the masked (-1) entries.
            target = target.max(dim=1)[0]
            total_label += target.shape[0] * target.shape[1]
            masked_label += (target == -1).sum()

        print("Given pp={}, the actual masked portion is {}".format(pp / 10., masked_label.item() / total_label))