Questions about data preprocessing
Thank you for your great work.
While reading the code, I found something in the data preprocessing that confused me:
```python
# In coco_detection.py, line 105 ...
output = torch.zeros((3, len(self.classnames)), dtype=torch.long)
for obj in target:
    if obj['area'] < 32 * 32:
        output[0][self.cat2cat[obj['category_id']]] = 1
    elif obj['area'] < 96 * 96:
        output[1][self.cat2cat[obj['category_id']]] = 1
    else:
        output[2][self.cat2cat[obj['category_id']]] = 1
target = output
if self.mask is not None:
    masked = -torch.ones((3, len(self.classnames)), dtype=torch.long)
    target = self.mask[index] * target + (1 - self.mask[index]) * masked
# ...

# In trainers.py, line 109
target = target.max(dim=1)[0]
```
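As an illustrative toy example of what that final `max` does (values are made up; `dim=0` here because there is no batch dimension), `-1` survives the merge only when a class is masked in every size bin:

```python
import torch

# toy per-bin targets for 4 classes:
# 1 = positive in that size bin, 0 = known negative, -1 = masked/unknown
t = torch.tensor([[ 1,  0, -1, -1],
                  [ 0,  0, -1,  0],
                  [ 0,  1, -1, -1]])
print(t.max(dim=0)[0])  # tensor([ 1,  1, -1,  0])
```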
Why split the annotations into three parts, mask them out, and then merge them again? It seems odd, and it also appears to make the proportion of known labels larger than the configured setting: a label is dropped only if the masks for all three parts are 0. For example, with the proportion set to 0.5 following the README, 87.5% (1 − 0.5³) of the labels would be retained instead of 50%.
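As a quick sanity check of this arithmetic, here is a small simulation, assuming each of the three parts draws its own independent keep/drop mask:

```python
import random

random.seed(0)
p = 0.5            # proportion of known labels, as in the README setting
trials = 100_000
kept = 0
for _ in range(trials):
    # one independent keep/drop draw per size bin; the label survives
    # if at least one of the three bins keeps it
    if any(random.random() < p for _ in range(3)):
        kept += 1
print(kept / trials)  # close to 1 - (1 - p)**3 = 0.875
```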
It would be better if the authors replaced this code with a simpler version and reproduced the results.
Hi, thanks for the interest in our paper!
For the data preprocessing of COCO, I follow the exact setting of the ASL paper's implementation. Please check https://github.com/Alibaba-MIIL/ASL/blob/8c9e0bd8d5d450cf19093363fc08aa7244ad4408/src/helper_functions/helper_functions.py#L118
Hi, note that in line 93 of `DualCoOp/dataloaders/coco_detection.py` (commit 49e71d7), the mask is concatenated three times, so the same positions are masked out in all three parts. As a result, 50% of the labels will be retained if the proportion is set to 0.5 following the README.
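A minimal sketch of that shared-mask behavior (the class count and variable names here are hypothetical, not the repo's actual code):

```python
import torch

torch.manual_seed(0)
num_classes, p = 80, 0.5

# one 1-D mask per image, repeated for all three size bins, so the
# same class positions are masked out in every bin
mask_1d = (torch.rand(num_classes) < p).long()  # 1 = known, 0 = dropped
mask = mask_1d.unsqueeze(0).repeat(3, 1)        # shape (3, num_classes)

target = torch.randint(0, 2, (3, num_classes))  # toy per-bin labels
masked = -torch.ones((3, num_classes), dtype=torch.long)
target = mask * target + (1 - mask) * masked

merged = target.max(dim=0)[0]  # merge bins (dim=0: no batch dimension here)
# a class ends up dropped (-1) exactly where mask_1d == 0, so the
# retained fraction tracks p rather than 1 - (1 - p)**3
print((merged != -1).float().mean().item())
```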
And you can also empirically verify the correctness of our implementation by running the following piece of code in coco_detection.py:

```python
if __name__ == '__main__':
    root_path = '/PATH/TO/MSCOCO/'
    data_split = 'train2014'
    for pp in range(1, 10):
        dataset = CocoDetection(root_path, data_split, label_mask=None, partial=pp / 10.)
        train_loader = torch.utils.data.DataLoader(dataset, batch_size=10,
                                                   shuffle=False,
                                                   num_workers=4, pin_memory=True)
        total_label = 0.
        masked_label = 0.
        for i, (images, target) in enumerate(train_loader):
            target = target.max(dim=1)[0]
            total_label += target.shape[0] * target.shape[1]
            masked_label += (target == -1).sum()
        print("Given pp={}, the actual masked portion is {}".format(
            pp / 10., masked_label.item() / total_label))
```