yuantn/MI-AOD

Question about the image classification score for multi-label object detection

Closed this issue · 7 comments

Hello,

I am working on an object detection problem where I have to detect N classes, and each bbox may carry one or more of those N labels. The image classification score used in your paper for uncertainty re-weighting is computed with softmax, which assumes that only one class can be present in an instance.

[image: image classification score formula from the paper]

Could you give me some guidance on how this formula can be generalized to the multi-label case?

Hello,

I think you can change the softmax function to the sigmoid function for the multi-label case, since the multi-label case can be treated as many binary classification problems, one per class.
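For intuition, here is a minimal sketch of the difference (plain PyTorch; the logits tensor is illustrative, not from the repository):

import torch

logits = torch.tensor([[2.0, -1.0, 1.5]])  # per-class logits for one instance

# multi-class: softmax couples the classes, so the scores sum to 1
multi_class_scores = logits.softmax(dim=1)

# multi-label: sigmoid scores each class as an independent binary problem
multi_label_scores = logits.sigmoid()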

@yuantn right, so if you were to bring it to the code, would it look something like this?

# forward the classification heads
y_head_f_1 = self.f_1_retina(f_1_feat)
y_head_f_2 = self.f_2_retina(f_2_feat)
y_head_f_r = self.f_r_retina(f_r_feat)
y_head_f_mil = self.f_mil_retina(f_mil_feat)

# second term: average of the two classifier heads, detached from the graph
y_head_cls_term2 = (y_head_f_1 + y_head_f_2) / 2
y_head_cls_term2 = y_head_cls_term2.detach()

# reshape from (N, A*C, H, W) to (N, num_anchors, num_classes)
y_head_f_mil = y_head_f_mil.permute(0, 2, 3, 1).reshape(y_head_f_1.shape[0], -1, self.cls_out_channels)
y_head_cls_term2 = y_head_cls_term2.permute(0, 2, 3, 1).reshape(y_head_f_1.shape[0], -1, self.cls_out_channels)
# computing mil scores for multi-class case (how it is right now)
# y_head_cls = y_head_f_mil.softmax(2) * y_head_cls_term2.sigmoid().max(2, keepdim=True)[0].softmax(1)

# computing mil scores for multi-label case
# (sigmoid is applied element-wise, so it takes no dim argument, unlike softmax)
y_head_cls = y_head_f_mil.sigmoid() * y_head_cls_term2.sigmoid().max(2, keepdim=True)[0]

So now y_head_cls outputs scores for C binary classifiers (i.e. classes).
And the ground truth mil scores for an anchor would be a binary vector instead of a one-hot encoded one?

I feel a bit hesitant about the impact of the second term. Could you please clarify whether it is needed for my case?

The code seems to be OK.

But the ground truth mil scores form a binary vector that generalizes the one-hot encoding (i.e., multi-hot), such as [1,0,1,0,0] for an anchor with class 0 and class 2.
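For example, a multi-hot target could be built like this (a short sketch; num_classes and the label list are illustrative):

import torch

num_classes = 5
gt_labels = [0, 2]  # classes present for this anchor

# multi-hot target: 1 for every present class, 0 elsewhere
target = torch.zeros(num_classes)
target[gt_labels] = 1.0  # -> tensor([1., 0., 1., 0., 0.])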

If you are still hesitant, you can debug each component of y_head_cls. Specifically, you can print the shape of y_head_f_mil, y_head_f_mil.sigmoid(), y_head_cls_term2, y_head_cls_term2.sigmoid(), and y_head_cls_term2.sigmoid().max(2, keepdim=True)[0] to confirm which one you need.
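A minimal sketch of such a shape check (assuming the tensors from the snippet above are in scope):

for name, t in [
    ('y_head_f_mil', y_head_f_mil),
    ('y_head_f_mil.sigmoid()', y_head_f_mil.sigmoid()),
    ('y_head_cls_term2', y_head_cls_term2),
    ('y_head_cls_term2.sigmoid()', y_head_cls_term2.sigmoid()),
    ('y_head_cls_term2.sigmoid().max(2, keepdim=True)[0]',
     y_head_cls_term2.sigmoid().max(2, keepdim=True)[0]),
]:
    print(name, tuple(t.shape))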

@yuantn got it, one last question ;)

You demonstrate results on the PASCAL VOC dataset over 7 AL cycles, each time adding 5% of the examples from the unlabelled set to the labelled one. Do I understand correctly that when applying this active learning pipeline to a real-world project, we would need only one cycle? If so, how does one determine the amount of data to be used for Label Set Training versus Re-weighting and Max/Min Instance Uncertainty?

@yuantn I also noticed that in the L_wave_max function you do not calculate the image classification loss for the labelled set, although it appears in Equation 8 of the paper. Is it an intentional change?
[image: Eq. (8) from the paper]

I mean this place in the code.

> @yuantn got it, one last question ;)
>
> You demonstrate results on the PASCAL VOC dataset over 7 AL cycles, each time adding 5% of the examples from the unlabelled set to the labelled one. Do I understand correctly that when applying this active learning pipeline to a real-world project, we would need only one cycle? If so, how does one determine the amount of data to be used for Label Set Training versus Re-weighting and Max/Min Instance Uncertainty?

I am afraid not.

As mentioned in our paper: the key idea of active learning is that a machine learning algorithm can achieve better performance with fewer training samples if it is allowed to select which samples to learn from.

So if you blindly select 20% of the samples all at once as the labeled set, their quality will not be as high as that of 20% of the samples selected gradually over multiple cycles.

The latter cover more aspects of uncertainty and are more diverse and representative than the former, randomly selected samples.

However, training with too many cycles introduces a longer and largely wasted early training process, so there is a trade-off between the number of cycles and sample quality.

According to existing experimental results on multiple tasks across CIFAR, ImageNet, PASCAL VOC, MS COCO, BDD100K, Cityscapes and other datasets, 5~7 active learning cycles are usually used. To avoid cold-start problems, the initial labeled set should also not be too small.

In each cycle, the number of unlabeled samples used for the three steps should be the same as the number of labeled samples; they are randomly selected from the set of all unlabeled samples.
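As a sketch of that sizing rule (function and variable names are hypothetical, not from the repository):

import random

def sample_unlabeled_for_cycle(labeled_set, unlabeled_set):
    # use as many unlabeled samples as there are labeled ones,
    # drawn at random from the full unlabeled pool
    k = min(len(labeled_set), len(unlabeled_set))
    return random.sample(unlabeled_set, k)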

> @yuantn I also noticed that in the L_wave_max function you do not calculate the image classification loss for the labelled set, although it appears in Equation 8 of the paper. Is it an intentional change?
>
> I mean this place in the code.

It is true that the paper and the code are not consistent here, and the code should be taken as the correct version.