dragonlzm/ISAL

Why does ISAL pick images with fewer boxes?


As the experiment results show, ISAL decreases the annotation cost by picking images with fewer boxes than the random baseline does. However, the influence calculation formula has no direct relationship with the number of boxes in each image, and I didn't find any explanation in the paper of why it tends to pick images with fewer boxes. In other words, if the objective is to decrease the annotation cost, why would you design the influence formula in this form? Thanks a lot.
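(For context, my understanding of the score in question: the paper builds on the influence-function idea, which scores an unlabeled image by how much training on it is expected to reduce the validation loss. A heavily simplified sketch, dropping the inverse-Hessian term and using an illustrative helper rather than the repo's actual API:)

```python
import torch

def influence_score(model, unlabeled_loss, val_loss):
    """Simplified influence sketch: alignment between the gradient of one
    unlabeled image's loss (computed on pseudo labels) and the gradient of
    the validation loss. The paper's formulation also involves an
    inverse-Hessian term, approximated here by the identity for brevity."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_img = torch.autograd.grad(unlabeled_loss, params, retain_graph=True)
    g_val = torch.autograd.grad(val_loss, params, retain_graph=True)
    return sum((gi * gv).sum() for gi, gv in zip(g_img, g_val)).item()

# Note that the number of boxes never appears explicitly in the score:
# it only enters indirectly, through the loss and its gradient.
```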

From the paper, Section 4.5 (Visualization Analysis): "Our proposed method selects images with fewer bboxes, while the bboxes’ size in the selected images is significantly larger than the one selected by other methods. In addition, the bboxes in the selected images of our proposed method have a lower overlap ratio. This indicates that the clear and large object in the image helps the model learn more effectively."

Another problem: ISAL outperforms other methods in terms of 10k × AP / bbox num, but the comparison is based on the same number of images. I think it would be more reasonable to compare based on the same number of bboxes. For instance, let ISAL and random each pick 100k bboxes (maybe ISAL picks 20k images while random picks 14k images). Obviously 10k × AP / bbox num goes down as the image/bbox count increases, so I wonder whether ISAL is still better in this setting, especially when the bbox count is relatively high (>30% of the whole dataset's bboxes)?
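(To make that last claim concrete, a toy computation with invented numbers: AP saturates while the annotated-bbox count keeps growing, so the per-bbox metric decays for any selection method.)

```python
# All numbers invented for illustration: AP saturates as the number of
# annotated bboxes grows, so 10k * AP / bbox_num shrinks for any method.
points = [(50_000, 0.25), (100_000, 0.30), (200_000, 0.33)]  # (bboxes, AP)
for n_bbox, ap in points:
    print(f"{n_bbox:>6} bboxes -> 10k*AP/bbox = {10_000 * ap / n_bbox:.4f}")
```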

Hi! For the first point, we used the same number of images because, in real life, it isn't very reasonable to annotate only part of the objects in an image and then come back later to annotate the rest of the objects in the same image; such a back-and-forth procedure is not efficient. Although we could reach good performance in the scenario you mention, i.e. "randomly selecting some boxes from one image for annotation", it may not be efficient in real life.
For the second point, if I understand your question correctly, the active learning setting conducts the experiment on a closed set. In each dataset, the number of images that can provide a strongly positive influence on the trained model is limited. When the number of bboxes gets high, the images that could provide the most positive influence have already been selected, and the rest of them cannot provide a positive influence on the model. So as you continue selecting images, the gap between our method and the random baseline will inevitably become smaller.

Thanks for your patient reply, but I think you misunderstood my first question. My suggested experiment setting is "picking images with the same total number of bboxes" (i.e., random picks 14k images with 100k bboxes AGAINST ISAL picks 20k images with 100k bboxes), not "randomly selecting some boxes from one image for annotation" (i.e., random picks 14k images with 100k bboxes but ignores 30k of them AGAINST ISAL picks 14k images with 70k bboxes). The advantage of this setting is that you can directly compare mAP instead of '10k × AP / bbox num', which is easier for most of us to accept. But don't worry about it: I made some changes to your code and reproduced this experiment, and ISAL still got a higher mAP in most of the runs. A sketch of the selection procedure I used is below.
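(A minimal sketch of the budget-matched selection, with hypothetical names; the actual per-image scores would come from the repo's influence computation or the random baseline:)

```python
def select_by_bbox_budget(scores, bbox_counts, budget):
    """Take images in descending score order (influence scores for ISAL,
    random scores for the baseline) until the total annotated-bbox count
    reaches the budget, so both methods pay equal annotation cost even
    though they end up with different image counts."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected, total = [], 0
    for i in order:
        selected.append(i)
        total += bbox_counts[i]
        if total >= budget:
            break
    return selected, total
```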

I still have another question, which may be more IMPORTANT. I found that there is a "--gt" option in test_calc_infu.py, which is false by default, and I believe all the experiments in the paper use this default setting. However, I think all the math in the paper assumes that we are using ground-truth labels to calculate influence. So I set 'args.gt' to 'true' and ran experiments, and the results show that ISAL picks images with remarkably more bboxes than the random baseline. My personal understanding is that images with more bboxes lead to a higher loss and are therefore more likely to be picked, which is inconsistent with the theory and with the conclusion 'ISAL picks images with fewer bboxes' in the paper.
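(To spell out the mechanism as I understand it, a hypothetical sketch of the two label sources behind that switch; the function and method names below are illustrative, not the repo's actual API:)

```python
def per_image_loss(model, image, gt_annotations, use_gt):
    """Loss used for the influence computation, under the two settings."""
    if use_gt:
        # GT mode: every annotated box contributes to the loss, so crowded
        # images tend to have larger losses (and larger gradients).
        targets = gt_annotations
    else:
        # Default (pseudo-label) mode: targets are the model's own confident
        # predictions, so objects the model fails to detect never enter the
        # loss at all.
        targets = model.predict(image)
    return model.compute_loss(image, targets)
```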

I didn't find a reasonable theoretical explanation of why ISAL picks images with fewer bboxes only when we use the pseudo labels produced by the model itself to calculate influence. Did you conduct similar experiments, and how would you explain that? Thanks in advance. (FYI, my experiments are based on Faster R-CNN on COCO.)

Your understanding is correct: images with more GT bboxes might be selected since their loss is larger. But the influence selection actually doesn't depend directly on the magnitude of the loss. The idea is closer to measuring the similarity between the loss on the validation set and the loss of each image (in terms of their gradient directions), i.e., selecting the images that can provide the largest needed update to the model. ISAL selecting images with fewer bboxes is just the observed phenomenon we used to explain why our method can perform better than other methods.
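(A toy numeric illustration of that distinction, with invented numbers: a sample with a larger gradient, e.g. from a crowded image, can still have lower influence if that gradient is poorly aligned with the direction that reduces the validation loss.)

```python
import torch

g_val = torch.tensor([1.0, 0.0])      # direction that reduces validation loss
g_crowded = torch.tensor([0.5, 3.0])  # large gradient, poorly aligned
g_sparse = torch.tensor([0.8, 0.1])   # small gradient, well aligned

print(torch.dot(g_crowded, g_val))  # tensor(0.5000): low influence, big norm
print(torch.dot(g_sparse, g_val))   # tensor(0.8000): higher influence
```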