vt-vl-lab/iCAN

Question about the codebase of detector (Detectron or tf-faster-rcnn)

yeliudev opened this issue · 5 comments

Hi @gaochen315! Many thanks for the great work!

I have some questions about the detector in your model.

  1. I noticed that during testing, the human-object pair proposals are read from Data/Test_Faster_RCNN_R-50-PFN_2x_HICO_DET.pkl. Do these proposals come from the outputs of Detectron or from another codebase?

  2. During training, you used a pretrained ResNet50-FPN from tf-faster-rcnn as the detector. Why not just use the detection results in Data/Test_Faster_RCNN_R-50-PFN_2x_HICO_DET.pkl?

Thank you very much for your attention.😊

  1. Yes. I ran the official Caffe2 implementation of Detectron on the HICO-DET dataset. The filename hints at the configuration used, i.e. R-50-PFN_2x.

  2. As for training, I didn't use anything from tf-faster-rcnn. Could you please point out which line of my training code involves tf-faster-rcnn? I will take a look at it. Thanks!
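
For anyone who wants to check what the detection file mentioned above contains, here is a minimal sketch for inspecting a pickle file's top-level structure. The helper name summarize_pickle is hypothetical and not part of this repository, and the layout of the .pkl file is not described in this thread, so the code only reports generic container information instead of assuming any fields.

```python
import pickle
from collections import Counter


def summarize_pickle(path):
    """Load a pickle file and describe its top-level structure.

    Hypothetical helper for illustration; it makes no assumption about
    what the file stores beyond it being a valid pickle.
    """
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2,
        # which is common for older research code.
        obj = pickle.load(f, encoding="latin1")

    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        # Count how many values of each Python type the dict holds.
        summary["num_keys"] = len(obj)
        summary["value_types"] = dict(
            Counter(type(v).__name__ for v in obj.values())
        )
    elif isinstance(obj, (list, tuple)):
        # Count how many items of each Python type the sequence holds.
        summary["length"] = len(obj)
        summary["item_types"] = dict(Counter(type(v).__name__ for v in obj))
    return summary
```

Calling summarize_pickle on the path of the downloaded file would show whether the top-level object is a dict or a list and what kinds of entries it holds. Note that unpickling executes code embedded in the file, so this should only be run on files from a trusted source.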

In lib/networks/iCAN_ResNet50_HICO.py, the backbone implementation is similar to the one in tf-faster-rcnn, though I'm not sure whether they are exactly the same.

# ResNet Backbone
head = self.image_to_head(is_training)
sp = self.sp_to_head()
pool5_H = self.crop_pool_layer(head, self.H_boxes, 'Crop_H')
pool5_O = self.crop_pool_layer(head, self.O_boxes[:self.H_num,:], 'Crop_O')

Additionally, it seems that the pretrained Faster R-CNN weights in Weights/res50_faster_rcnn_iter_1190000.ckpt are loaded before training. So during training the model would get its detection results from itself rather than from Detectron, which differs from the testing process.

@gaochen315 Sorry, my mistake. After reading your code more carefully, it seems that you use a modified ResNet-50 (stage 1 to stage 4 only) for feature extraction before the three streams. So I wonder: has the feature extraction network been pre-trained on any dataset, or is it trained end-to-end together with the whole model?

Thank you for your attention!

The feature extraction network is initialized from tf-faster-rcnn's model (trained on COCO). It is not trained from scratch.
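
To illustrate what initializing only the feature extractor from a checkpoint means, here is a minimal sketch in plain Python that splits variable names into those restored from a pretrained checkpoint and those initialized fresh. The function split_by_scope and the variable and scope names in the usage example are hypothetical and chosen for illustration; they are not taken from the iCAN code.

```python
def split_by_scope(var_names, pretrained_scopes):
    """Partition variable names by whether they fall under a pretrained scope.

    Returns (restore, fresh): names to load from the checkpoint and names
    to initialize from scratch. Hypothetical helper for illustration.
    """
    restore, fresh = [], []
    for name in var_names:
        # Match on "scope/" so that a scope such as "backbone" does not
        # accidentally capture a sibling scope such as "backbone_extra".
        if any(name.startswith(scope + "/") for scope in pretrained_scopes):
            restore.append(name)
        else:
            fresh.append(name)
    return restore, fresh


# Usage with made-up variable names (assumptions, not the real graph):
names = [
    "resnet_v1_50/conv1/weights",
    "resnet_v1_50/block1/unit_1/weights",
    "interaction_head/fc1/weights",
]
restore, fresh = split_by_scope(names, ["resnet_v1_50"])
```

In TensorFlow 1.x, a list like restore would typically be mapped to the corresponding variables and passed as var_list to tf.train.Saver, so that only the backbone is loaded from the checkpoint while the remaining layers keep their fresh initialization.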

Thank you!