Question about the detector's codebase (Detectron or tf-faster-rcnn)

Hi @gaochen315! Many thanks for the great work!

I have some questions about the detector in your model.

I noticed that during testing, the human-object pair proposals are provided in Data/Test_Faster_RCNN_R-50-PFN_2x_HICO_DET.pkl. Do these proposals come from the outputs of Detectron or from another codebase?
For training, you appear to use a pretrained ResNet50-FPN from tf-faster-rcnn as the detector. Why not use the detection results in Data/Test_Faster_RCNN_R-50-PFN_2x_HICO_DET.pkl directly?

Thank you very much for your attention.😊

Yes. I ran the official Caffe2 implementation of Detectron on the HICO-DET dataset. The filename hints at the configuration, i.e. R-50-PFN_2x.
As for training, I didn't use anything from tf-faster-rcnn. Could you please point to the line of my training code where tf-faster-rcnn is involved? I will take a look at it. Thanks!
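
For anyone who wants to check what the detection file contains, here is a minimal sketch for inspecting it. It assumes the file is an ordinary Python pickle, possibly written under Python 2 (hence `encoding="latin1"`). The internal layout of the file is not described in this thread, so the hypothetical helper `summarize_pickle` only reports the top-level type and size rather than assuming any particular structure.

```python
import pickle


def summarize_pickle(path, max_items=3):
    """Load a pickle and return a short description of its top-level layout."""
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        obj = pickle.load(f, encoding="latin1")

    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["len"] = len(obj)
    if isinstance(obj, dict):
        # Show a few keys only; the real file may hold thousands of entries.
        summary["sample_keys"] = list(obj.keys())[:max_items]
    return summary
```

Calling `summarize_pickle("Data/Test_Faster_RCNN_R-50-PFN_2x_HICO_DET.pkl")` from the repository root should then show whether the detections are stored as a dict or a list before any further parsing is attempted.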

In lib/networks/iCAN_ResNet50_HICO.py, the backbone implementation is similar to the one in tf-faster-rcnn, but I'm not sure whether they are exactly the same.

iCAN/lib/networks/iCAN_ResNet50_HICO.py, lines 289 to 293 in 752ce5b:

    # ResNet Backbone
    head       = self.image_to_head(is_training)
    sp         = self.sp_to_head()
    pool5_H    = self.crop_pool_layer(head, self.H_boxes, 'Crop_H')
    pool5_O    = self.crop_pool_layer(head, self.O_boxes[:self.H_num,:], 'Crop_O')

Additionally, it seems that the pretrained Faster R-CNN weights in Weights/res50_faster_rcnn_iter_1190000.ckpt are loaded before training. If so, during training the model would get its detection results from itself rather than from Detectron, which differs from what happens during testing.

@gaochen315 Sorry for my mistake. I've read your code more carefully, and it seems that you use a refined ResNet-50 (only stage 1 to stage 4) for feature extraction before the three streams. Has the feature extraction network been pre-trained on any dataset, or is it trained end-to-end together with the rest of the model?

Thank you for your attention!

The feature extraction network is initialized from tf-faster-rcnn's model (trained on COCO). It is not trained from scratch.
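
To illustrate what "initialized from a pretrained model" usually involves, here is a rough sketch of selecting which variables can take their values from a checkpoint. It is not the repository's actual loading code: the helper `select_restorable`, the scope name, and the variable names and shapes in the example are all assumptions made for illustration. Only backbone variables whose name and shape match a checkpoint entry are selected; everything else keeps its fresh initialization.

```python
def select_restorable(model_vars, ckpt_vars, scope="resnet_v1_50"):
    """Return sorted names of model variables that the checkpoint can initialize.

    model_vars and ckpt_vars map variable names to shapes.
    The default scope name is an assumption, not taken from the repository.
    """
    restorable = []
    for name, shape in model_vars.items():
        if not name.startswith(scope + "/"):
            continue  # outside the backbone: keep the fresh initialization
        if name in ckpt_vars and tuple(ckpt_vars[name]) == tuple(shape):
            restorable.append(name)
    return sorted(restorable)


# Hypothetical names and shapes, for illustration only.
model_vars = {
    "resnet_v1_50/conv1/weights": (7, 7, 3, 64),
    "fc_HO/weights": (2048, 1024),
}
ckpt_vars = {
    "resnet_v1_50/conv1/weights": (7, 7, 3, 64),
    "resnet_v1_50/cls_score/weights": (2048, 81),
}
selected = select_restorable(model_vars, ckpt_vars)
```

In TensorFlow 1.x, the variable objects matching the selected names would typically be passed to `tf.train.Saver(var_list=...)` and loaded with `saver.restore(sess, checkpoint_path)`, while the remaining variables are initialized as usual.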

Thank you!
