trzy/FasterRCNN

What happens when I use vgg16 as the backbone?

Opened this issue · 3 comments

@trzy
Thank you for your good work.
I've gone through the PyTorch version of your code, and I have some questions.

  1. What happens when I use vgg16 (not models/vgg16_torch.py) as the backbone? In that case, how are the initial weights loaded? I did not provide vgg16_caffe.pth, but training still runs.
  2. If I want a model that accepts four-channel input, do I need to write a new image classification program to train the backbone from scratch? (I plan to use vgg16, but I can follow your advice if I should use a different backbone.)

I would like to use your code for my task, so please let me know.
Regards.

Additional
Can I train the backbone from scratch with https://github.com/trzy/VGG16.git?

1:

vgg16 vs. vgg16_torch: a bit confusing but the first one is my "hand-coded" version. I define the VGG-16 network from scratch using layers. The second one uses torchvision's built-in VGG-16 model, torchvision.models.vgg16, which is exactly the same except that the layers are named differently.

If you use vgg16_torch, the weights should be downloaded automatically because torchvision provides them. If you use my vgg16 implementation, you'll need vgg16_caffe.pth, which my download script will fetch for you.

torchvision already contains many common CNNs, including Faster R-CNN. Obviously, I wanted to implement Faster R-CNN from scratch, so I did all the work myself, including the VGG-16 backbone initially, but then I included the option to use the torchvision version just to demonstrate how to do so. I later went ahead and added torchvision's ResNet backbones.

To download the vgg16_caffe.pth weights file, just look at download_models.sh.

2:

I'm still not sure why you want to pass 4 channel images in. What is inside this 4th channel?

No. 1: I see.
For No. 2:
In deep learning research, models are usually built on the RGB channels. But other kinds of 2D data exist, for example the Normalized Difference Vegetation Index (NDVI, which indicates how healthy the vegetation is) and the Digital Elevation Model (DEM, which describes the shape of the terrain). Such 2D data can be obtained from satellites. Previous studies show that adding a 4th channel gives the model more information, so a 4th channel can improve object detection accuracy.
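As an illustration of what such a 4th channel could contain, the sketch below computes NDVI with the standard formula (NIR - Red) / (NIR + Red) and appends it to an RGB image. The helper names `ndvi` and `stack_rgb_with_extra` are hypothetical, the input data is random, and the small `eps` term is an assumption added only to avoid division by zero.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Standard NDVI: (NIR - Red) / (NIR + Red). eps avoids division by zero."""
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (nir - red) / (nir + red + eps)

def stack_rgb_with_extra(rgb, extra):
    """Append a single H x W map to an H x W x 3 image, giving H x W x 4."""
    if rgb.shape[:2] != extra.shape:
        raise ValueError("extra channel must match the image height and width")
    extra = extra.astype(np.float32)[..., np.newaxis]
    return np.concatenate([rgb.astype(np.float32), extra], axis=-1)

# Synthetic stand-ins for a satellite RGB tile and its near-infrared band.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
nir = rng.integers(0, 256, size=(4, 5), dtype=np.uint8)

four_channel = stack_rgb_with_extra(rgb, ndvi(nir, rgb[..., 0]))
print(four_channel.shape)
```

Note that the RGB values here range over 0 to 255 while NDVI lies between -1 and 1, so in practice each channel would need its own normalization before being fed to a network.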
Related work is Mask R-CNN (https://github.com/orestis-z/mask-rcnn-rgbd). But I don't need instance segmentation, only object detection. So I want a Faster R-CNN that accepts a 4th input channel.