Using MobileNet instead of VGG16

Question

Closed this issue 4 years ago · 5 comments

Dear Mihai, first of all, thanks for your great work. I am currently trying to make D2 faster, and I read your suggestion to use MobileNet in a different GitHub issue. My application needs a certain precision in feature position. The current truncated VGG16 system scales down the input image resolution by a factor of ~4 (e.g. input 640x640 => dense_features = 159x159), which is the coarsest resolution I can afford. To match that spatial downscaling with MobileNetV2, I would have to truncate its structure pretty early, after layer 3 (the 2nd bottleneck layer). Do you have any insight into whether that would make sense? Or could it be that MobileNet(V2) is not a good candidate for a feature-precise D2 version, due to its early, drastic spatial reduction?
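For readers who want to check the 640x640 => 159x159 figure, here is a minimal sketch of the output-size arithmetic. The layer list is an assumption: it is a reconstruction of a test-time truncated VGG16 (two stride-2 max pools, a stride-1 average pool in place of the third pool, dilated conv4 layers) chosen because it reproduces the number quoted above. The names `out_size`, `feature_map_size` and `VGG16_TRUNCATED` are hypothetical.

```python
def out_size(n, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a conv/pool layer (floor rounding)."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Assumed layer list (not taken from this thread):
# (name, kernel, stride, padding, dilation)
VGG16_TRUNCATED = [
    ("conv1_1", 3, 1, 1, 1),
    ("conv1_2", 3, 1, 1, 1),
    ("pool1", 2, 2, 0, 1),      # max pool, halves the resolution
    ("conv2_1", 3, 1, 1, 1),
    ("conv2_2", 3, 1, 1, 1),
    ("pool2", 2, 2, 0, 1),      # max pool, halves the resolution
    ("conv3_1", 3, 1, 1, 1),
    ("conv3_2", 3, 1, 1, 1),
    ("conv3_3", 3, 1, 1, 1),
    ("pool3", 2, 1, 0, 1),      # assumed stride-1 average pool: loses 1 pixel
    ("conv4_1", 3, 1, 2, 2),    # assumed dilated convs: size is preserved
    ("conv4_2", 3, 1, 2, 2),
    ("conv4_3", 3, 1, 2, 2),
]


def feature_map_size(n, layers):
    """Push an input side length through every layer in order."""
    for _name, kernel, stride, padding, dilation in layers:
        n = out_size(n, kernel, stride, padding, dilation)
    return n


size = feature_map_size(640, VGG16_TRUNCATED)
print(size, 640 / size)
```

Under these assumptions the effective downscaling is slightly more than 4 because the stride-1 pool trims one pixel from the 160x160 map.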

Answer 1 · 2019-12-12T09:32:50.000Z

Dear Rolf. It is true that the off-the-shelf version of MobileNetV2 might not be directly adequate for the D2 approach: as for ResNets, you would need to go to ~1/16th of input resolution to get a similar performance to the VGG counterpart. However, there are a few things that you could try:

1. As we did for VGG at test time, modify the network to use dilated convolutions and no stride for the last block before the cut. This would get you to 1/8th of the resolution without any need for re-training.
2. To further improve the resolution, MobileNetV2 can be modified to obtain a lower stride. One could do this either by moving the strided convolutions to the end of each block (instead of the beginning) or by simply removing the stride of the first block. However, this would require re-training on ImageNet, since the receptive fields of all layers would be completely different.
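The effect of the first suggestion on resolution can be sketched with simple stride bookkeeping. The per-stage strides below follow the architecture table of the MobileNetV2 paper; the stage names, the `total_stride` helper and its `dilate_last` flag are hypothetical and only model the resolution, not the actual network surgery.

```python
# (stage name, stride of the stage) for MobileNetV2; names are illustrative.
MOBILENET_V2_STAGES = [
    ("conv2d_32", 2),
    ("bottleneck_16", 1),
    ("bottleneck_24", 2),
    ("bottleneck_32", 2),
    ("bottleneck_64", 2),
    ("bottleneck_96", 1),
    ("bottleneck_160", 2),
    ("bottleneck_320", 1),
]


def total_stride(stages, cut, dilate_last=False):
    """Cumulative stride up to and including stage index `cut`.

    With dilate_last=True, the last strided stage before the cut is
    modeled as stride 1. In a real network the following convolutions
    of that stage would need dilation to keep their receptive field.
    """
    strides = [stride for _name, stride in stages[: cut + 1]]
    if dilate_last:
        for i in range(len(strides) - 1, -1, -1):
            if strides[i] > 1:
                strides[i] = 1
                break
    total = 1
    for stride in strides:
        total *= stride
    return total


cut = 4  # cut after the 64-channel stage, i.e. at 1/16th resolution
print(total_stride(MOBILENET_V2_STAGES, cut))                    # 16
print(total_stride(MOBILENET_V2_STAGES, cut, dilate_last=True))  # 8
```

Cutting after the 24-channel stage (index 2) gives a stride of 4, which corresponds to the early truncation considered in the question.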

Answer 2 · 2019-12-12T10:09:36.000Z

Hello Mihai, thanks for the fast answer and the suggestions. I will try #1, and also test transposed convolutions. I'm shying away a bit from fully retraining MobileNet.
Have a nice day!

Answer 3 · 2019-12-20T08:17:19.000Z

Hi @mihaidusmanu, in the supplementary material K is set to 4 for mining harder samples because the receptive field of conv_3 is 65x65 for VGG. However, isn't it 92x92 for conv4_3? So should K be set to 8?
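The 92x92 figure can be reproduced with the usual receptive-field recurrence. This is a minimal sketch that assumes the standard off-the-shelf VGG16 layout (3x3 convolutions, 2x2 max pools with stride 2, no dilation); the helper names are hypothetical, and the result does not by itself settle which value of K is appropriate.

```python
# (name, kernel, stride, dilation) for standard VGG16 up to conv4_3.
VGG16_STANDARD = [
    ("conv1_1", 3, 1, 1), ("conv1_2", 3, 1, 1), ("pool1", 2, 2, 1),
    ("conv2_1", 3, 1, 1), ("conv2_2", 3, 1, 1), ("pool2", 2, 2, 1),
    ("conv3_1", 3, 1, 1), ("conv3_2", 3, 1, 1), ("conv3_3", 3, 1, 1),
    ("pool3", 2, 2, 1),
    ("conv4_1", 3, 1, 1), ("conv4_2", 3, 1, 1), ("conv4_3", 3, 1, 1),
]


def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent units
    for _name, kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


def up_to(layers, last_name):
    """Return the prefix of `layers` ending at the layer called `last_name`."""
    names = [name for name, *_ in layers]
    return layers[: names.index(last_name) + 1]


print(receptive_field(up_to(VGG16_STANDARD, "conv3_3")))  # 40
print(receptive_field(up_to(VGG16_STANDARD, "conv4_3")))  # 92
```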

Answer 4 · 2020-02-27T09:08:05.000Z

@Lakaemper Hi Lakaemper, I am also working on making D2 faster, so I am wondering whether it works with MobileNetV2. Thank you!

Answer 5 · 2020-05-04T13:24:06.000Z

I will close this issue since there are no recent updates. Feel free to re-open a new one if you run into any issues with the provided code.