jay-mahadeokar/pynetbuilder

ssd+squeezenet

Opened this issue · 20 comments

Jay

Do you have a plan to generate a builder for ssd+squeezenet?
I am looking for a low-computational-complexity SSD detector and think ssd+squeezenet may be a good compromise between accuracy and speed.

Thanks,

@kaishijeng I believe the complexity of squeezenet in terms of flops is ~800 million (though I'm not sure; I need to run it through the complexity module), and its corresponding top-1 accuracy on imagenet is ~58%. Its advantage is a smaller number of params (which affects memory, not speed). In comparison, thin resnet 50 (or resnet_50_1by2), which I trained, has ~1k M flops with a top-1 accuracy of 66.79% on imagenet. See this comparison table. I ran an experiment training resnet_50_1by2 with SSD and got around 64-65% mAP on the voc dataset, compared to 70.4% using the full resnet 50 described here. If you want an even faster network (rather than a smaller one), I suppose using tweaked resnet variants could be useful.
That said, it would be interesting to see how squeezenet could be used as the base network for SSD (which layers/feature maps to use, etc.). There is a quick guide on how it can be done.
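As a rough illustration of how such flop counts are estimated (this is not pynetbuilder's actual complexity module, just a hand-rolled sketch), a conv layer's multiply-accumulate count is output area × kernel volume × output channels:

```python
def conv_flops(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate count for one k x k conv layer."""
    return h_out * w_out * c_in * c_out * k * k

def conv_params(c_in, c_out, k):
    """Weight count for one conv layer (biases ignored)."""
    return c_in * c_out * k * k

# First conv of a typical 224x224 imagenet network: 7x7, stride 2, 64 filters,
# producing a 112x112 output map.
flops = conv_flops(112, 112, 3, 64, 7)   # ~118M MACs
params = conv_params(3, 64, 7)           # 9408 weights
print(flops, params)
```

Summing this over every layer gives the per-network totals quoted in the tables; note how the flop count scales with spatial size while the param count does not, which is why squeezenet can be small on disk without being fast.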

Jay,

Thanks for the info about squeezenet vs resnet50. My understanding is that
squeezenet is faster than alexnet and also has a smaller parameter size.

Do you have speed comparison between ssd+vgg16 vs ssd+resnet50?
Can you share pretrained models of ssd+resnet_50 or ssd+resnet_50_1by2?
I will try to train ssd+resnet_50 this weekend.

Thanks,


Please refer to this table for ssd+vgg16 and ssd+resnet50; I have also shared the caffemodels. This table also compares resnet 50 and resnet_50_1by2. Though I haven't yet added model files for object detection using resnet_50_1by2 + ssd, it should be easy to train (since I have added the model pre-trained on imagenet). Let me know if the training ssd+resnet doc is sufficient, or if you run into any bugs.

According to your table, ssd+resnet50 should be 2 or 3 times faster than
ssd+vgg16.
Is this what you have observed?

Thanks,


I haven't done thorough benchmarking on CPU, since I only tested the validation set on GPU machines, but I would guess that is true. I will run it on CPU and update here.

Jay

No need to benchmark on CPU because I have a GPU, TitanX.
What parameters do I need to use with create_ssdnet.py to create ssd+resnet50_1by2 instead of ssd+resnet50?

python app/ssd/create_ssdnet.py --type Resnet -n 256 -b 3 4 6 3 --no-fc_layers -m bottleneck --extra_blocks 3 3 --extra_num_outputs 2048 2048 --mbox_source_layers relu_stage1_block3 relu_stage2_block5 relu_stage3_block2 relu_stage4_block2 relu_stage5_block2 pool_last --extra_layer_attach pool -c 21 -o ./

Thanks,

--extra_num_outputs could be reduced to 1024 1024, and -n to 128. The rest of the params should remain the same, I think. Use -h for more help on the params.
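Putting those suggestions together, the resnet_50_1by2 variant of the command above might look like this (an untested sketch; only -n and --extra_num_outputs differ from the resnet_50 command):

```shell
python app/ssd/create_ssdnet.py --type Resnet -n 128 -b 3 4 6 3 --no-fc_layers \
    -m bottleneck --extra_blocks 3 3 --extra_num_outputs 1024 1024 \
    --mbox_source_layers relu_stage1_block3 relu_stage2_block5 relu_stage3_block2 \
        relu_stage4_block2 relu_stage5_block2 pool_last \
    --extra_layer_attach pool -c 21 -o ./
```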

@kaishijeng did the above params work for you? I am closing this for now, feel free to re-open it if you have additional questions.

Jay,

Yes, it works

Thanks

Jay,

I am able to train ssd_resnet50 and ssd_resnet50_1by2 and have tried inference on TitanX and Jetson TX1.
On TitanX I can see the speed improvement, but there is not much difference on Jetson TX1. I think that is due to memory bandwidth, because of the parameter size.
If it is not much effort for you to create ssd_squeezenet, I can do the training and measure inference time on TitanX and Jetson TX1.

Thanks,

@kaishijeng
The squeezenet architecture is quite different from resnet/vgg in terms of feature map sizes. I am not sure which layers we would attach the detection heads to.

If you want to try some experiments, I'd suggest:

  • Look at the code to generate the base squeezenet, which is available in this app.
  • Follow these steps for adding detection heads to a base network.
  • Look at this code for how AssembleLego can be used to attach detection heads to a base network.
  • The main part is figuring out which layers we should attach the SSD detection heads to (for example, see this table for how I attached them to resnet_50); it will need some experimentation.

Please give it a try and I can help out if you have any further questions.
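As a rough guide for that last point: SSD wants feature maps at a range of spatial resolutions. A quick sketch of the map sizes available at each stride, assuming a 300x300 input as in the original SSD (the strides here are illustrative, not read from squeezenet's prototxt):

```python
import math

INPUT = 300  # SSD300 input resolution (assumption for illustration)

# Candidate attachment points are the last feature maps at successive strides;
# in squeezenet these would be the fire modules just before each pooling layer.
for stride in (8, 16, 32, 64):
    size = math.ceil(INPUT / stride)
    print(f"stride {stride:3d} -> {size}x{size} feature map")
```

Matching these sizes (38x38 down to 5x5) against squeezenet's fire-module outputs is essentially the experimentation the bullet list describes.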

Jay,

It looks like creating an ssd+squeezenet network is not a simple exercise, so I would like to try ssd+resnet18 first. I need to train resnet18 on imagenet first and use it as a pretrained model for ssd+resnet18 training.

I plan to use the following command to create resnet18 for imagenet, but I am not sure the parameters are correct. Can you help me check it:

python app/imagenet/build_resnet.py -m bottleneck -b 2 2 2 2 -n 256 --no-fc_layers -o ./

Also, I got an error using the following command to generate ssd+resnet18. Do you know which parameters are incorrect?
python app/ssd/create_ssdnet.py --type Resnet -n 256 -b 2 2 2 2 --no-fc_layers -m bottleneck --extra_blocks 3 3 --extra_num_outputs 2048 2048 --mbox_source_layers relu_stage1_block3 relu_stage2_block5 relu_stage3_block2 relu_stage4_block2 relu_stage5_block2 pool_last --extra_layer_attach pool -c 21 -o ./

Thanks,

Sounds good!

You need to change the mbox_source_layers params from relu_stage1_block3 relu_stage2_block5 relu_stage3_block2 relu_stage4_block2 relu_stage5_block2 to relu_stage1_block1 relu_stage2_block1 relu_stage3_block1 relu_stage4_block1 relu_stage5_block1.

Also, extra_blocks could be 2 2 (or your choice; more blocks will increase runtime). Note that resnet 18 has only 2 blocks in each stage (the index starts at 0). Read more here
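The renaming follows mechanically from the block counts. A small sketch, assuming the naming scheme inferred from the commands in this thread (the last relu of a stage with n blocks is relu_stage<i>_block<n-1>, and stage0 gets no detection head):

```python
def mbox_source_layers(main_blocks, extra_blocks):
    """Guess the --mbox_source_layers names from -b and --extra_blocks counts."""
    counts = main_blocks[1:] + extra_blocks  # stage0 is skipped for detection heads
    names = [f"relu_stage{i}_block{n - 1}" for i, n in enumerate(counts, start=1)]
    return names + ["pool_last"]

# resnet_50: -b 3 4 6 3, --extra_blocks 3 3 reproduces the names used above.
print(mbox_source_layers([3, 4, 6, 3], [3, 3]))
# resnet_18: -b 2 2 2 2, --extra_blocks 2 2 gives block1 everywhere.
print(mbox_source_layers([2, 2, 2, 2], [2, 2]))
```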

Jay

Shouldn't the main_branch of resnet18 for imagenet be normal instead of
bottleneck? If yes, when I use the following command to generate resnet18, there
is an error.
python app/imagenet/build_resnet.py -m normal -b 2 2 2 2 -n 256 --no-fc_layers -o ./

The error is:

F0814 01:05:54.747488 14253 eltwise_layer.cpp:34] Check failed: bottom[i]->shape() == bottom[0]->shape()
*** Check failure stack trace: ***
Aborted (core dumped)

Thanks,


Please specify -n as 64. Note that the bottleneck block has 3 layers with 64, 64, 256 filters, whereas the normal block has 2 layers with 64, 64 filters. Since the 1st conv layer has 64 filters, -n 256 gives that error. I should add this check somewhere!
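The failure can be modeled directly from Caffe's Eltwise shape check: the residual sum requires both branches to have identical blob shapes. A simplified sketch (the blob shapes below are illustrative, not taken from the generated prototxt):

```python
def eltwise_check(shapes):
    """Mimic Caffe's EltwiseLayer requirement that all bottoms share one shape."""
    if any(s != shapes[0] for s in shapes[1:]):
        raise ValueError(
            f"Check failed: bottom[i]->shape() == bottom[0]->shape(): {shapes}")
    return shapes[0]

# With -m normal -n 256 the residual branch produces 256 channels while the
# identity shortcut still carries the 64 channels from the first conv layer:
try:
    eltwise_check([(1, 256, 56, 56), (1, 64, 56, 56)])
except ValueError as e:
    print(e)

# With -n 64 both branches carry 64 channels and the sum is well defined:
print(eltwise_check([(1, 64, 56, 56), (1, 64, 56, 56)]))
```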

FYI, resnet_18 has:

python app/imagenet/build_resnet.py -m normal -b 2 2 2 2 -n 64 --no-fc_layers -o ./
Number of params:  11.688512  Million
Number of flops:  1814.082944  Million

The flop count is larger than resnet_50_1by2's. Not sure if it will be faster; I haven't benchmarked it.

@kaishijeng, can you tell me how much faster ssd+resnet50 is than ssd+vgg16 on a GPU (TitanX)? Thank you

@kaishijeng, hi, I have benchmarked with the caffe time command line and found that the forward time of ssd+resnet50 is more than the forward time of ssd+vgg16. How did you see a speed improvement?

MisayaZ,

Your data is correct. It has been a while since I last did the test.
My impression is that ssd+resnet50 is slower than ssd+vgg16, but ssd+resnet50_1by2
is slightly faster than ssd+vgg16, with a lower memory footprint.


SqueezeNet is not fast (compared to AlexNet); it just has a small on-disk size.
See the table at https://github.com/mrgloom/kaggle-dogs-vs-cats-solution

@kaishijeng Hi kaishijeng,

Have you successfully built resnet18+SSD and gotten a good mAP?
If so, could you please share your resnet18+SSD prototxt file and resnet18 pre-trained weights?
Thanks a lot.