About arm platform (ncnn model)
hanson-young opened this issue · 16 comments
Do you have any plans to port this to ncnn on the ARM platform? I failed to convert the Caffe model you provide to ncnn.
:~/Documents/3rdpart/ncnn/build/tools/caffe$ ./caffe2ncnn ./mnet.prototxt ./mnet.prototxt.caffemodel ./retina.param ./retina.bin
Segmentation fault (core dumped)
It is caused by the deconv layer weight being empty; Caffe will initialize a new weight, but ncnn will not.
I have solved this problem; you can update to the new mnet model.
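If you hit the same segfault with your own model, one possible workaround (a sketch, assuming pycaffe is installed and that the prototxt defines weight fillers for the deconv layers) is to load and re-save the model with Caffe itself, so the lazily initialized weights are materialized on disk before running caffe2ncnn:

```python
# Sketch: re-save the caffemodel with pycaffe so that any weights Caffe
# would initialize on load (e.g. the empty deconv weights here) are
# written out explicitly. File names are the ones from this thread.
import caffe

net = caffe.Net('mnet.prototxt', 'mnet.prototxt.caffemodel', caffe.TEST)
net.save('mnet-fixed.caffemodel')  # all layer blobs now present on disk
```

Then point caffe2ncnn at `mnet-fixed.caffemodel` instead of the original.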
Can you provide the test speed on the ARM platform? Thank you!
Thanks a lot, I will provide it!
@Charrin
I have got inference and post-processing working on ncnn! But multi-threading does not improve performance.
https://github.com/hanson-young/RetinaFace-Cpp/blob/master/retinaface_ncnn/images/result.jpg
Qualcomm 835, 640*480 VGA (inference only):
130|greatqltechn:/data/local/tmp $ ./benchncnn 4 1 0 0
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = 0
retinaface-mnet0.25 min = 130.10 max = 131.61 avg = 130.94
mobilefacenet min = 48.79 max = 49.55 avg = 49.21
mobilefacenet-int8 min = 47.21 max = 48.11 avg = 47.76
squeezenet min = 63.86 max = 65.61 avg = 64.63
squeezenet-int8 min = 49.12 max = 49.65 avg = 49.36
mobilenet min = 110.70 max = 112.14 avg = 111.47
mobilenet-int8 min = 88.56 max = 89.66 avg = 89.31
mobilenet_v2 min = 80.85 max = 82.40 avg = 81.81
Thank you! I have updated my README with your test results.
@hanson-young hi, much appreciation for your work!
The model graph is not optimal, I think, so you can try this ~
https://github.com/Tencent/ncnn/wiki/model-optimize
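For reference, a typical invocation of the ncnnoptimize tool described on that wiki page might look like the following (file names are the ones from this thread; the trailing `0` keeps fp32 weight storage, while `65536` would store fp16):

```shell
# Fuse and optimize the converted model graph before deployment.
# Output file names here are just examples.
./ncnnoptimize retina.param retina.bin retina-opt.param retina-opt.bin 0
```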
I have tried it; it speeds things up by 10% on a Qualcomm 625:
1-thread 379ms
2-thread 244ms
4-thread 180ms
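Those timings correspond to the following speedups and parallel efficiencies (a quick check on the numbers quoted above):

```python
# Parallel speedup and efficiency for the Qualcomm 625 timings above.
t1, t2, t4 = 379.0, 244.0, 180.0  # ms per frame at 1 / 2 / 4 threads

for n, t in [(2, t2), (4, t4)]:
    speedup = t1 / t
    efficiency = speedup / n
    print(f"{n} threads: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

So scaling is well short of linear (roughly 78% efficiency at 2 threads and 53% at 4), which is common for small detection models whose per-layer work is too little to amortize the threading overhead.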
@nihui Thanks, nihui! The problem has been solved. When building with the NDK, CMake 3.9.2 has a problem invoking OpenMP; I downgraded to 3.5.1 and it works.
https://gitlab.kitware.com/cmake/cmake/issues/17351
@Charrin Here are my test results:
Qualcomm 835, VGA (640*480)
greatqltechn:/data/local/tmp $ ./benchncnn 4 4 0
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
retinaface-mnet0.25 min = 62.31 max = 63.49 avg = 62.79
retinaface-mnet0.25_opt min = 67.09 max = 82.76 avg = 75.99
mobilefacenet min = 15.89 max = 16.32 avg = 16.09
mobilefacenet_opt min = 14.12 max = 14.59 avg = 14.42
mobilefacenet_int8 min = 16.11 max = 16.45 avg = 16.26
squeezenet min = 22.76 max = 26.53 avg = 23.94
squeezenet_int8 min = 18.77 max = 19.20 avg = 18.99
mobilenet min = 34.43 max = 34.91 avg = 34.66
mobilenet_int8 min = 28.90 max = 31.59 avg = 30.00
130|greatqltechn:/data/local/tmp $ ./benchncnn 4 2 0
loop_count = 4
num_threads = 2
powersave = 0
gpu_device = -1
retinaface-mnet0.25 min = 82.75 max = 83.10 avg = 82.97
retinaface-mnet0.25_opt min = 73.44 max = 75.41 avg = 74.52
mobilefacenet min = 28.08 max = 30.48 avg = 28.97
mobilefacenet_opt min = 25.23 max = 25.98 avg = 25.54
mobilefacenet_int8 min = 29.37 max = 29.91 avg = 29.69
squeezenet min = 35.18 max = 38.03 avg = 36.80
squeezenet_int8 min = 29.45 max = 31.90 avg = 30.67
mobilenet min = 58.60 max = 59.68 avg = 59.17
mobilenet_int8 min = 51.27 max = 52.94 avg = 51.73
130|greatqltechn:/data/local/tmp $ ./benchncnn 4 1 0
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
retinaface-mnet0.25 min = 136.17 max = 138.68 avg = 137.37
retinaface-mnet0.25_opt min = 123.71 max = 127.71 avg = 125.10
mobilefacenet min = 51.50 max = 53.77 avg = 52.40
mobilefacenet_opt min = 46.99 max = 47.81 avg = 47.52
mobilefacenet_int8 min = 56.54 max = 58.16 avg = 57.55
squeezenet min = 64.10 max = 65.19 avg = 64.77
squeezenet_int8 min = 51.01 max = 51.62 avg = 51.42
mobilenet min = 107.86 max = 111.64 avg = 109.71
mobilenet_int8 min = 98.07 max = 98.55 avg = 98.30
@hanson-young have you compared the ARM inference speed of RetinaFace vs. the MTCNN model?
@pineking It’s hard to say; it depends on the specific platform and use case.
@hanson-young my test time on the 835 is about 20 ms slower than your inference time. Could you share your ncnn lib and include files for Android? Thank you very much!
@hanjw123 I compiled it on May 29, but you can get the ncnn lib from here: https://github.com/Tencent/ncnn/releases.
@hanson-young OK! I tried an older version and it really is faster, thank you very much!
@hanson-young my inference result is wrong. What are your NDK and ANDROID_PLATFORM versions?
@hanjw123 Hi, I am also testing the speed of the RetinaFace model; would you like to discuss this together?
my wechat is pineking
@pineking I run it on ARM aarch64, not as an Android application.
I tested the speed of the Caffe mnet model on a Raspberry Pi 4B using Alibaba's MNN inference framework. The Raspberry Pi 4B's CPU is a BCM2711 (quad-core Cortex-A72 @ 1.5 GHz). Test resolution was VGA (640*480), averaged over 10 loops:
Cores | fp32 time (ms) | int8 (quantized) time (ms) |
---|---|---|
1 | 167 | 183 |
2 | 116 | 102 |
3 | 105 | 76 |
4 | 96 | 61 |
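One observation worth pulling out of that table (a quick computation on the numbers above): int8 quantization is actually slower than fp32 on a single core here, and only pulls ahead as more cores are used:

```python
# fp32 vs int8 inference time (ms) per core count, from the table above.
fp32 = {1: 167, 2: 116, 3: 105, 4: 96}
int8 = {1: 183, 2: 102, 3: 76, 4: 61}

for cores in fp32:
    ratio = fp32[cores] / int8[cores]
    print(f"{cores} core(s): int8 runs at {ratio:.2f}x the speed of fp32")
```

So at 1 core int8 is about 0.91x (i.e. slower), but reaches roughly 1.57x at 4 cores.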