About arm platform (ncnn model)
hanson-young opened this issue · 16 comments
Do you have any plans to port this to ncnn on the ARM platform? I failed to convert the Caffe model you provide to ncnn.
:~/Documents/3rdpart/ncnn/build/tools/caffe$ ./caffe2ncnn ./mnet.prototxt ./mnet.prototxt.caffemodel ./retina.param ./retina.bin
Segmentation fault (core dumped)
It is caused by the deconv layer weight being empty; Caffe will initialize a new weight, but ncnn will not.
I have solved this problem; you can update to the new mnet model.
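If you hit the same segfault with your own model, one possible workaround (a sketch, assuming pycaffe is installed and that the prototxt defines weight fillers for the deconv layers) is to load and re-save the model with Caffe itself, so the lazily initialized weights are materialized on disk before running caffe2ncnn:

```python
# Sketch: re-save the caffemodel with pycaffe so that any weights Caffe
# would initialize on load (e.g. the empty deconv weights here) are
# written out explicitly. File names are the ones from this thread.
import caffe

net = caffe.Net('mnet.prototxt', 'mnet.prototxt.caffemodel', caffe.TEST)
net.save('mnet-fixed.caffemodel')  # all layer blobs now present on disk
```

Then point caffe2ncnn at `mnet-fixed.caffemodel` instead of the original.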
Can you provide the test speed on the ARM platform? Thank you!
Thanks a lot, I will provide it!
@Charrin
I have got inference and post-processing working on ncnn! But multi-threading does not improve performance.
https://github.com/hanson-young/RetinaFace-Cpp/blob/master/retinaface_ncnn/images/result.jpg
Qualcomm 835, 640*480 VGA (inference only):
130|greatqltechn:/data/local/tmp $ ./benchncnn 4 1 0 0
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = 0
retinaface-mnet0.25 min = 130.10 max = 131.61 avg = 130.94
mobilefacenet min = 48.79 max = 49.55 avg = 49.21
mobilefacenet-int8 min = 47.21 max = 48.11 avg = 47.76
squeezenet min = 63.86 max = 65.61 avg = 64.63
squeezenet-int8 min = 49.12 max = 49.65 avg = 49.36
mobilenet min = 110.70 max = 112.14 avg = 111.47
mobilenet-int8 min = 88.56 max = 89.66 avg = 89.31
mobilenet_v2 min = 80.85 max = 82.40 avg = 81.81
Thank you! I have updated my README with your test results.
@hanson-young hi, much appreciation for your work!
The model graph is not optimal, I think, so you can try this ~
https://github.com/Tencent/ncnn/wiki/model-optimize
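For reference, a typical invocation of the ncnnoptimize tool described on that wiki page might look like the following (file names are the ones from this thread; the trailing `0` keeps fp32 weight storage, while `65536` would store fp16):

```shell
# Fuse and optimize the converted model graph before deployment.
# Output file names here are just examples.
./ncnnoptimize retina.param retina.bin retina-opt.param retina-opt.bin 0
```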
I have tried it; it speeds things up by 10% on a Qualcomm 625:
1-thread 379ms
2-thread 244ms
4-thread 180ms
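Those timings correspond to the following speedups and parallel efficiencies (a quick check on the numbers quoted above):

```python
# Parallel speedup and efficiency for the Qualcomm 625 timings above.
t1, t2, t4 = 379.0, 244.0, 180.0  # ms per frame at 1 / 2 / 4 threads

for n, t in [(2, t2), (4, t4)]:
    speedup = t1 / t
    efficiency = speedup / n
    print(f"{n} threads: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

So scaling is well short of linear (roughly 78% efficiency at 2 threads and 53% at 4), which is common for small detection models whose per-layer work is too little to amortize the threading overhead.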
@nihui Thanks, nihui! The problem has been solved. When building with the NDK, CMake 3.9.2 has a problem invoking OpenMP; I downgraded to 3.5.1 and it works.
https://gitlab.kitware.com/cmake/cmake/issues/17351
@Charrin Here are my test results:
Qualcomm 835, VGA (640*480)
greatqltechn:/data/local/tmp $ ./benchncnn 4 4 0
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
retinaface-mnet0.25 min = 62.31 max = 63.49 avg = 62.79
retinaface-mnet0.25_opt min = 67.09 max = 82.76 avg = 75.99
mobilefacenet min = 15.89 max = 16.32 avg = 16.09
mobilefacenet_opt min = 14.12 max = 14.59 avg = 14.42
mobilefacenet_int8 min = 16.11 max = 16.45 avg = 16.26
squeezenet min = 22.76 max = 26.53 avg = 23.94
squeezenet_int8 min = 18.77 max = 19.20 avg = 18.99
mobilenet min = 34.43 max = 34.91 avg = 34.66
mobilenet_int8 min = 28.90 max = 31.59 avg = 30.00
130|greatqltechn:/data/local/tmp $ ./benchncnn 4 2 0
loop_count = 4
num_threads = 2
powersave = 0
gpu_device = -1
retinaface-mnet0.25 min = 82.75 max = 83.10 avg = 82.97
retinaface-mnet0.25_opt min = 73.44 max = 75.41 avg = 74.52
mobilefacenet min = 28.08 max = 30.48 avg = 28.97
mobilefacenet_opt min = 25.23 max = 25.98 avg = 25.54
mobilefacenet_int8 min = 29.37 max = 29.91 avg = 29.69
squeezenet min = 35.18 max = 38.03 avg = 36.80
squeezenet_int8 min = 29.45 max = 31.90 avg = 30.67
mobilenet min = 58.60 max = 59.68 avg = 59.17
mobilenet_int8 min = 51.27 max = 52.94 avg = 51.73
130|greatqltechn:/data/local/tmp $ ./benchncnn 4 1 0
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
retinaface-mnet0.25 min = 136.17 max = 138.68 avg = 137.37
retinaface-mnet0.25_opt min = 123.71 max = 127.71 avg = 125.10
mobilefacenet min = 51.50 max = 53.77 avg = 52.40
mobilefacenet_opt min = 46.99 max = 47.81 avg = 47.52
mobilefacenet_int8 min = 56.54 max = 58.16 avg = 57.55
squeezenet min = 64.10 max = 65.19 avg = 64.77
squeezenet_int8 min = 51.01 max = 51.62 avg = 51.42
mobilenet min = 107.86 max = 111.64 avg = 109.71
mobilenet_int8 min = 98.07 max = 98.55 avg = 98.30
@hanson-young have you compared the ARM inference speed of RetinaFace vs. the MTCNN model?
@pineking It’s hard to say; it depends on the specific platform and use case.
@hanson-young my test time on the 835 is about 20 ms slower than your inference time. Could you share your ncnn lib and include files for Android? Thank you very much!
@hanjw123 I compiled it on May 29, but you can get the ncnn lib from here: https://github.com/Tencent/ncnn/releases.
@hanson-young OK! I tried an older version and it really is faster, thank you very much!
@hanson-young my inference result is wrong. What are your NDK and ANDROID_PLATFORM versions?
@hanjw123 Hi, I am also testing the speed of the RetinaFace model; would you like to discuss this together?
my wechat is pineking
@pineking I run it on ARM aarch64, not as an Android application.
I tested the speed of the Caffe mnet model on a Raspberry Pi 4B using Alibaba's MNN inference framework. The Raspberry Pi 4B's CPU is a BCM2711 (quad-core Cortex-A72 @ 1.5 GHz). Test resolution was VGA (640*480), averaged over 10 loops:
Cores | fp32 time (ms) | int8 (quantized) time (ms) |
---|---|---|
1 | 167 | 183 |
2 | 116 | 102 |
3 | 105 | 76 |
4 | 96 | 61 |
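One observation worth pulling out of that table (a quick computation on the numbers above): int8 quantization is actually slower than fp32 on a single core here, and only pulls ahead as more cores are used:

```python
# fp32 vs int8 inference time (ms) per core count, from the table above.
fp32 = {1: 167, 2: 116, 3: 105, 4: 96}
int8 = {1: 183, 2: 102, 3: 76, 4: 61}

for cores in fp32:
    ratio = fp32[cores] / int8[cores]
    print(f"{cores} core(s): int8 runs at {ratio:.2f}x the speed of fp32")
```

So at 1 core int8 is about 0.91x (i.e. slower), but reaches roughly 1.57x at 4 cores.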