OAID/Tengine

Quantized model is ~100× slower — does Tengine optimize int8/uint8 model inference in the ARM CPU runtime?

shijie-nv opened this issue · 1 comment

I tested mobilenet_v2 and mobilenet_v2_u8 on a Raspberry Pi 4,
both running on the CPU; the quantized model is about 100× slower.

With ncnn, however, the quantized model is actually slightly faster than the float model.
Is there some special build option I'm missing, or is this path simply not optimized?

Built directly on the Raspberry Pi 4:
cmake ..
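One thing worth ruling out first: a bare `cmake ..` without an explicit build type can produce a build with no compiler optimizations, which hurts reference (non-SIMD) quantized kernels far more than the float path. A minimal sketch of an optimized out-of-source build, using only standard CMake flags — any Tengine-specific kernel or backend switches are not shown here and would need to be checked against the project's own CMakeLists.txt:

```shell
# Hedged sketch: force an optimized Release build (standard CMake flags only).
# Without CMAKE_BUILD_TYPE, many projects default to an unoptimized build.
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j4
```

If the uint8 path is still two orders of magnitude slower after a Release build, the slowdown is more likely in the kernel implementation itself than in the compiler flags.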

Float model:
./tm_classification -g 224,224 -s 0.017,0.017,0.017 -w 104.007,116.669,122.679 -r 8 -t 4 -i ~/projects/mytest/tengine-lite/bin/images/cat.jpg -m ~/projects/mytest/tengine-lite/bin/mobilenet_v2.tmfile
tengine-lite library version: 1.5-dev

model file : /home/pi/projects/mytest/tengine-lite/bin/mobilenet_v2.tmfile
image file : /home/pi/projects/mytest/tengine-lite/bin/images/cat.jpg
img_h, img_w, scale[3], mean[3] : 224 224 , 0.017 0.017 0.017, 104.0 116.7 122.7
Repeat 8 times, thread 4, avg time 49.18 ms, max_time 53.63 ms, min_time 48.25 ms

Quantized model:
./tm_classification_uint8 -g 224,224 -s 0.017,0.017,0.017 -w 104.007,116.669,122.679 -r 2 -t 4 -i ~/projects/mytest/tengine-lite/bin/images/cat.jpg -m ~/projects/mytest/tengine-lite/bin/mobilenet_v2_u8.tmfile
tengine-lite library version: 1.5-dev

model file : /home/pi/projects/mytest/tengine-lite/bin/mobilenet_v2_u8.tmfile
image file : /home/pi/projects/mytest/tengine-lite/bin/images/cat.jpg
img_h, img_w, scale[3], mean[3] : 224 224 , 0.017 0.017 0.017, 104.0 116.7 122.7
Repeat 2 times, thread 4, avg time 3684.51 ms, max_time 3703.38 ms, min_time 3665.64 ms

Int8 quantization should be faster, though memory usage grows by 20%–50%.