Unexplainable performance gap between TFLite INT8 and FP32 models
arun-kumark commented
Dear all,
I am testing the performance/throughput of FP32 and quantized models on my platform. My configuration is as follows:
tflite-runtime==2.5.0.post1
tensorflow==1.14.0
FP32 on CPU
-INFO- Running prediction...
-INFO- Acquired 1 file(s) for model 'MobileNet v1.0'
-INFO- Task runtime: 0:00:28.796083
-INFO- Throughput: 35.8 fps
-INFO- Latency: 29.5 ms
-INFO- Target Workload H/W Prec Batch Conc. Metric Score Units
-INFO- -----------------------------------------------------------------------------------
-INFO- tensorflow_lite mobilenet cpu fp32 1 1 throughput 35.8 fps
-INFO- tensorflow_lite mobilenet cpu fp32 1 1 latency 29.5 ms
-INFO- Total runtime: 0:00:28.830364
-INFO- Done
INT8 on CPU
google@localhost:~/mlmark$ harness/mlmark.py -c config/tflite-cpu-mobilenet-int8-throughput.json
-INFO- Running prediction...
-INFO- Acquired 1 file(s) for model 'MobileNet v1.0'
-INFO- Task runtime: 0:01:00.933346
-INFO- Throughput: 16.9 fps
-INFO- Latency: 65. ms
-INFO- Target Workload H/W Prec Batch Conc. Metric Score Units
-INFO- -----------------------------------------------------------------------------------
-INFO- tensorflow_lite mobilenet cpu int8 1 1 throughput 16.9 fps
-INFO- tensorflow_lite mobilenet cpu int8 1 1 latency 65. ms
-INFO- Total runtime: 0:01:00.960828
-INFO- Done
Observations: The throughput of the FP32 model is almost double that of the INT8 model on CPU, but Google's TensorFlow Lite benchmarks report the opposite:
https://www.tensorflow.org/lite/guide/hosted_models#quantized_models
I also tried replacing the models with the ones from the hosted location above, but the harness gives similar results.
Could you let me know where it is going wrong?
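To rule out the MLMark harness itself, I think a standalone timing loop could be sketched as below. The `benchmark` helper and the stand-in workload are placeholders of my own; for a real measurement the callable would wrap `interpreter.invoke()` from a `tflite_runtime.interpreter.Interpreter` loaded with the same `.tflite` file:

```python
import time


def benchmark(run_once, iterations=1000, warmup=10):
    """Time a single-inference callable and report throughput/latency."""
    for _ in range(warmup):
        run_once()  # warm up caches/allocations before timing
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    total = time.perf_counter() - start
    return {
        "throughput_fps": iterations / total,
        "latency_ms": total / iterations * 1000.0,
    }


if __name__ == "__main__":
    # Stand-in workload; swap in `interpreter.invoke` for a real model.
    result = benchmark(lambda: sum(i * i for i in range(10_000)))
    print(f"Throughput: {result['throughput_fps']:.1f} fps")
    print(f"Latency: {result['latency_ms']:.2f} ms")
```

Running this with both the FP32 and INT8 interpreters outside the harness should show whether the inversion comes from the harness or from the interpreter/kernels themselves.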
Thanks
Kind Regards
Arun