Quantized model slower than non-quantized model on Windows x64
imjking opened this issue · 1 comment
Issue Type
Performance
OS
Windows
OS architecture
Other
Programming Language
Python
Framework
TensorFlow, TensorFlowLite
Download URL for tflite file
https://storage.googleapis.com/mediapipe-assets/face_detection_short_range.tflite
https://storage.googleapis.com/mediapipe-assets/face_detection_full_range.tflite
Convert Script
tflite2tensorflow --model_path face_detection_short_range.tflite --flatc_path ../flatc --schema_path ../schema.fbs --output_pb
tflite2tensorflow --model_path face_detection_short_range.tflite --flatc_path ../flatc --schema_path ../schema.fbs --output_no_quant_float32_tflite --output_dynamic_range_quant_tflite --output_weight_quant_tflite --output_float16_quant_tflite --output_integer_quant_tflite
Description
Hello, as the title says: I converted the model successfully, but the quantized model is slower than the original float32 model, and the integer-quantized model is slower than the weight-quantized model on Windows x64. Do you know the reason? Looking forward to your reply.
Relevant Log Output
Inference time:
original float32: 15 ms
weight quant: 150 ms
integer quant: 200 ms
Source code for simple inference testing code
No response
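(For reference, since no test script was attached: a minimal timing sketch like the one below could be used to reproduce the comparison. This is not the original poster's code; the file names follow tflite2tensorflow's usual output naming and the thread count is an assumption.)

```python
# Minimal benchmarking sketch: average per-inference latency for each
# converted .tflite file. Paths and num_threads are assumptions.
import time
import numpy as np
import tensorflow as tf

MODEL_PATHS = [
    "saved_model/model_float32.tflite",
    "saved_model/model_weight_quant.tflite",
    "saved_model/model_integer_quant.tflite",
]

def benchmark(model_path, runs=100, threads=4):
    interpreter = tf.lite.Interpreter(model_path=model_path, num_threads=threads)
    interpreter.allocate_tensors()
    input_detail = interpreter.get_input_details()[0]
    # Dummy input matching the model's expected shape and dtype.
    dummy = np.random.random_sample(input_detail["shape"]).astype(input_detail["dtype"])
    # Warm-up invocation so one-time setup cost is excluded from timing.
    interpreter.set_tensor(input_detail["index"], dummy)
    interpreter.invoke()
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.set_tensor(input_detail["index"], dummy)
        interpreter.invoke()
    elapsed_ms = (time.perf_counter() - start) / runs * 1000.0
    print(f"{model_path}: {elapsed_ms:.2f} ms/inference")

for path in MODEL_PATHS:
    benchmark(path)
```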
It's too much trouble to answer in detail, so I'll just give you the conclusion: it is normal for these quantized models to run slower. Search the Internet and the existing GitHub issues for the explanation; I don't want to give the same answer over and over again.