tensorflow/model-optimization

About the Quantize layer when converting a model

lzcchl opened this issue · 2 comments

I have two models, both MobileNetV1 for classification.
The first model is downloaded from Google: https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_224_android_quant_2017_11_08.zip

The second model I created myself, with the same layers as the first. It is trained with Keras and converted with post-training quantization (PTQ) to a TFLite model whose input/output are 'uint8', but my model has two extra 'Quantize' layers, one at the head and one at the tail, as in the images below.

The first model runs on my NPU in about 8 ms, but the second takes 30 ms. What is happening? The only difference is the two 'Quantize' layers. What can I do? I followed the sample at https://tensorflow.google.cn/lite/performance/post_training_integer_quant to convert my model, but it is slower than the official model and differs slightly from it. Please help with some suggestions, or another guide or sample code for getting a 'uint8' model.
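For reference, this is roughly the conversion code I used, following that sample (a minimal sketch; saved_model_dir and the calibration data are placeholders for my own model and images):

import numpy as np
import tensorflow as tf

saved_model_dir = "my_mobilenet_v1"  # placeholder path to my Keras SavedModel

def representative_data_gen():
    # placeholder calibration data; the real code feeds preprocessed images
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Force full-integer quantization with uint8 input/output, as in the sample.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()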

(two screenshots of the model graphs, showing the extra 'Quantize' layers at the head and tail of my model)

@sngyhan Could you take a look at this?

Hi @lzcchl, our new quantizer that uses MLIR does not officially support uint8. You need to use the TOCO converter. Could you check with this change:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.experimental_new_converter = False   # fall back to the TOCO converter
converter.experimental_new_quantizer = False   # disable the MLIR-based quantizer
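To check whether the converted model really has uint8 input/output, something like this should work (a sketch; it assumes the converter above plus the usual PTQ settings, representative_dataset and the uint8 inference types, are set before convert()):

tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]["dtype"])   # expect numpy.uint8
print(interpreter.get_output_details()[0]["dtype"])  # expect numpy.uint8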