Does Post-training full integer quantization support BERT?
MrRace opened this issue · 1 comment
MrRace commented
Does the post-training full integer quantization described at https://www.tensorflow.org/lite/performance/post_training_integer_quant#convert_using_float_fallback_quantization support BERT?
I convert my SavedModel (.pb) to TF Lite like this:
```python
import os

import numpy as np
import tensorflow as tf

dataset = create_dataset()

def representative_dataset():
    # Yield a dict keyed by the SavedModel signature input names; dtypes
    # must match the signature (BERT ids/masks are typically int32/int64).
    for data in dataset:
        yield {
            "token_type_ids": np.array(data.segment_ids),
            "attention_mask": np.array(data.input_mask),
            "input_ids": np.array(data.input_ids),
        }

converter = tf.lite.TFLiteConverter.from_saved_model(pb_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_quant_model = converter.convert()

tflite_path = res_tf_lite_file
with open(tflite_path, "wb") as f:
    f.write(tflite_quant_model)
assert os.path.exists(tflite_path)
print("tflite model={} converted successfully.".format(tflite_path))

# Inspect input and output tensor details of the quantized model.
interpreter = tf.lite.Interpreter(model_path=tflite_path)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(f"tflite input {input_details}")
print(f"tflite output {output_details}")
```
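For comparison, the full-integer mode from the same guide differs from the float fallback setup above only in the converter flags; a minimal sketch, applied to the same `converter`:

```python
# Full-integer variant from the same guide: restrict the op set to int8 so
# conversion fails loudly for any op without an int8 kernel, instead of
# silently falling back to float.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_int8_model = converter.convert()
```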
I use the float fallback quantization from https://www.tensorflow.org/lite/performance/post_training_integer_quant.
However, the quantized model's results are completely different from the unquantized model's.
Can anyone help? Thanks a lot!
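A minimal sketch of how the two outputs can be compared on a single example (`data`, `pb_dir`, `tflite_path`, and the `"serving_default"` signature key are assumptions carried over from the snippet above; the int32 dtype and the name matching are guesses that may need adjusting):

```python
import numpy as np
import tensorflow as tf

# Batch of one example; dtypes must match the SavedModel signature.
feeds = {
    "input_ids": np.array([data.input_ids], dtype=np.int32),
    "attention_mask": np.array([data.input_mask], dtype=np.int32),
    "token_type_ids": np.array([data.segment_ids], dtype=np.int32),
}

# Float (unquantized) reference output from the original SavedModel.
infer = tf.saved_model.load(pb_dir).signatures["serving_default"]
float_out = infer(**{k: tf.constant(v) for k, v in feeds.items()})

# Quantized TFLite output on the same example.
interpreter = tf.lite.Interpreter(model_path=tflite_path)
interpreter.allocate_tensors()  # may need resize_tensor_input() first if shapes differ
for detail in interpreter.get_input_details():
    # Heuristic name match: TFLite input names usually look like
    # "serving_default_input_ids:0".
    key = next(k for k in feeds if k in detail["name"])
    interpreter.set_tensor(detail["index"], feeds[key].astype(detail["dtype"]))
interpreter.invoke()
tflite_out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])

print("float:", float_out)
print("tflite:", tflite_out)
```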