tensorflow/model-optimization

Does Post-training full integer quantization support BERT?

MrRace opened this issue · 1 comment

Does post-training full integer quantization, as described at https://www.tensorflow.org/lite/performance/post_training_integer_quant#convert_using_float_fallback_quantization, support BERT?
I converted my pb model to TFLite with the following code:

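    import os

    import numpy as np
    import tensorflow as tf

    # create_dataset() is defined elsewhere and returns the tokenized examples.
    dataset = create_dataset()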

    def representative_dataset():
        # Yield one calibration sample at a time, keyed by the model's input names.
        for data in dataset:
            yield {
                "token_type_ids": np.array(data.segment_ids),
                "attention_mask": np.array(data.input_mask),
                "input_ids": np.array(data.input_ids),
            }

    converter = tf.lite.TFLiteConverter.from_saved_model(pb_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    tflite_quant_model = converter.convert()
    tflite_path = res_tf_lite_file
    # Write the converted model to disk and close the file handle.
    with open(tflite_path, "wb") as f:
        f.write(tflite_quant_model)
    assert os.path.exists(tflite_path)
    print("tflite model={} converted successfully.".format(tflite_path))

    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    # Get input and output tensors
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    print(f'tflite input {input_details}')
    print(f'tflite output {output_details}')

I am using float fallback quantization as described at https://www.tensorflow.org/lite/performance/post_training_integer_quant.
However, the output of the quantized model is completely different from the output of the non-quantized model.
Can anyone help? Thanks a lot!
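
To make "completely different" more concrete, the two outputs could be compared numerically. Below is a minimal sketch, assuming both model outputs have already been flattened into plain Python lists of floats. The helper name `compare_outputs` and the choice of metrics are illustrative only and are not part of TensorFlow or the linked guide.

```python
import math
from typing import Dict, Sequence, Union


def _argmax(values: Sequence[float]) -> int:
    """Index of the largest element (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def compare_outputs(
    float_out: Sequence[float], quant_out: Sequence[float]
) -> Dict[str, Union[float, bool]]:
    """Hypothetical helper: agreement metrics between two flattened outputs."""
    if len(float_out) != len(quant_out):
        raise ValueError("outputs must have the same number of elements")
    if len(float_out) == 0:
        raise ValueError("outputs must not be empty")

    # Largest element-wise deviation between the two outputs.
    max_abs_diff = max(abs(a - b) for a, b in zip(float_out, quant_out))

    # Cosine similarity: close to 1.0 means the outputs point the same way.
    dot = sum(a * b for a, b in zip(float_out, quant_out))
    norm_f = math.sqrt(sum(a * a for a in float_out))
    norm_q = math.sqrt(sum(b * b for b in quant_out))
    if norm_f > 0 and norm_q > 0:
        cosine = dot / (norm_f * norm_q)
    else:
        cosine = float("nan")  # undefined when one vector is all zeros

    return {
        "max_abs_diff": max_abs_diff,
        "cosine_similarity": cosine,
        # For classification-style outputs, check whether the top index agrees.
        "same_argmax": _argmax(float_out) == _argmax(quant_out),
    }
```

Running this on the same input for both models and posting the numbers (for example the cosine similarity and whether the argmax matches) would show whether the gap is ordinary quantization noise or a sign that the conversion itself went wrong.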

@yyoon Could you help with this? Thanks a lot!