tensorflow/model-optimization

Full Int8 QAT not working

MATTYGILO opened this issue · 13 comments

Just a quick question. I want my final model to be full int8 instead of float32 for input and outputs. I want the training to be as accurate as possible. Do I train with quantised input and outputs? Because I have followed the common procedure in the comprehensive guide (with my custom model) and it hasn't worked.
So

  1. I trained following the comprehensive guide, modified for my custom model
  2. After training, I used these settings to quantise my model (full conversion sketch below):
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
  3. When I go to evaluate the model, it is completely inaccurate
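
For completeness, around those settings the whole conversion step looks roughly like this (a sketch; trained_model and calibration_samples stand in for my actual model and data):

import tensorflow as tf

def representative_dataset():
    # A few hundred samples, each shaped like the model input (batch dim included).
    for sample in calibration_samples:
        yield [tf.cast(sample, tf.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(trained_model)
# ... the six converter settings listed above ...
quantized_tflite_model = converter.convert()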

What do I need to do to allow for full int8 to work?

All help welcome

Hi Matty,
Passing converter.representative_dataset = representative_dataset is only required for post-training quantization. If you want to use QAT, follow the guide at https://www.tensorflow.org/model_optimization/guide/quantization/training_example (use quantize_model before training, train with the non-quantized input as usual, and then convert to TFLite).
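
Roughly, the flow from that guide is (a minimal sketch; float_model, train_ds and val_ds are placeholders for your model and data):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# 1. Wrap the trained float Keras model with fake-quant ops.
qat_model = tfmot.quantization.keras.quantize_model(float_model)

# 2. Fine-tune on the usual (non-quantized) data.
qat_model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
qat_model.fit(train_ds, validation_data=val_ds, epochs=3)

# 3. Convert. The fake-quant ops already carry the ranges, so no
#    representative dataset is needed; int8 input/output can still be requested.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
quantized_tflite_model = converter.convert()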

@thaink I have followed the guides. However, I'm using TFLite Micro, which requires full int8. None of the examples show what to do to get full int8 input and output. Even after QAT you still have to convert the model the same way as for post-training quantization, and there are no examples of int8 inputs and outputs for QAT.

Actually, the inference_input_type and inference_output_type settings are what make the converted model use int8 input and output.
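
One thing worth checking on your side: with int8 input/output, the evaluation code has to quantize the test data with the input tensor's scale/zero point and dequantize the output before comparing against labels. A rough sketch, assuming a single input and output and the converted flatbuffer in quantized_tflite_model:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# x_float: one batch of float test data, shaped like the model input.
scale, zero_point = input_details["quantization"]
x_int8 = np.clip(np.round(x_float / scale + zero_point), -128, 127).astype(np.int8)

interpreter.set_tensor(input_details["index"], x_int8)
interpreter.invoke()
y_int8 = interpreter.get_tensor(output_details["index"])

# Dequantize the output before computing accuracy.
out_scale, out_zero_point = output_details["quantization"]
y_float = (y_int8.astype(np.float32) - out_zero_point) * out_scale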

@thaink I've already set those values. Are you suggesting I train on quantised data?

Can you share or describe what your output model looks like?

@thaink I've converted the model to full int8, but its output is complete rubbish. So I did QAT, converted that to full int8 as well, and the output is still complete rubbish.

@thaink What is the suggested way of doing full int8 QAT on a model?

@thaink This is how I do QAT:

import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

class DefaultDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # List all of your weights
    weights = {
        "kernel": LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)
    }

    # List of all your activations
    activations = {
        "activation": MovingAverageQuantizer(num_bits=8, symmetric=False, narrow_range=False, per_axis=False)
    }

    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
        output = []
        for attribute, quantizer in self.weights.items():
            if hasattr(layer, attribute):
                output.append((getattr(layer, attribute), quantizer))

        return output

    # Configure how to quantize activations.
    def get_activations_and_quantizers(self, layer):
        output = []
        for attribute, quantizer in self.activations.items():
            if hasattr(layer, attribute):
                output.append((getattr(layer, attribute), quantizer))

        return output

    def set_quantize_weights(self, layer, quantize_weights):
        # Set the quantized weights back on the layer, in the same order
        # as they were returned by `get_weights_and_quantizers`.

        count = 0
        for attribute in self.weights.keys():
            if hasattr(layer, attribute):
                setattr(layer, attribute, quantize_weights[count])
                count += 1

    def set_quantize_activations(self, layer, quantize_activations):
        # Set the quantized activations back on the layer, in the same order
        # as they were returned by `get_activations_and_quantizers`.
        count = 0
        for attribute in self.activations.keys():
            if hasattr(layer, attribute):
                setattr(layer, attribute, quantize_activations[count])
                count += 1

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}

import tensorflow as tf
import tensorflow_model_optimization as tfmot

from quant import DefaultDenseQuantizeConfig


# quantize_scope makes the custom QuantizeConfig and layer visible while
# quantize_apply clones and rebuilds the annotated model.
with tfmot.quantization.keras.quantize_scope({
    "DefaultDenseQuantizeConfig": DefaultDenseQuantizeConfig,
    "CustomLayer": CustomLayer
}):
    def apply_quantization_to_layer(layer):
        return tfmot.quantization.keras.quantize_annotate_layer(layer, DefaultDenseQuantizeConfig())

    # Despite its name, `tflite_model` here has to be a float Keras model,
    # since clone_model only operates on Keras models.
    annotated_model = tf.keras.models.clone_model(
        tflite_model,
        clone_function=apply_quantization_to_layer,
    )

    qat_model = tfmot.quantization.keras.quantize_apply(annotated_model)

    qat_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
        loss="categorical_crossentropy",
        metrics=['accuracy']
    )

    qat_model.summary()
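
After this I train qat_model as usual and run it through the same full-int8 converter settings as in my first post; roughly (train_ds / val_ds stand in for my data pipelines):

qat_model.fit(train_ds, validation_data=val_ds, epochs=10)

converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
# ... the same settings as in my first post (Optimize.DEFAULT,
# TFLITE_BUILTINS_INT8, int8 input/output) ...
quantized_tflite_model = converter.convert()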

Please, I need all the help and advice I can get.

@Xhark Could you check whether Matt is doing QAT the right way?

Hi @MATTYGILO, I am experiencing the same problem. The full-int8 QAT-derived TensorFlow Lite model (using a representative dataset to set input and output to int8) doesn't seem to work; I'm losing a lot of accuracy after the conversion. I was wondering if you found a solution for the full int8 QAT model conversion. Thank you!

Thank you very much for your help. I am facing the same issue with MobileNetV3 (both with PTQ and QAT). Any ideas on why this might be the case? Thank you. @thaink

Hi, I am facing the same issue for QAT with MobileNetV3 (accuracy of the QAT TFLite model is much lower than that of the corresponding QAT Keras model). Is there a fix for this yet?