Full-integer quantization tflite models
aidansmyth95 opened this issue · 0 comments
aidansmyth95 commented
Has anyone converted the FastSpeech2 or Tacotron models to full-integer quantized tflite models?
My representative dataset generator for FastSpeech2 (shown below) raises a floating point exception during conversion; any ideas about what I might be doing wrong? It seems close to producing the int8x8 tflite model. I need to run on an Arm Ethos-U55 NPU, where floating-point support is limited. For now I don't care much about the quantization error, just about profiling the model on the U55 once I have a tflite file. We can use tricks like QAT later if we need to reduce the quantization error.
```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Provide a set of input samples that are representative of the data
    # the model will be dealing with during inference
    for _ in range(1):  # Adjust the number of samples as needed
        input_ids = tf.convert_to_tensor(np.random.randint(0, 1, size=(1, 50), dtype=np.int32), dtype=tf.int32)  # Example input shape
        speaker_ids = tf.convert_to_tensor(np.array([1], dtype=np.int32), dtype=tf.int32)
        speed_ratios = tf.convert_to_tensor(np.array([1.0], dtype=np.float32), dtype=tf.float32)
        f0_ratios = tf.convert_to_tensor(np.array([1.0], dtype=np.float32), dtype=tf.float32)
        energy_ratios = tf.convert_to_tensor(np.array([1.0], dtype=np.float32), dtype=tf.float32)
        print(input_ids.shape)
        yield [input_ids, speaker_ids, speed_ratios, f0_ratios, energy_ratios]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
converter.target_spec.supported_types = [tf.int8]
```
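For context, this is roughly how I build and run the converter around the settings above; it is only a sketch, and the SavedModel path and output filename are placeholders rather than the exact ones I use.

```python
import tensorflow as tf

# Sketch only: load a FastSpeech2 inference SavedModel (hypothetical path),
# apply the full-integer settings shown above, then convert and save.
converter = tf.lite.TFLiteConverter.from_saved_model("./fastspeech2_savedmodel")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("fastspeech2_int8.tflite", "wb") as f:  # placeholder output name
    f.write(tflite_model)
```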