Full-integer quantization tflite models
aidansmyth95 opened this issue · 0 comments
aidansmyth95 commented
Has anyone converted the FastSpeech2 or Tacotron models to full-integer quantized tflite models?
My representative dataset generator for FastSpeech2 (shown below) raises a floating point exception during conversion; any ideas about what I might be doing wrong? It seems close to producing the int8x8 tflite model. I need to run on an Arm Ethos-U55 NPU, where floating-point support is limited. For now I don't care much about the quantization error, just about profiling the model on the U55 once I have a tflite file. We can use tricks like QAT later if we need to reduce the quantization error.
```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Provide a set of input samples that are representative of the data
    # the model will be dealing with during inference
    for _ in range(1):  # Adjust the number of samples as needed
        input_ids = tf.convert_to_tensor(np.random.randint(0, 1, size=(1, 50), dtype=np.int32), dtype=tf.int32)  # Example input shape
        speaker_ids = tf.convert_to_tensor(np.array([1], dtype=np.int32), dtype=tf.int32)
        speed_ratios = tf.convert_to_tensor(np.array([1.0], dtype=np.float32), dtype=tf.float32)
        f0_ratios = tf.convert_to_tensor(np.array([1.0], dtype=np.float32), dtype=tf.float32)
        energy_ratios = tf.convert_to_tensor(np.array([1.0], dtype=np.float32), dtype=tf.float32)
        print(input_ids.shape)
        yield [input_ids, speaker_ids, speed_ratios, f0_ratios, energy_ratios]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
converter.target_spec.supported_types = [tf.int8]
```
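For context, this is roughly how I build and run the converter around the settings above; it is only a sketch, and the SavedModel path and output filename are placeholders rather than the exact ones I use.

```python
import tensorflow as tf

# Sketch only: load a FastSpeech2 inference SavedModel (hypothetical path),
# apply the full-integer settings shown above, then convert and save.
converter = tf.lite.TFLiteConverter.from_saved_model("./fastspeech2_savedmodel")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("fastspeech2_int8.tflite", "wb") as f:  # placeholder output name
    f.write(tflite_model)
```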