larq/compute-engine

Spurious Quantize/Dequantize on int8 quantized model with LCE converter

Closed this issue · 3 comments

While working on models with both binary and int8-quantized layers, I've noticed a bug in the LCE conversion pass. (Note: I'm not using the built-in quantize_model function from tensorflow_model_optimization, since I will later need to quantize a custom model inherited from tf.keras.Model, which that function does not support.)

To illustrate the behavior, I've made a small toy model with manually placed tf.quantization.fake_quant_with_min_max_vars ops.
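
For reference, a minimal sketch of what such a manually placed fake-quant op does (the values below are just illustrative):

import tensorflow as tf

x = tf.constant([-5.0, -1.5, 0.0, 1.5, 5.0])
# Clips the values to [-3, 3] and rounds them onto the 8-bit grid spanned
# by that range; the converter later uses the recorded min/max to derive
# the int8 scale and zero-point for this tensor.
y = tf.quantization.fake_quant_with_min_max_vars(x, min=-3.0, max=3.0)
print(y.numpy())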

There is an inconsistency between the TF- and LCE-converted models.
The model converts correctly when using the TFLite converter, whereas the LCE converter adds spurious Quantize/Dequantize ops at the beginning of the graph (although the input is already tf.float32).

Model graph screenshots: Toy model (LCE), Toy model (TF)

This seems to be a bug?

Versions used: TensorFlow 2.5.0-rc1/2.3.1, Larq Compute Engine 0.5.0

Steps to reproduce:

import tensorflow as tf
import larq
from larq_compute_engine import convert_keras_model

def toy_int8_model():
    x = tf.keras.Input((240, 320, 3), dtype=tf.float32)
    out = larq.layers.QuantConv2D(filters=32, kernel_size=3, strides=1, padding='valid', use_bias=False, activation=None)(x)
    out = larq.layers.QuantConv2D(filters=32, kernel_size=3, strides=1, padding='valid', use_bias=False, activation=None)(out)
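    # Manually fake-quantize the activation to [-3, 3]; the converter turns
    # this into int8 quantization parameters for the tensor.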
    out = tf.keras.layers.Lambda(lambda x: tf.quantization.fake_quant_with_min_max_vars(x, -3.0, 3.0))(out)
    out = larq.layers.QuantConv2D(filters=32, kernel_size=3, strides=1, padding='valid', use_bias=False, activation=None)(out)
    out = tf.keras.layers.Lambda(lambda x: tf.quantization.fake_quant_with_min_max_vars(x, -3.0, 3.0))(out)
    return tf.keras.Model(inputs=x, outputs=out)

model = toy_int8_model()

# LCE conversion
lce_model = convert_keras_model(model, inference_input_type=tf.float32, inference_output_type=tf.float32, experimental_enable_bitpacked_activations=True)
# TFLite conversion
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Saving the models.
with open('lce_model.tflite', 'wb') as f:
  f.write(lce_model)
with open('tflite_model.tflite', 'wb') as f:
  f.write(tflite_model)
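
For what it's worth, the extra ops can also be checked without a graph viewer. Since this toy model contains no actual binary ops, both flatbuffers should load in the stock TFLite interpreter; a rough sketch (the stock tf.lite.Interpreter won't resolve Larq custom ops, so this only works for graphs like this one):

# Compare the tensors of both converted models; the LCE output should show
# an extra int8 Quantize/Dequantize pair right after the float32 input.
for name, content in [('lce', lce_model), ('tflite', tflite_model)]:
    interpreter = tf.lite.Interpreter(model_content=content)
    interpreter.allocate_tensors()
    print(name, 'input dtype:', interpreter.get_input_details()[0]['dtype'])
    print(name, 'output dtype:', interpreter.get_output_details()[0]['dtype'])
    for t in interpreter.get_tensor_details():
        print('   ', t['name'], t['dtype'], t['quantization'])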

@Tombana @lgeiger @AdamHillier any clues why this is happening?

Hi @simonmaurer,
Apologies for the delay. What you're describing does indeed look like a bug.
Our converter is mostly a copy of the TensorFlow MLIR converter, but with our custom Larq conversion passes added in between some of the TF passes. Our passes don't touch the quantize nodes, so if the bug does not happen with the TensorFlow converter, then my guess is that we did not correctly copy the quantization passes from TensorFlow:

void AddQuantizationPasses(const mlir::TFL::QuantizationSpecs& quant_specs,

@lgeiger I've assigned you because you're a bit more familiar with this, but feel free to un-assign yourself and assign me instead, then I'll look into it later this week.

@Tombana no worries at all, appreciate you looking into it 👍