larq/compute-engine

Int8 default ranges break when a bconv is followed by a normal conv.

AdamHillier opened this issue · 5 comments

Observed behaviour

When converting this model...

import larq as lq
import larq_compute_engine as lce
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.Input((32, 32, 3)),
    lq.layers.QuantConv2D(
        32,
        (3, 3),
        input_quantizer="ste_sign",
        kernel_quantizer="ste_sign",
        padding="same",
        pad_values=1.0,
        use_bias=False
    ),
    tf.keras.layers.Conv2D(32, (3, 3)),
])
converted_model = lce.convert_keras_model(model, experimental_default_int8_range=(-3, 3))

...we obtain the following converted model, with extra dequantise and quantise nodes around the Conv2D:

[Screenshot: converted model with extra dequantise and quantise nodes around the Conv2D]

Expected behaviour

We expect there to be no dequantise or quantise nodes in a converted model when the experimental_default_int8_range argument is used.

If the QuantConv2D is replaced by a normal Conv2D we get:

model = tf.keras.models.Sequential([
    tf.keras.Input((32, 32, 3)),
    tf.keras.layers.Conv2D(
        32, (3, 3), padding="same", use_bias=False
    ),
    tf.keras.layers.Conv2D(32, (3, 3)),
])
converted_model = lce.convert_keras_model(model, experimental_default_int8_range=(-3, 3))

[Screenshot: converted model with no dequantise or quantise nodes]

Similarly, if the Conv2D is replaced with a QuantConv2D we get:

model = tf.keras.models.Sequential([
    tf.keras.Input((32, 32, 3)),
    lq.layers.QuantConv2D(
        32,
        (3, 3),
        input_quantizer="ste_sign",
        kernel_quantizer="ste_sign",
        padding="same",
        pad_values=1.0,
        use_bias=False
    ),
    lq.layers.QuantConv2D(
        32,
        (3, 3),
        input_quantizer="ste_sign",
        kernel_quantizer="ste_sign",
        padding="same",
        pad_values=1.0,
        use_bias=False
    ),
])
converted_model = lce.convert_keras_model(model, experimental_default_int8_range=(-3, 3))

[Screenshot: converted model with no dequantise or quantise nodes]

So there is something specifically going wrong with the QuantConv2D > Conv2D combination.

Just to note: until we fix this in LCE, this issue can be worked around by explicitly adding TensorFlow fake-quantise ops before and after the problematic layer that doesn't get converted properly.

E.g.:

model = tf.keras.models.Sequential([
    tf.keras.Input((32, 32, 3)),
    lq.layers.QuantConv2D(
        32,
        (3, 3),
        input_quantizer="ste_sign",
        kernel_quantizer="ste_sign",
        padding="same",
        pad_values=1.0,
        use_bias=False
    ),
    tf.keras.layers.Lambda(lambda x: tf.quantization.fake_quant_with_min_max_args(x, -3.0, 3.0)),
    tf.keras.layers.Conv2D(32, (3, 3)),
    tf.keras.layers.Lambda(lambda x: tf.quantization.fake_quant_with_min_max_args(x, -3.0, 3.0)),
])
converted_model = lce.convert_keras_model(model, experimental_default_int8_range=(-3, 3))

[Screenshot: converted model with the fake-quantise workaround applied]

(It may not be clear from the screenshot, but the converted model is entirely Int8.)

Looks like we are running into tensorflow/tensorflow#40055 here, since the bias tensors are shared, which prevents quantization of the 8-bit BConv.
E.g. the following network with a non-zero bias results in the correct output:

model = tf.keras.models.Sequential([
    tf.keras.Input((32, 32, 3)),
    lq.layers.QuantConv2D(
        32,
        (3, 3),
        input_quantizer="ste_sign",
        kernel_quantizer="ste_sign",
        padding="same",
        pad_values=1.0,
        use_bias=False
    ),
    tf.keras.layers.Conv2D(32, (3, 3), bias_initializer="random_uniform"),
])
converted_model = lce.convert_keras_model(model, experimental_default_int8_range=(-3, 3))

the bias tensors are shared which prevents quantization of the 8bit BConv.

Just an extra note: if that's indeed the cause of the bug, then training the network (on any dataset, even random data) for just 1 epoch should also fix this, and will work on any type of network.
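A minimal sketch of that workaround, applied to the original QuantConv2D → Conv2D model (the random data, optimiser and loss choice are illustrative; the point is just to nudge the Conv2D bias away from its all-zero initial value so the bias constant is no longer deduplicated into a shared tensor):

import numpy as np

# Fit for one epoch on random data so the Conv2D bias moves away from zero.
model.compile(optimizer="adam", loss="mse")
x = np.random.uniform(-1.0, 1.0, size=(16, 32, 32, 3)).astype("float32")
y = np.random.uniform(-1.0, 1.0, size=(16, 30, 30, 32)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

converted_model = lce.convert_keras_model(model, experimental_default_int8_range=(-3, 3))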

@lgeiger do you have any idea why a Conv2D combined with a Conv2D works, though? Wouldn't that run into the same issue of shared bias tensors?


There it is not a problem, since both biases will be converted to int32 and can still be shared. The reason it fails here is that the BConv2D expects a float bias whereas the Conv2D expects int32, which can't be satisfied with a shared tensor. Note that this is only a problem with the default-ranges pass and not with conversion of a trained network that doesn't rely on default ranges.
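If you want to verify this, here's a rough sketch (assuming converted_model holds the output of the plain Conv2D + Conv2D example above, which contains no LCE custom ops and can therefore be loaded with the stock TFLite interpreter) that lists each tensor's dtype; both conv biases should come out as int32:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=converted_model)
interpreter.allocate_tensors()

# Print the dtype and shape of every tensor in the flatbuffer.
for detail in interpreter.get_tensor_details():
    print(detail["index"], detail["name"], detail["dtype"], detail["shape"])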