larq/compute-engine

DoReFa model size and behavior compared with a full-precision model and an ste_sign model

hamingsi opened this issue · 13 comments

I'm trying to compare a DoReFa model with a full-precision model and an ste_sign model to find out the difference.
But I got results I don't understand:
the DoReFa model size is close to the full-precision model rather than to the ste_sign model.
Here are my LCE benchmark results on a Mac M1 chip:
The DoReFa model's inference time is faster than the ste_sign model's (why?) and close to the full-precision model's, which is strange.
Here is my test code for the DoReFa, full-precision, and ste_sign models:

import tensorflow as tf
import larq as lq

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.InputLayer((32, 32, 3), name="input"),
        # First layer (float)
        tf.keras.layers.Conv2D(32, kernel_size=(5, 5), padding="same", strides=3),
        tf.keras.layers.BatchNormalization(),
        # Note: we do NOT add a ReLU here, because the subsequent activation quantizer would destroy all information!
        # Second layer (binary)
        lq.layers.QuantConv2D(
            32,
            kernel_size=(3, 3),
            padding="same",
            strides=2,
            input_quantizer=lq.quantizers.DoReFa(k_bit=1, mode="activations"),
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Third layer (binary)
        lq.layers.QuantConv2D(
            64,
            kernel_size=(3, 3),
            padding="same",
            strides=2,
            input_quantizer=lq.quantizers.DoReFa(k_bit=1, mode="activations"),
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Pooling and final dense layer (float)
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)

For the ste_sign model, I just switch the input_quantizer from lq.quantizers.DoReFa to "ste_sign".
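
For reference, a sketch of what the second layer would then look like (untested, and assuming everything else in the model stays the same):

import larq as lq

# Same binary layer as above, but with an ste_sign input quantizer
# instead of the DoReFa activation quantizer.
lq.layers.QuantConv2D(
    32,
    kernel_size=(3, 3),
    padding="same",
    strides=2,
    input_quantizer="ste_sign",
    kernel_quantizer="ste_sign",
    kernel_constraint="weight_clip",
    use_bias=False,
)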
Here is the full-precision code:

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.InputLayer((32, 32, 3), name="input"),
        # First layer (float)
        tf.keras.layers.Conv2D(32, kernel_size=(5, 5), padding="same", strides=3),
        tf.keras.layers.BatchNormalization(),
        # Note: no ReLU here either, to keep the structure identical to the binary model above
        # Second layer (float in this model)
        tf.keras.layers.Conv2D(
            32,
            kernel_size=(3, 3),
            padding="same",
            strides=2,

            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Third layer (float in this model)
        tf.keras.layers.Conv2D(
            64,
            kernel_size=(3, 3),
            padding="same",
            strides=2,

            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Pooling and final dense layer (float)
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)

Can you open the tflite files in Netron and compare the binary layers? Perhaps the DoReFa quantizer is not picked up by the tflite converter.
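
For anyone following along, a minimal sketch of that check, assuming the LCE Python converter API (convert_keras_model) and that model is the Keras model defined above; the written .tflite file can then be opened in Netron:

from larq_compute_engine import convert_keras_model

# Convert the Keras model to a TFLite flatbuffer with the LCE converter.
flatbuffer = convert_keras_model(model)

# Write it to disk so it can be inspected in Netron; binary layers should
# show up as Lce binary ops (e.g. LceBconv2d), while plain Conv2D ops
# indicate the quantizer was not recognized as binary.
with open("model.tflite", "wb") as f:
    f.write(flatbuffer)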

Yeah, I tried this. It seems the DoReFa model doesn't have binary layers. So LCE won't speed up the DoReFa quantizer?

My main concern is whether a computation with activations in [0, 1] and weights in [-1, 1] can be sped up or not.
I want to implement an activation like a LIF neuron, which only emits spikes in [0, 1]. With binary weights, maybe it would decrease inference time and memory cost substantially.
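
As an illustration only (untested, and spike_ste is a made-up name), such a {0, 1} spike activation with a straight-through gradient could be written as a plain callable, assuming larq accepts callables as an input_quantizer:

import tensorflow as tf

@tf.custom_gradient
def spike_ste(x):
    # Emit a spike (1.0) when the input exceeds 0.5, otherwise stay at 0.0,
    # and pass the gradient straight through during training.
    y = tf.where(x <= 0.5, tf.zeros_like(x), tf.ones_like(x))
    return y, lambda dy: dy

Whether LCE then converts the surrounding layer to a binary op is exactly the question discussed below.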

So LCE won't speed up the DoReFa quantizer?

That is correct. In general the DoReFa quantizer can output more than 1 bit, in which case it is not a binary layer.
To get LCE to recognize it as a binary quantizer, you might have to add a specialization for k_bit==1 where it is implemented not with the round function but as a real boolean, similar to ste_sign.

So LCE won't speed up the DoReFa quantizer?

That is correct. In general the DoReFa quantizer can output more than 1 bit, in which case it is not a binary layer. To get LCE to recognize it as a binary quantizer, you might have to add a specialization for k_bit==1 where it is implemented not with the round function but as a real boolean, similar to ste_sign.

I did use k_bit=1 in my code, but it still doesn't work.

I mean that the implementation of the DoReFa quantizer itself needs a specialization for k_bit==1.
See here:
https://github.com/larq/larq/blob/v0.13.1/larq/quantizers.py#L680-L682

This would have to be changed to something like this:

        @tf.custom_gradient
        def _k_bit_with_identity_grad(x):
            if self.precision == 1:
                # Binary case: threshold at 0.5 so the output is a true boolean in {0, 1}
                return (
                    tf.where(tf.math.less_equal(x, 0.5), tf.zeros_like(x), tf.ones_like(x)),
                    lambda dy: dy,
                )
            else:
                n = 2**self.precision - 1
                return tf.round(x * n) / n, lambda dy: dy

Note: I did not test this, you'll have to verify that it works as expected and that the LCE converter recognizes this.
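
A quick standalone sanity check of just the proposed k_bit==1 thresholding (a sketch of the branch above, not the full quantizer class):

import tensorflow as tf

x = tf.constant([-0.3, 0.2, 0.5, 0.7, 1.2])
# Threshold at 0.5 instead of rounding, so the output is strictly {0, 1}.
y = tf.where(tf.math.less_equal(x, 0.5), tf.zeros_like(x), tf.ones_like(x))
print(y.numpy())  # expected: [0. 0. 0. 1. 1.]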

Thanks, I will try this. But I'm still confused about why the full-precision model runs faster than the ste_sign model did.

I'm still confused about why the full-precision model runs faster than the ste_sign model did.

On what type of machine are you running this? LCE does not provide optimized code for the x86_64 architecture, only for 32-bit ARM and 64-bit ARM. So on x86_64, it is expected that the full precision model runs faster.

I'm running on a Mac M1 chip. I compile LCE with bazel using --macos_cpus=arm64. Is that correct?

Compiling lce_benchmark_model with --macos_cpus=arm64 is correct, I think.

It's possible that the M1 chip is more optimized for full-precision layers than for binary layers.

That's amazing. I will try a different ARM device.
So LCE does support binary convolution (activations in [0, 1], weights in [-1, 1]). Is that correct?

So LCE does support binary convolution (activations in [0, 1], weights in [-1, 1]). Is that correct?

That is correct. It's always best to check the tflite file in Netron to see if the layers got converted to Lce binary layers.

Thanks a lot!