larq/compute-engine

DoReFa model size and behavior compared with a full-precision model and an ste_sign model

hamingsi opened this issue · 13 comments

I'm trying to compare a DoReFa model with a full-precision model and an ste_sign model to find out the difference.
But I got results I don't understand:
the DoReFa model size is close to the full-precision model rather than to the ste_sign model.
Here are my LCE benchmark results on a Mac M1 chip:
The DoReFa model's inference time is faster than the ste_sign model's (why?) and close to the full-precision model's, which is strange.
Here is my test code for the DoReFa, full-precision, and ste_sign models:

import tensorflow as tf
import larq as lq

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.InputLayer((32, 32, 3), name="input"),
        # First layer (float)
        tf.keras.layers.Conv2D(32, kernel_size=(5, 5), padding="same", strides=3),
        tf.keras.layers.BatchNormalization(),
        # Note: we do NOT add a ReLU here, because the subsequent activation quantizer would destroy all information!
        # Second layer (binary)
        lq.layers.QuantConv2D(
            32,
            kernel_size=(3, 3),
            padding="same",
            strides=2,
            input_quantizer=lq.quantizers.DoReFa(k_bit=1, mode="activations"),
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Third layer (binary)
        lq.layers.QuantConv2D(
            64,
            kernel_size=(3, 3),
            padding="same",
            strides=2,
            input_quantizer=lq.quantizers.DoReFa(k_bit=1, mode="activations"),
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Pooling and final dense layer (float)
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)

For the ste_sign model, I just switch the input_quantizer from lq.quantizers.DoReFa to "ste_sign".
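
For reference, a sketch of what the second layer would then look like (untested, and assuming everything else in the model stays the same):

import larq as lq

# Same binary layer as above, but with an ste_sign input quantizer
# instead of the DoReFa activation quantizer.
lq.layers.QuantConv2D(
    32,
    kernel_size=(3, 3),
    padding="same",
    strides=2,
    input_quantizer="ste_sign",
    kernel_quantizer="ste_sign",
    kernel_constraint="weight_clip",
    use_bias=False,
)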
Here is the full-precision code:

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.InputLayer((32, 32, 3), name="input"),
        # First layer (float)
        tf.keras.layers.Conv2D(32, kernel_size=(5, 5), padding="same", strides=3),
        tf.keras.layers.BatchNormalization(),
        # Note: no ReLU here either, to keep the structure identical to the binary model above
        # Second layer (float in this model)
        tf.keras.layers.Conv2D(
            32,
            kernel_size=(3, 3),
            padding="same",
            strides=2,

            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Third layer (float in this model)
        tf.keras.layers.Conv2D(
            64,
            kernel_size=(3, 3),
            padding="same",
            strides=2,

            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Pooling and final dense layer (float)
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)

Can you open the tflite files in Netron and compare the binary layers? Perhaps the DoReFa quantizer is not picked up by the tflite converter.
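
For anyone following along, a minimal sketch of that check, assuming the LCE Python converter API (convert_keras_model) and that model is the Keras model defined above; the written .tflite file can then be opened in Netron:

from larq_compute_engine import convert_keras_model

# Convert the Keras model to a TFLite flatbuffer with the LCE converter.
flatbuffer = convert_keras_model(model)

# Write it to disk so it can be inspected in Netron; binary layers should
# show up as Lce binary ops (e.g. LceBconv2d), while plain Conv2D ops
# indicate the quantizer was not recognized as binary.
with open("model.tflite", "wb") as f:
    f.write(flatbuffer)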

Yeah, I tried this. It seems the DoReFa model doesn't have binary layers. So LCE won't speed up the DoReFa quantizer?

My main concern is whether a computation with activations in [0, 1] and weights in [-1, 1] can be sped up or not.
I want to implement an activation like a LIF neuron, which only emits spikes in [0, 1]. With binary weights, maybe it would decrease inference time and memory cost substantially.
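
As an illustration only (untested, and spike_ste is a made-up name), such a {0, 1} spike activation with a straight-through gradient could be written as a plain callable, assuming larq accepts callables as an input_quantizer:

import tensorflow as tf

@tf.custom_gradient
def spike_ste(x):
    # Emit a spike (1.0) when the input exceeds 0.5, otherwise stay at 0.0,
    # and pass the gradient straight through during training.
    y = tf.where(x <= 0.5, tf.zeros_like(x), tf.ones_like(x))
    return y, lambda dy: dy

Whether LCE then converts the surrounding layer to a binary op is exactly the question discussed below.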

So LCE won't speed up the DoReFa quantizer?

That is correct. In general the DoReFa quantizer can output more than 1 bit, in which case it is not a binary layer.
To get LCE to recognize it as a binary quantizer, you might have to add a specialization for k_bit==1 where it is implemented not with the round function but as a real boolean, similar to ste_sign.

So LCE won't speed up the DoReFa quantizer?

That is correct. In general the DoReFa quantizer can output more than 1 bit, in which case it is not a binary layer. To get LCE to recognize it as a binary quantizer, you might have to add a specialization for k_bit==1 where it is implemented not with the round function but as a real boolean, similar to ste_sign.

I did use k_bit=1 in my code, but it still doesn't work.

I mean that the implementation of the DoReFa quantizer itself needs a specialization for k_bit==1.
See here:
https://github.com/larq/larq/blob/v0.13.1/larq/quantizers.py#L680-L682

This would have to be changed to something like this:

        @tf.custom_gradient
        def _k_bit_with_identity_grad(x):
            if self.precision == 1:
                # Binary case: threshold at 0.5 so the output is a true boolean in {0, 1}
                return (
                    tf.where(tf.math.less_equal(x, 0.5), tf.zeros_like(x), tf.ones_like(x)),
                    lambda dy: dy,
                )
            else:
                n = 2**self.precision - 1
                return tf.round(x * n) / n, lambda dy: dy

Note: I did not test this, you'll have to verify that it works as expected and that the LCE converter recognizes this.
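
A quick standalone sanity check of just the proposed k_bit==1 thresholding (a sketch of the branch above, not the full quantizer class):

import tensorflow as tf

x = tf.constant([-0.3, 0.2, 0.5, 0.7, 1.2])
# Threshold at 0.5 instead of rounding, so the output is strictly {0, 1}.
y = tf.where(tf.math.less_equal(x, 0.5), tf.zeros_like(x), tf.ones_like(x))
print(y.numpy())  # expected: [0. 0. 0. 1. 1.]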

Thanks, I will try this. But I'm still confused about why the full-precision model runs faster than the ste_sign model did.

I'm still confused about why the full-precision model runs faster than the ste_sign model did.

On what type of machine are you running this? LCE does not provide optimized code for the x86_64 architecture, only for 32-bit ARM and 64-bit ARM. So on x86_64, it is expected that the full precision model runs faster.

I'm running on a Mac M1 chip. I compile LCE with bazel using --macos_cpus=arm64. Is that correct?

Compiling lce_benchmark_model with --macos_cpus=arm64 is correct, I think.

It's possible that the M1 chip is more optimized for full-precision layers than for binary layers.

That's amazing. I will try a different ARM device.
So LCE does support binary convolution (activations in [0, 1], weights in [-1, 1]). Is that correct?

So LCE does support binary convolution (activations in [0, 1], weights in [-1, 1]). Is that correct?

That is correct. It's always best to check the tflite file in Netron to see if the layers got converted to Lce binary layers.

Thanks a lot!