larq/compute-engine

DoReFa quantizer with higher number of MACs/Ops, Grouped convs as custom ops on LCE 0.7.0

Opened this issue · 3 comments

Hello, I have a couple of questions regarding quantizer options for Larq and LCE.

I am designing a BNN using the DoReFa quantizer; however, I noticed a very high number of estimated MACs and ops when converting the model for ARM64. Changing the quantizer to "ste_sign" dramatically lowered the MAC and op counts.

I was wondering if there is a way to use the DoReFa quantizer for training without incurring this serious operation overhead when converting and running the model for inference in LCE. Is the "ste_sign" quantizer the only viable option for efficient inference?
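For reference, this is roughly how I build and convert the model (a minimal sketch, not my actual network; the layer sizes and the k_bit value are placeholders):

```python
import larq as lq
import larq_compute_engine as lce
import tensorflow as tf

# Sketch only: the real network is larger; layer sizes are placeholders.
model = tf.keras.Sequential([
    lq.layers.QuantConv2D(
        32, 3,
        input_quantizer="ste_sign",
        # Swapping this for "ste_sign" dramatically lowers the estimate.
        kernel_quantizer=lq.quantizers.DoReFaQuantizer(k_bit=1, mode="weights"),
        use_bias=False,
        input_shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
])

# Conversion prints the "Estimated count of arithmetic ops" lines shown below.
tflite_model = lce.convert_keras_model(model)
```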

Thank you for the excellent work and for your attention.

I noticed some issues with the latest version only (0.7.0) but not the one before (0.6.2).
Grouped convolutions (FP or binary) are converted as custom ops in the latest version.

Example:
Converter output for grouped (g=2) convolutions:

2022-07-26 13:06:17.469686: W external/org_tensorflow/tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1903] The following operation(s) need TFLite custom op implementation(s):
Custom ops: Conv2D
Details:
tf.Conv2D(tensor<1x32x32x64xf32>, tensor<5x5x32x32xf32>) -> (tensor<1x11x11x32xf32>) : {data_format = "NHWC", dilations = [1, 1, 1, 1], explicit_paddings = [], padding = "SAME", strides = [1, 3, 3, 1], use_cudnn_on_gpu = true}
See instructions: https://www.tensorflow.org/lite/guide/ops_custom
2022-07-26 13:06:17.469772: I external/org_tensorflow/tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1963] Estimated count of arithmetic ops: 5792 ops, equivalently 2896 MACs

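For completeness, here is a minimal layer that reproduces the shapes in that warning (the surrounding model is a guess on my part):

```python
import tensorflow as tf
import larq_compute_engine as lce

# Grouped (groups=2) convolution matching the shapes in the warning above:
# input 1x32x32x64, 5x5 kernel, 32 output filters, stride 3, SAME padding.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(
        filters=32, kernel_size=5, strides=3, padding="same", groups=2,
        input_shape=(32, 32, 64)),
])

# On LCE 0.7.0 this conversion emits the "Custom ops: Conv2D" warning.
tflite_model = lce.convert_keras_model(model)
```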

Small quantizer example (two QuantConv2D layers):

Example with ste_sign mode="weights":

2022-07-26 13:14:57.680246: I external/org_tensorflow/tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1963] Estimated count of arithmetic ops: 1.164 M ops, equivalently 0.582 M MACs


Changing to DoReFa mode="weights":

2022-07-26 13:16:05.771057: I external/org_tensorflow/tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1963] Estimated count of arithmetic ops: 1.663 M ops, equivalently 0.831 M MACs

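A hypothetical reconstruction of that two-layer comparison (the layer widths are guesses; the op estimates above came from my actual model, not from this sketch):

```python
import larq as lq
import tensorflow as tf

def build(kernel_quantizer):
    # Two binary conv layers; only the kernel quantizer changes between runs.
    return tf.keras.Sequential([
        lq.layers.QuantConv2D(32, 3, kernel_quantizer=kernel_quantizer,
                              use_bias=False, input_shape=(32, 32, 3)),
        lq.layers.QuantConv2D(64, 3, kernel_quantizer=kernel_quantizer,
                              use_bias=False),
    ])

# Converting each model and comparing the logged op estimates shows the gap.
model_ste = build("ste_sign")
model_dorefa = build(lq.quantizers.DoReFaQuantizer(k_bit=1, mode="weights"))
```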

I was able to successfully benchmark my model with DoReFa and grouped convolutions when converted with version 0.6.2, with better-than-expected efficiency, but not the one converted with version 0.7.0.
I am using TensorFlow 2.8.0 and larq 0.12.2.

Sorry for the late reply.

I noticed some issues with the latest version only (0.7.0) but not the one before (0.6.2).
Grouped convolutions (FP or binary) are converted as custom ops in the latest version.

Unfortunately, this was an issue with TensorFlow 2.8, which LCE 0.7.0 uses under the hood. It has been fixed on master since we upgraded to TensorFlow 2.9, but we haven't published a new release with the fix yet. Sorry about that. For now, I'd recommend sticking with 0.6.2 if grouped convolution support is required.

Is the "ste_sign" quantizer the only viable option for efficient inference?

For binarised convolutions, ste_sign is the recommended activation quantiser. You can also use custom activation quantisers, but to make sure they convert correctly they should be implemented with larq.math.sign, which is unfortunately not the case for DoReFa. Regarding weight quantisation, other quantisers should work fine as long as they binarise to {-1, 1} or {-alpha, alpha}.
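For example, here is a minimal sketch of what a converter-friendly custom activation quantiser might look like (the name and the clipped-STE gradient are just illustrative, modelled on ste_sign):

```python
import tensorflow as tf
import larq as lq

@tf.custom_gradient
def my_sign_quantizer(x):
    # Forward pass uses larq.math.sign so the converter can recognise the
    # binarisation pattern; backward pass is a clipped straight-through
    # estimator, as in ste_sign.
    def grad(dy):
        return dy * tf.cast(tf.abs(x) <= 1.0, x.dtype)
    return lq.math.sign(x), grad

layer = lq.layers.QuantConv2D(
    32, 3,
    input_quantizer=my_sign_quantizer,  # custom, but built on larq.math.sign
    kernel_quantizer="ste_sign",
    use_bias=False,
)
```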

I recommend looking at the converted model in Netron to make sure the conversion worked as intended.

I noticed some issues with the latest version only (0.7.0) but not the one before (0.6.2).
Grouped convolutions (FP or binary) are converted as custom ops in the latest version.

@lluevano sorry for the delay. We just released v0.8.0, which includes a fix for this. Let me know if that works for you.