ARM-software/ComputeLibrary

Cast operation at Gpu backend should support DataType::QSYMM8

peiciwu opened this issue · 4 comments

Output of 'strings libarm_compute.so | grep arm_compute_version':

arm_compute_version=v0.0-unreleased Build options: {'toolchain_prefix': '', 'compiler_prefix': '', 'build': 'native', 'arch': 'arm64-v8a', 'neon': '1', 'opencl': '1', 'experimental_dynamic_fusion': '0', 'Werror': '0', 'embed_kernels': '1', 'examples': '0', 'validation_tests': '0', 'benchmark_tests': '0', 'benchmark_examples': '0', 'compiler_cache': 'ccache', 'build_dir': 'aarch64', 'extra_cxx_flags': '-fPIC '} Git hash=b'69766d60896e429f27ac094010ae1f30ebbdc630'

This is a prebuilt binary directly downloaded from https://github.com/ARM-software/armnn/releases/tag/v23.11

Platform: Orange PI5+

Operating System: Ubuntu 22.04

Problem description:

For a tflite graph with a Cast operation from int8 to fp16, the GPU backend cannot be used and the following warning is emitted:

Warning: WARNING: Layer of type Cast is not supported on requested backend GpuAcc for input data type QSymmS8 and output data type Float16 (reason: in validate_arguments src/gpu/cl/kernels/ClCastKernel.cpp:62: src and dst data types must be different), falling back to the next backend.

I'm using Arm Compute Library through ArmNN with armnnTfLiteParser. The tflite parser converts int8 to armnn::DataType::QSymmS8 at
https://github.com/ARM-software/armnn/blob/3ba1ff41e00107ab8b16ef401011a502dc45439b/src/armnnTfLiteParser/TfLiteParser.cpp#L455, and I think this is a correct implementation.
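
For context, here is a simplified sketch of that mapping; the helper name and the reduced switch are illustrative only, not the actual ArmNN source. The point is that tflite int8 has no plain-int8 counterpart in ArmNN and is represented as the quantized type QSymmS8:

    // Simplified sketch, not the actual ArmNN parser code; the helper name
    // and the reduced switch are illustrative only.
    #include <armnn/Types.hpp>
    #include <stdexcept>

    armnn::DataType ToArmnnDataType(int tfliteTensorType) // values from tflite::TensorType
    {
        switch (tfliteTensorType)
        {
            case 0 /* tflite::TensorType_FLOAT32 */: return armnn::DataType::Float32;
            case 9 /* tflite::TensorType_INT8 */:    return armnn::DataType::QSymmS8;
            default: throw std::runtime_error("type not covered by this sketch");
        }
    }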

However, ClCastKernel fails to recognize this data type. I think these two checks should include DataType::QSYMM8 in their accepted types:

    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(src, 1, ..., DataType::QSYMM8);
    ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(dst, 1, ..., DataType::QSYMM8);

(https://github.com/ARM-software/ComputeLibrary/blob/main/src/gpu/cl/kernels/ClCastKernel.cpp#L55-L61)

I've tested this change locally, and the cast can now run on the GPU backend, which gives much better performance.
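
For reference, here is a minimal sketch of how I check the combination from the public API via CLCast::validate; the tensor shape and convert policy are arbitrary choices for illustration:

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/runtime/CL/functions/CLCast.h"

    using namespace arm_compute;

    bool qsymm8_to_f16_cast_supported()
    {
        // Shape and convert policy are arbitrary illustrative choices.
        const TensorInfo src_info(TensorShape(16U), 1, DataType::QSYMM8);
        const TensorInfo dst_info(TensorShape(16U), 1, DataType::F16);

        // Returns an error Status on the unpatched kernel; OK once QSYMM8
        // is added to the accepted types in ClCastKernel.
        const Status status = CLCast::validate(&src_info, &dst_info, ConvertPolicy::SATURATE);
        return bool(status);
    }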

Is it possible to include this change in the next release?

Hi @peiciwu

Thanks for raising this.

For a tflite graph with a Cast operation from int8 to fp16, the GPU backend cannot be used and the following warning is emitted

It looks like the Dequantize layer should be called instead of Cast. Could you please share with us the tflite model where you observe this problem?

Hope this helps.

Hi @morgolock,

Thanks for the reply. Here is a tflite model to reproduce: https://mythic.box.com/s/g0ypqs4kfze8w7utzdfqmmwkubxqt11f Note this is not the actual model; I created it manually, since the actual model is owned by our company and cannot be shared publicly. Our models running on ArmNN usually cast from int8 (or uint8) to floating point and then let ArmNN/Arm Compute Library perform the floating-point graph operations. This is because all the 8-bit operations of a quantized model are done in our own HW accelerator; we just do a plain cast from 8-bit (either int8 or uint8) to floating point and vice versa. Hopefully this explanation helps.

As I mentioned previously, the tflite model runs through ArmNN and armnnTfLiteParser (version 23.11). The tflite parser converts int8 to armnn::DataType::QSymmS8 because there is no plain int8 data type in ArmNN, so at the ArmNN level there is no way to represent a normal int8; I figure it would be a bigger lift to ask ArmNN to support a new data type.

Since the documentation of the cast function explicitly states that the scale and zeroPoint are ignored when casting between quantized types, I'm hoping that DataType::QSYMM8 can be supported as well:

/** Casts a given tensor to a new type
 *
 * @note When casting between quantized types the scale and zeroPoint are ignored
 */
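
As an illustration, here is a minimal end-to-end sketch of the cast we rely on, written against the CLCast runtime function and assuming the QSYMM8 support is in place; the shape and the SATURATE policy are arbitrary choices:

    #include "arm_compute/core/TensorInfo.h"
    #include "arm_compute/runtime/CL/CLScheduler.h"
    #include "arm_compute/runtime/CL/CLTensor.h"
    #include "arm_compute/runtime/CL/functions/CLCast.h"

    using namespace arm_compute;

    int main()
    {
        // Set up the default OpenCL context and queue.
        CLScheduler::get().default_init();

        // QSYMM8 source, F16 destination; per the doc comment above, the
        // quantization scale is ignored by Cast, so none is set here.
        CLTensor src, dst;
        src.allocator()->init(TensorInfo(TensorShape(16U), 1, DataType::QSYMM8));
        dst.allocator()->init(TensorInfo(TensorShape(16U), 1, DataType::F16));

        CLCast cast;
        cast.configure(&src, &dst, ConvertPolicy::SATURATE); // asserts on unsupported types

        src.allocator()->allocate();
        dst.allocator()->allocate();

        cast.run();
        CLScheduler::get().sync();
        return 0;
    }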

Hi @peiciwu

Thanks for the details. I've created a patch to fix the problem, see https://review.mlplatform.org/c/ml/ComputeLibrary/+/11117

Hope this helps.

@morgolock Thank you!