ARM-software/ComputeLibrary

How to use fixed format kernels?


I have seen fixed format kernels used in the oneDNN integration, but I don't understand how the weight tensor info is processed, for example how the strides are handled. Is there any example C++ code?

In addition, I have observed that the fixed format kernels are slower than the ordinary hybrid kernels. Is this true?

With fixed format kernels, the weights are ordered into the memory format expected by the asm kernel before being passed into ACL. In a non-fixed-format build, the weights are instead re-ordered inside Compute Library to match the format that the chosen kernel expects. The "fixed format" nomenclature comes from the fact that this build uses a collection of GEMM kernels with a common (i.e. fixed) weights format.
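To make the re-ordering concrete, here is a minimal, self-contained C++ sketch of the kind of blocked packing a GEMM kernel might expect. The block size of 4 and the exact index mapping are illustrative assumptions only, not ACL's actual layouts (those are enumerated by `arm_compute::WeightFormat`, e.g. the `OHWIo<n>` variants); the point is that output channels are interleaved in fixed-size groups so the kernel can load them contiguously.

```cpp
#include <cstddef>
#include <vector>

// Illustration only: pack a row-major [O x I] weight matrix so that
// output channels are interleaved in groups of `block`. The packed
// index mapping used here is:
//   packed[(o / block) * (I * block) + i * block + (o % block)] = w[o * I + i]
// O is rounded up to a multiple of `block`; padding lanes are zero-filled.
std::vector<float> pack_weights(const std::vector<float> &w,
                                std::size_t O, std::size_t I,
                                std::size_t block = 4)
{
    const std::size_t O_pad = (O + block - 1) / block * block;
    std::vector<float> packed(O_pad * I, 0.0f);
    for (std::size_t o = 0; o < O; ++o)
        for (std::size_t i = 0; i < I; ++i)
            packed[(o / block) * (I * block) + i * block + (o % block)] = w[o * I + i];
    return packed;
}
```

In a fixed format build, packing of this kind is done by the caller (oneDNN / TensorFlow) before the weights reach ACL; in a non-fixed-format build, an equivalent transform happens inside Compute Library.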

This is potentially less performant than having an optimised format for each kernel; however, it allows the responsibility for producing the memory format expected by the kernel to be hoisted out of Compute Library and into oneDNN and TensorFlow. This is essential when dealing with cached oneDNN primitives (which TensorFlow uses), where the wei tensor of a cached primitive can get re-written. Without the fixed format kernels exposed in ACL and integrated into oneDNN, these re-written weights would not be ingested by the GEMM kernels, which would keep using the original weights (re-ordered into the required memory format), leading to incorrect results. In this context, the ability to use primitive caching in TensorFlow, via oneDNN, outweighs the performance penalty of relying on these fixed format kernels.
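The staleness problem described above can be sketched with a toy model (this is not ACL or oneDNN code; the two struct names and their one-element "GEMM" are invented for illustration). A primitive that snapshots and re-orders the weights at creation time keeps using the snapshot even after the caller re-writes the weight buffer, whereas a fixed format primitive reads the caller's pre-ordered buffer directly and therefore sees the update.

```cpp
#include <vector>

// Models a non-fixed-format primitive: it takes an internal re-ordered
// copy of the weights when it is created, and runs from that copy.
struct ReorderingPrimitive {
    std::vector<float> snapshot; // internal copy taken at creation time
    explicit ReorderingPrimitive(const std::vector<float> &w) : snapshot(w) {}
    float run() const { return snapshot[0]; } // stands in for the GEMM
};

// Models a fixed format primitive: it keeps a reference to the
// caller-owned, pre-ordered weight buffer and reads it at run time.
struct FixedFormatPrimitive {
    const std::vector<float> *weights; // caller-owned buffer
    explicit FixedFormatPrimitive(const std::vector<float> &w) : weights(&w) {}
    float run() const { return (*weights)[0]; } // stands in for the GEMM
};
```

If the caller re-writes the weight buffer after both primitives are created, the re-ordering primitive still computes with its stale snapshot while the fixed format primitive picks up the new values, which is exactly the behaviour primitive caching needs.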

oneapi-src/oneDNN#1590

tensorflow/tensorflow#57987