NEGEMMLowpMatrixMultiplyCore: set_pretranspose_A & set_pretranspose_B support
eshoguli opened this issue · 1 comment
Model:
graph TD;
Input1["Input
src1: fp32"]
Quantise1["NEQuantizationLayer
q_src1: QASYMM8_SIGNED"]
Input2["Input
src2: fp32"]
Quantise2["NEQuantizationLayer
q_src2: QASYMM8_SIGNED"]
MatMul["NEGEMMLowpMatrixMultiplyCore
q_res: S8"]
Input1-->Quantise1;
Input2-->Quantise2;
Quantise1-->MatMul;
Quantise2-->MatMul;
MatMul-->Result;
Can you please confirm that NEGEMMLowpMatrixMultiplyCore does not support transposed input matrices, and fix the validation so that it is consistent with the implementation (see Experiment 2 below)?
Experiment 1 (reference): without transposing the input tensors, everything works as expected:
size_t n = 1;
size_t c = 1;
// A matrix: a1 x a2
size_t a1 = 6;
size_t a2 = 3;
// B matrix: b1 x b2
size_t b1 = 3;
size_t b2 = 6;
// Allocate input tensors
src1.allocator()->init(TensorInfo(TensorShape(a1, a2, c, n), 1, DataType::F32));
src2.allocator()->init(TensorInfo(TensorShape(b1, b2, c, n), 1, DataType::F32));
// Allocate & fill matrices
...
// We now have the quantisation info and can configure the quantised tensors
q_src1.allocator()->init(TensorInfo(TensorShape(a1, a2, c, n), 1, DataType::QASYMM8_SIGNED, src1_qinfo));
q_src2.allocator()->init(TensorInfo(TensorShape(b1, b2, c, n), 1, DataType::QASYMM8_SIGNED, src2_qinfo));
// Configure low precision gemm and initialise result tensor
NEGEMMLowpMatrixMultiplyCore qgemm;
q_res.allocator()->init(TensorInfo(TensorShape(a2, b1, c, n), 1, DataType::S32));
qgemm.configure(&q_src1, &q_src2, nullptr, &q_res);
// Allocate all tensors & run
...
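For completeness, the elided "allocate & run" step looks roughly like the snippet below. This is only a sketch that continues the code above: quantise1 and quantise2 are assumed to be NEQuantizationLayer instances already configured for src1 -> q_src1 and src2 -> q_src2; they are not part of the original test code.
// Sketch (assumed, continues the snippet above): allocate the quantised tensors
// and the result, quantise the FP32 inputs, then run the low-precision GEMM.
q_src1.allocator()->allocate();
q_src2.allocator()->allocate();
q_res.allocator()->allocate();
quantise1.run(); // q_src1 = quantise(src1)
quantise2.run(); // q_src2 = quantise(src2)
qgemm.run();     // q_res  = q_src1 * q_src2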
Experiment 2: the input tensor dimensions were NOT updated, but a transpose of the B matrix was requested via gemm_info.set_pretranspose_B(true).
Expectation: failure, because the matrix dimensions are no longer compatible.
Result: works exactly like the reference: no validation error, no failure, and the same results as the reference.
...
// Configure low precision gemm and initialise result tensor
NEGEMMLowpMatrixMultiplyCore qgemm;
q_res.allocator()->init(TensorInfo(TensorShape(a2, b1, c, n), 1, DataType::S32));
GEMMInfo gemm_info; // <= new line
gemm_info.set_pretranspose_B(true); // <= new line
qgemm.configure(&q_src1, &q_src2, nullptr, &q_res, gemm_info);
...
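For reference, calling the static validate() entry point directly should show the same behaviour. This is only a sketch: it reuses the q_src1/q_src2/q_res tensors from the snippet above, and the "returns OK" outcome is my expectation based on the configure() behaviour, not a separate run.
// Sketch: explicit validation with pretranspose_B set but unchanged B dimensions.
GEMMInfo gemm_info;
gemm_info.set_pretranspose_B(true);
// Presumably returns ErrorCode::OK (mirroring configure() above), even though
// pretranspose_B should make the shapes incompatible.
const Status status = NEGEMMLowpMatrixMultiplyCore::validate(
    q_src1.info(), q_src2.info(), nullptr, q_res.info(), gemm_info);
const bool accepted = (status.error_code() == ErrorCode::OK);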
Experiment 3: the input tensor dimensions WERE updated and a transpose of the B matrix was requested via gemm_info.set_pretranspose_B(true).
Expectation: works like the reference and produces the same results.
Result: failure with the error message: validation fail: terminating due to uncaught exception of type std::runtime_error: in validate src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp:351: The product AB is defined only if the number of columns in A is equal to the number of rows in B
size_t n = 1;
size_t c = 1;
// A matrix: a1 x a2
size_t a1 = 6;
size_t a2 = 3;
// B matrix: b1 x b2
size_t b1 = 6; // <= updated here: previous value is 3
size_t b2 = 3; // <= updated here: previous value is 6
...
// Configure low precision gemm and initialise result tensor
NEGEMMLowpMatrixMultiplyCore qgemm;
q_res.allocator()->init(TensorInfo(TensorShape(a2, b1, c, n), 1, DataType::S32));
GEMMInfo gemm_info; // <= new line
gemm_info.set_pretranspose_B(true); // <= new line
qgemm.configure(&q_src1, &q_src2, nullptr, &q_res, gemm_info);
...
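To make the expectation explicit, below is a hypothetical shape check illustrating what I would expect validation to do when pretranspose_B is set: treat the first two dimensions of B as swapped. product_is_defined is my own helper name for illustration, not an ACL function.
#include "arm_compute/core/ITensorInfo.h"

using namespace arm_compute;

// Hypothetical helper: the dimension check I would expect when the
// pretranspose flag is honoured (B's width/height swapped).
bool product_is_defined(const ITensorInfo &a, const ITensorInfo &b, bool pretranspose_b)
{
    const size_t a_cols = a.dimension(0);                                   // columns of A
    const size_t b_rows = pretranspose_b ? b.dimension(0) : b.dimension(1); // rows of B after the optional transpose
    return a_cols == b_rows;
}

// With the Experiment 3 shapes (both inputs initialised as TensorShape(6, 3, c, n))
// and pretranspose_B set, this returns true, so I would expect validate()/configure()
// to accept them; the current check appears to compare only the untransposed dimensions.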