NEGEMMLowpMatrixMultiplyCore: set_pretranspose_A & set_pretranspose_B support
eshoguli opened this issue · 1 comment
Model:
graph TD;
Input1["Input
src1: fp32"]
Quantise1["NEQuantizationLayer
q_src1: QASYMM8_SIGNED"]
Input2["Input
src2: fp32"]
Quantise2["NEQuantizationLayer
q_src2: QASYMM8_SIGNED"]
MatMul["NEGEMMLowpMatrixMultiplyCore
q_res: S8"]
Input1-->Quantise1;
Input2-->Quantise2;
Quantise1-->MatMul;
Quantise2-->MatMul;
MatMul-->Result;
Can you please confirm that NEGEMMLowpMatrixMultiplyCore does not support transposed input matrices, and fix the validation so that it is consistent with the implementation (see Experiment 2 below)?
Experiment 1 (reference): without transposing the input tensors, everything works as expected:
size_t n = 1;
size_t c = 1;
// A matrix: a1 x a2
size_t a1 = 6;
size_t a2 = 3;
// B matrix: b1 x b2
size_t b1 = 3;
size_t b2 = 6;
// Allocate input tensors
src1.allocator()->init(TensorInfo(TensorShape(a1, a2, c, n), 1, DataType::F32));
src2.allocator()->init(TensorInfo(TensorShape(b1, b2, c, n), 1, DataType::F32));
// Allocate & fill matrices
...
// We now have the quantisation info and can configure the quantised tensors
q_src1.allocator()->init(TensorInfo(TensorShape(a1, a2, c, n), 1, DataType::QASYMM8_SIGNED, src1_qinfo));
q_src2.allocator()->init(TensorInfo(TensorShape(b1, b2, c, n), 1, DataType::QASYMM8_SIGNED, src2_qinfo));
// Configure low precision gemm and initialise result tensor
NEGEMMLowpMatrixMultiplyCore qgemm;
q_res.allocator()->init(TensorInfo(TensorShape(a2, b1, c, n), 1, DataType::S32));
qgemm.configure(&q_src1, &q_src2, nullptr, &q_res);
// Allocate all tensors & run
...
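For completeness, the elided "allocate & run" step looks roughly like the snippet below. This is only a sketch that continues the code above: quantise1 and quantise2 are assumed to be NEQuantizationLayer instances already configured for src1 -> q_src1 and src2 -> q_src2; they are not part of the original test code.
// Sketch (assumed, continues the snippet above): allocate the quantised tensors
// and the result, quantise the FP32 inputs, then run the low-precision GEMM.
q_src1.allocator()->allocate();
q_src2.allocator()->allocate();
q_res.allocator()->allocate();
quantise1.run(); // q_src1 = quantise(src1)
quantise2.run(); // q_src2 = quantise(src2)
qgemm.run();     // q_res  = q_src1 * q_src2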
Experiment 2: the input tensor dimensions were NOT updated, but a transpose of the B matrix was requested via gemm_info.set_pretranspose_B(true).
Expectation: failure, because the matrix dimensions are no longer compatible.
Result: works exactly like the reference: no validation error, no failure, and the same results as the reference.
...
// Configure low precision gemm and initialise result tensor
NEGEMMLowpMatrixMultiplyCore qgemm;
q_res.allocator()->init(TensorInfo(TensorShape(a2, b1, c, n), 1, DataType::S32));
GEMMInfo gemm_info; // <= new line
gemm_info.set_pretranspose_B(true); // <= new line
qgemm.configure(&q_src1, &q_src2, nullptr, &q_res, gemm_info);
...
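For reference, calling the static validate() entry point directly should show the same behaviour. This is only a sketch: it reuses the q_src1/q_src2/q_res tensors from the snippet above, and the "returns OK" outcome is my expectation based on the configure() behaviour, not a separate run.
// Sketch: explicit validation with pretranspose_B set but unchanged B dimensions.
GEMMInfo gemm_info;
gemm_info.set_pretranspose_B(true);
// Presumably returns ErrorCode::OK (mirroring configure() above), even though
// pretranspose_B should make the shapes incompatible.
const Status status = NEGEMMLowpMatrixMultiplyCore::validate(
    q_src1.info(), q_src2.info(), nullptr, q_res.info(), gemm_info);
const bool accepted = (status.error_code() == ErrorCode::OK);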
Experiment 3: the input tensor dimensions WERE updated and a transpose of the B matrix was requested via gemm_info.set_pretranspose_B(true).
Expectation: works like the reference and produces the same results.
Result: failure with the error message: validation fail: terminating due to uncaught exception of type std::runtime_error: in validate src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp:351: The product AB is defined only if the number of columns in A is equal to the number of rows in B
size_t n = 1;
size_t c = 1;
// A matrix: a1 x a2
size_t a1 = 6;
size_t a2 = 3;
// B matrix: b1 x b2
size_t b1 = 6; // <= updated here: previous value is 3
size_t b2 = 3; // <= updated here: previous value is 6
...
// Configure low precision gemm and initialise result tensor
NEGEMMLowpMatrixMultiplyCore qgemm;
q_res.allocator()->init(TensorInfo(TensorShape(a2, b1, c, n), 1, DataType::S32));
GEMMInfo gemm_info; // <= new line
gemm_info.set_pretranspose_B(true); // <= new line
qgemm.configure(&q_src1, &q_src2, nullptr, &q_res, gemm_info);
...
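To make the expectation explicit, below is a hypothetical shape check illustrating what I would expect validation to do when pretranspose_B is set: treat the first two dimensions of B as swapped. product_is_defined is my own helper name for illustration, not an ACL function.
#include "arm_compute/core/ITensorInfo.h"

using namespace arm_compute;

// Hypothetical helper: the dimension check I would expect when the
// pretranspose flag is honoured (B's width/height swapped).
bool product_is_defined(const ITensorInfo &a, const ITensorInfo &b, bool pretranspose_b)
{
    const size_t a_cols = a.dimension(0);                                   // columns of A
    const size_t b_rows = pretranspose_b ? b.dimension(0) : b.dimension(1); // rows of B after the optional transpose
    return a_cols == b_rows;
}

// With the Experiment 3 shapes (both inputs initialised as TensorShape(6, 3, c, n))
// and pretranspose_B set, this returns true, so I would expect validate()/configure()
// to accept them; the current check appears to compare only the untransposed dimensions.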