Samsung/ONE

[onert-micro] plan for quantized kernel


Let's make a plan (by Sep) for the quantized (s8/s16) kernels.

AFAIU, the master branch currently supports:

  • S8 Add
  • S8 AveragePool2D
  • S8 Mul
  • S8 Conv2D
  • S8 MaxPool2D
  • S8 and S16 FullyConnected

All of these operations are accelerated by the CMSIS_NN library (https://github.com/ARM-software/CMSIS-NN/tree/v4.1.0?tab=readme-ov-file).

IMHO, let's first support all the operations that CMSIS_NN supports. That is, based on CMSIS_NN 4.1 (https://github.com/ARM-software/CMSIS-NN/tree/v4.1.0?tab=readme-ov-file), the final goal by Sep is to accelerate 10 S8 kernels.
After that, enable S8 kernels for several operations (TBD, ~10 operations) that CMSIS_NN does not support.
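For the operations that CMSIS_NN does not cover, a portable reference kernel would be needed. Below is a minimal sketch of what such an S8 kernel could look like for an elementwise Add, assuming the usual affine quantization scheme (real = scale * (q - zero_point)). The names `QuantParams`, `dequantize`, `quantize`, and `referenceAddS8` are hypothetical and are not taken from the onert-micro sources; a production kernel would more likely use fixed-point multipliers instead of float math.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-tensor quantization parameters (assumed affine scheme).
struct QuantParams
{
  float scale;
  int32_t zero_point;
};

// Map an int8 value back to its real value: real = scale * (q - zero_point).
inline float dequantize(int8_t q, const QuantParams &p)
{
  return p.scale * static_cast<float>(static_cast<int32_t>(q) - p.zero_point);
}

// Map a real value to int8, rounding to nearest and saturating to [-128, 127].
inline int8_t quantize(float real, const QuantParams &p)
{
  const int32_t q = static_cast<int32_t>(std::lround(real / p.scale)) + p.zero_point;
  return static_cast<int8_t>(std::min<int32_t>(127, std::max<int32_t>(-128, q)));
}

// Reference (non-accelerated) S8 Add: dequantize both inputs, add in float,
// then requantize with the output parameters.
std::vector<int8_t> referenceAddS8(const std::vector<int8_t> &a, const QuantParams &pa,
                                   const std::vector<int8_t> &b, const QuantParams &pb,
                                   const QuantParams &pout)
{
  assert(a.size() == b.size());
  std::vector<int8_t> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = quantize(dequantize(a[i], pa) + dequantize(b[i], pb), pout);
  return out;
}
```

A float-based reference like this is mainly useful as a correctness baseline in unit tests, where its output can be compared against an accelerated kernel within a small tolerance.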

@BalyshevArtem Please share any opinion about this


Yes, sure. We are currently working on this task. Thank you for detailing it :)

Then, our final goal by Sep is:

  • 20 operations will support int8 datatype
  • 10 operations will be accelerated by CMSIS_NN

gtest log on x86 for the quantized kernels:
quantized_test_xml_log.zip

Note: Google Test filter = *S8*:*S16*
[==========] Running 5 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from AveragePool2DTest
[ RUN      ] AveragePool2DTest.S8_P
[       OK ] AveragePool2DTest.S8_P (0 ms)
[----------] 1 test from AveragePool2DTest (0 ms total)

[----------] 2 tests from FullyConnectedTest
[ RUN      ] FullyConnectedTest.S8_P
[       OK ] FullyConnectedTest.S8_P (0 ms)
[ RUN      ] FullyConnectedTest.S16_P
[       OK ] FullyConnectedTest.S16_P (0 ms)
[----------] 2 tests from FullyConnectedTest (0 ms total)

[----------] 1 test from Conv2DTest
[ RUN      ] Conv2DTest.S8_P
[       OK ] Conv2DTest.S8_P (0 ms)
[----------] 1 test from Conv2DTest (0 ms total)

[----------] 1 test from MaxPool2DTest
[ RUN      ] MaxPool2DTest.S8_P
[       OK ] MaxPool2DTest.S8_P (0 ms)
[----------] 1 test from MaxPool2DTest (0 ms total)

[----------] Global test environment tear-down
[==========] 5 tests from 4 test suites ran. (0 ms total)
[  PASSED  ] 5 tests.
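
The filter `*S8*:*S16*` in the log above selects tests whose full name (`Suite.Test`) matches any of the colon-separated wildcard patterns. As a rough illustration of that selection rule, here is a small standalone sketch. It is not GoogleTest's own implementation, it handles only positive patterns (no `-` exclusion part), and the helper names `globMatch` and `matchesFilter` are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Wildcard match: '*' matches any sequence (including empty), '?' matches one character.
bool globMatch(const char *pattern, const char *text)
{
  if (*pattern == '\0')
    return *text == '\0';
  if (*pattern == '*')
    return globMatch(pattern + 1, text) || (*text != '\0' && globMatch(pattern, text + 1));
  if (*text == '\0')
    return false;
  if (*pattern == '?' || *pattern == *text)
    return globMatch(pattern + 1, text + 1);
  return false;
}

// Returns true if the full test name matches at least one ':'-separated pattern.
bool matchesFilter(const std::string &filter, const std::string &full_name)
{
  std::size_t start = 0;
  while (start <= filter.size())
  {
    std::size_t end = filter.find(':', start);
    if (end == std::string::npos)
      end = filter.size();
    const std::string pattern = filter.substr(start, end - start);
    if (globMatch(pattern.c_str(), full_name.c_str()))
      return true;
    start = end + 1;
  }
  return false;
}
```

With this rule, `FullyConnectedTest.S8_P` and `FullyConnectedTest.S16_P` are both selected, while a float test with a hypothetical name such as `Conv2DTest.Float_P` would be skipped.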

Log from our target board for the quantized kernel tests:

START TESTING
-----------------
[ START TEST: Conv2DTest.INT8 ]
[ TEST TIME = (20.000000) us ]
[ TEST Conv2DTest.INT8 RESULT: OK ]
-----------------
[ START TEST: FullyConnectedTest.S8 ]
[ TEST TIME = (10.000000) us ]
[ TEST FullyConnectedTest.S8 RESULT: OK ]
-----------------
[ START TEST: FullyConnectedTest.S16 ]
[ TEST TIME = (20.000000) us ]
[ TEST FullyConnectedTest.S16 RESULT: OK ]
-----------------
[ START TEST: AveragePool2DTest.S8 ]
[ TEST TIME = (10.000000) us ]
[ TEST AveragePool2DTest.S8 RESULT: OK ]
-----------------
[ START TEST: MaxPool2DTest.S8 ]
[ TEST TIME = (10.000000) us ]
[ TEST MaxPool2DTest.S8 RESULT: OK ]
-----------------
END TESTING