google/XNNPACK

Enable QC8/QS8 GEMM/IGEMM for Wasm relaxed integer dot product instruction on x64

fanchenkong1 opened this issue · 0 comments

V8 now supports AVX-VNNI instructions. The `i32x4.dot_i8x16_i7x16_adds` opcode can be compiled to `vpdpbusd` on x64 devices, which increases the speed of applications using this opcode.

XNNPACK already has QC8/QS8 GEMM/IGEMM microkernels that use relaxed SIMD dot products, but they are limited to a particular implementation of `i32x4.dot_i8x16_i7x16_adds` (gated by CheckWAsmSDOT). Microkernels for the VNNI-style implementation of `i32x4.dot_i8x16_i7x16_adds` would also be needed. Our performance test of a PoC using `vpdpbusd` on end2end_bench shows large improvements in the following cases.
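For context, a scalar sketch of the nominal semantics these microkernels rely on: each i32 lane accumulates the sum of four signed-8-bit × 7-bit products, which is what `vpdpbusd` computes per lane when the second operand stays in [0, 127]. This is an illustrative reference only (it ignores the "relaxed" corner cases where a 7-bit input has its high bit set, and uses non-saturating accumulation); the function name is hypothetical, not an XNNPACK API.

```c
#include <stdint.h>

// Hypothetical scalar reference for i32x4.dot_i8x16_i7x16_adds (sketch):
//   a   - 16 signed 8-bit lanes
//   b   - 16 lanes assumed in [0, 127] (the "i7" operand)
//   acc - 4 i32 accumulator lanes, updated in place
// Each i32 lane adds the dot product of its four corresponding i8 pairs,
// mirroring a vpdpbusd-style non-saturating accumulate.
static void dot_i8x16_i7x16_adds(const int8_t a[16], const int8_t b[16],
                                 int32_t acc[4]) {
  for (int lane = 0; lane < 4; lane++) {
    int32_t sum = 0;
    for (int k = 0; k < 4; k++) {
      sum += (int32_t) a[4 * lane + k] * (int32_t) b[4 * lane + k];
    }
    acc[lane] += sum;
  }
}
```

Because the 7-bit operand is non-negative, the four-product sum always fits comfortably in 32 bits, which is why the opcode maps cleanly onto the unsigned × signed `vpdpbusd` instruction.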

| d8/end2end_bench | Reduction in execution time (%) |
| --- | --- |
| QC8MobileNetV1/T:1/real_time | -45.60% |
| QC8MobileNetV2/T:1/real_time | -30.50% |
| QS8MobileNetV1/T:1/real_time | -45.40% |
| QS8MobileNetV2/T:1/real_time | -30.30% |

Does XNNPACK plan to add new microkernels for the VNNI implementation of the Wasm relaxed integer dot product? We can provide a patch if needed.