Enable QC8/QS8 GEMM/IGEMM for Wasm relaxed integer dot product instruction on x64
fanchenkong1 opened this issue · 0 comments
fanchenkong1 commented
V8 now supports AVX-VNNI instructions. The i32x4.dot_i8x16_i7x16_adds instruction can be compiled to vpdpbusd on x64 devices, which speeds up applications that use this opcode.
XNNPACK already has QC8/QS8 GEMM/IGEMM microkernels that use relaxed SIMD dot products, but they are limited to a certain implementation of i32x4.dot_i8x16_i7x16_adds (CheckWAsmSDOT). We would also need microkernels for the VNNI-style i32x4.dot_i8x16_i7x16_adds. Our performance test using vpdpbusd on end2end_bench with a PoC shows large improvements in the following cases.
d8/end2end_bench | Change in execution time |
---|---|
QC8MobileNetV1/T:1/real_time | -45.60% |
QC8MobileNetV2/T:1/real_time | -30.50% |
QS8MobileNetV1/T:1/real_time | -45.40% |
QS8MobileNetV2/T:1/real_time | -30.30% |
Does XNNPACK plan to add new microkernels for the VNNI implementation of the Wasm relaxed integer dot product? We can provide a patch if needed.
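
For context on why a kernel written for one implementation of the relaxed dot product may not carry over to another, here is a minimal scalar sketch of a single 32-bit output lane. It assumes the difference that matters is how the 7-bit operand is interpreted when its top bit is set: as a signed byte (SDOT-style) or as an unsigned byte (as with the unsigned operand of vpdpbusd). The function names are hypothetical and are not taken from XNNPACK or V8, and the sketch ignores any intermediate saturation.

```c
#include <stdint.h>

/* Hypothetical helper: reinterpret a raw byte as a signed 8-bit value. */
static int32_t as_signed_byte(uint8_t bits) {
  return bits < 128 ? (int32_t) bits : (int32_t) bits - 256;
}

/* Hypothetical model of one lane where both operands are treated as signed
 * bytes (the SDOT-style behaviour assumed in this sketch). */
static int32_t dot_lane_signed(const int8_t a[4], const uint8_t b_bits[4],
                               int32_t acc) {
  for (int i = 0; i < 4; i++) {
    acc += (int32_t) a[i] * as_signed_byte(b_bits[i]);
  }
  return acc;
}

/* Hypothetical model of one lane where the second operand is treated as
 * unsigned bytes (the VNNI-style behaviour assumed in this sketch). */
static int32_t dot_lane_unsigned(const int8_t a[4], const uint8_t b_bits[4],
                                 int32_t acc) {
  for (int i = 0; i < 4; i++) {
    acc += (int32_t) a[i] * (int32_t) b_bits[i];
  }
  return acc;
}
```

With every byte of the second operand in the range 0..127, both models return the same value. Once a byte such as 0x80 appears, they diverge, which is why a kernel that feeds full 8-bit data into that operand has to know which behaviour it is running on.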