[onert] Quantization type kernel for transformer

Question

Closed this issue 5 days ago · 2 comments

Below is required I/O quantization type (uint8/uint16) kernel for quantized transformer model

Quantization type change

I/O and weight quantization type for transformer model

FULLY_CONNECTED (channelwise quantization)
- UINT4 weight, UINT8 I/O (#12741)
- UINT8 I/O and weight
- INT16 I/O and weight

Answer 1 · 2024-05-07T03:06:14.000Z

Updated: QUANTIZE operator

Answer 2 · 2024-09-25T06:24:46.000Z

Deprecated issue. We will use hybrid quantization for transformer.