[onert] Quantization type kernel for transformer
hseok-oh commented
Below are the required I/O quantization type (uint8/int16) kernels for the quantized transformer model. A sketch of the shared requantization arithmetic follows the operator list.
- MUL
  - UINT8
  - INT16
- ADD
  - UINT8
  - INT16
- RSQRT
  - UINT8
  - INT16
- DIV
  - UINT8
  - INT16
- RESHAPE (same I/O quant param)
- TRANSPOSE (same I/O quant param)
  - UINT8
  - INT16
- STRIDED_SLICE (same I/O quant param)
  - UINT8
  - INT16
- NEG
  - UINT8
  - INT16
- CONCATENATION
  - UINT8
  - INT16
- BATCH_MATMUL
  - UINT8
  - INT16
- SOFTMAX
  - UINT8
  - INT16
- LOGISTIC
  - UINT8
  - INT16
- GATHER (indices: int32/int64)
  - UINT8
  - INT16
- MEAN
  - UINT8
  - INT16
- SQRT
  - UINT8
  - INT16
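Most of the elementwise kernels above reduce to the same pattern: subtract the input zero points, apply the op in int32, and requantize with the output parameters. A minimal sketch for a uint8 MUL, assuming per-tensor affine quantization; the function name and the float multiplier are illustrative only (production kernels typically fold the scales into a fixed-point multiplier):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative uint8 MUL kernel (hypothetical name, not onert's actual API).
// real_value = scale * (q - zero_point), so
// q_out = zp_out + (s_a * s_b / s_out) * (q_a - zp_a) * (q_b - zp_b).
void QuantizedMulUInt8(const std::vector<uint8_t> &a, const std::vector<uint8_t> &b,
                       float a_scale, int32_t a_zp, float b_scale, int32_t b_zp,
                       float out_scale, int32_t out_zp, std::vector<uint8_t> &out)
{
  out.resize(a.size());
  const float multiplier = (a_scale * b_scale) / out_scale; // folded scales
  for (size_t i = 0; i < a.size(); ++i)
  {
    const int32_t a_q = static_cast<int32_t>(a[i]) - a_zp;
    const int32_t b_q = static_cast<int32_t>(b[i]) - b_zp;
    const int32_t raw =
        static_cast<int32_t>(std::lround(a_q * b_q * multiplier)) + out_zp;
    out[i] = static_cast<uint8_t>(std::clamp<int32_t>(raw, 0, 255));
  }
}
```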
**Quantization type change**

- QUANTIZE
  - UINT8 -> INT16
  - INT16 -> UINT8
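A QUANTIZE kernel between the two types is a plain requantization. A minimal sketch, assuming affine parameters on both sides (function name hypothetical; int16 quantization is often symmetric, i.e. zero point 0, in which case `out_zp` drops out):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical requantization routine:
// q_out = round((q_in - zp_in) * s_in / s_out) + zp_out, clamped to int16.
std::vector<int16_t> RequantizeUInt8ToInt16(const std::vector<uint8_t> &in,
                                            float in_scale, int32_t in_zp,
                                            float out_scale, int32_t out_zp)
{
  std::vector<int16_t> out(in.size());
  const float effective_scale = in_scale / out_scale;
  for (size_t i = 0; i < in.size(); ++i)
  {
    const float scaled = (static_cast<int32_t>(in[i]) - in_zp) * effective_scale;
    const int32_t raw = static_cast<int32_t>(std::lround(scaled)) + out_zp;
    out[i] = static_cast<int16_t>(std::clamp<int32_t>(raw, INT16_MIN, INT16_MAX));
  }
  return out;
}
```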
**I/O and weight quantization type for transformer model**

- FULLY_CONNECTED (channelwise quantization)
  - UINT4 weight, UINT8 I/O (#12741)
  - UINT8 I/O and weight
  - INT16 I/O and weight
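With channelwise quantization, FULLY_CONNECTED carries one weight scale per output channel, so each channel gets its own effective output multiplier. A single-batch sketch, assuming uint8 I/O with per-tensor activation parameters (all names hypothetical):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Single-batch channelwise FULLY_CONNECTED sketch (hypothetical name).
// Weights: [out_channels, depth], one scale (and zero point, often 0) per
// output channel; bias is int32 at scale in_scale * w_scales[ch].
void FullyConnectedChannelwiseUInt8(const uint8_t *input, float in_scale, int32_t in_zp,
                                    int depth, const uint8_t *weights,
                                    const std::vector<float> &w_scales,
                                    const std::vector<int32_t> &w_zps,
                                    const int32_t *bias, // may be nullptr
                                    uint8_t *output, float out_scale, int32_t out_zp,
                                    int out_channels)
{
  for (int ch = 0; ch < out_channels; ++ch)
  {
    int32_t acc = bias ? bias[ch] : 0; // accumulate in int32
    for (int d = 0; d < depth; ++d)
    {
      const int32_t x = static_cast<int32_t>(input[d]) - in_zp;
      const int32_t w = static_cast<int32_t>(weights[ch * depth + d]) - w_zps[ch];
      acc += x * w;
    }
    // Each channel folds its own weight scale into the output multiplier.
    const float multiplier = (in_scale * w_scales[ch]) / out_scale;
    const int32_t raw = static_cast<int32_t>(std::lround(acc * multiplier)) + out_zp;
    output[ch] = static_cast<uint8_t>(std::clamp<int32_t>(raw, 0, 255));
  }
}
```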
hseok-oh commented
Updated: QUANTIZE operator.
hseok-oh commented
Deprecated issue. We will use hybrid quantization for the transformer instead.