Samsung/ONE

[onert] Quantization type kernel for transformer

Closed this issue · 2 comments

Below are the required I/O quantization type (uint8/int16) kernels for a quantized transformer model:

  • MUL
    • UINT8
    • INT16
  • ADD
    • UINT8
    • INT16
  • RSQRT
    • UINT8
    • INT16
  • DIV
    • UINT8
    • INT16
  • RESHAPE (same I/O quant param)
  • TRANSPOSE (same I/O quant param)
    • UINT8
    • INT16
  • STRIDED_SLICE (same I/O quant param)
    • UINT8
    • INT16
  • NEG
    • UINT8
    • INT16
  • CONCATENATION
    • UINT8
    • INT16
  • BATCH_MATMUL
    • UINT8
    • INT16
  • SOFTMAX
    • UINT8
    • INT16
  • LOGISTIC
    • UINT8
    • INT16
  • GATHER (indices: int32/int64)
    • UINT8
    • INT16
  • MEAN
    • UINT8
    • INT16
  • SQRT
    • UINT8
    • INT16
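All of the elementwise kernels listed above (MUL, ADD, DIV, etc.) share the same affine-quantization arithmetic, where a real value is recovered as `real = scale * (q - zero_point)`. As a minimal float-simulated sketch of a uint8 MUL kernel (the helper name is hypothetical, and a production kernel would fold the scales into a fixed-point integer multiplier instead of using float math):

```python
import numpy as np

def quantized_mul_uint8(x_q, x_scale, x_zp, y_q, y_scale, y_zp,
                        out_scale, out_zp):
    """Reference uint8 MUL: dequantize implicitly, multiply, requantize.

    Sketch only: scaling is done in float for clarity; real kernels use
    an integer fixed-point multiplier derived from the three scales.
    """
    # Accumulate in int32 on the zero-point-offset values.
    acc = (x_q.astype(np.int32) - x_zp) * (y_q.astype(np.int32) - y_zp)
    # Rescale to the output quantization and re-add its zero point.
    out = np.round(acc * (x_scale * y_scale / out_scale)) + out_zp
    return np.clip(out, 0, 255).astype(np.uint8)
```

With `x_scale = y_scale = out_scale = 0.1` and zero point 128, the quantized inputs 138 and 148 represent 1.0 and 2.0, and the kernel produces 20, i.e. the real value 2.0.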

Quantization type change

  • QUANTIZE
    • UINT8 -> INT16
    • INT16 -> UINT8
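The QUANTIZE type-change entries above are requantizations: dequantize with the input quantization parameters, then quantize with the output parameters. A hedged sketch (the `requantize` helper name is hypothetical; note that uint8 tensors typically carry a nonzero zero point while int16 quantization is usually symmetric with zero point 0):

```python
import numpy as np

def requantize(q, in_scale, in_zp, out_scale, out_zp, out_dtype):
    """Convert a quantized tensor between types, e.g. uint8 <-> int16."""
    info = np.iinfo(out_dtype)
    real = (q.astype(np.int64) - in_zp) * in_scale    # dequantize
    out = np.round(real / out_scale) + out_zp          # requantize
    return np.clip(out, info.min, info.max).astype(out_dtype)
```

For example, uint8 value 138 with scale 0.1 and zero point 128 (real value 1.0) maps to int16 value 1000 under scale 0.001 and zero point 0, and requantizing back recovers 138.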

I/O and weight quantization types for the transformer model

  • FULLY_CONNECTED (channelwise quantization)
    • UINT4 weight, UINT8 I/O (#12741)
    • UINT8 I/O and weight
    • INT16 I/O and weight
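Channelwise quantization for FULLY_CONNECTED means one weight scale per output channel rather than a single per-tensor scale, so each accumulator row is rescaled by its own multiplier. A minimal sketch, assuming symmetric int8 weights (zero point 0) and uint8 I/O, with hypothetical names:

```python
import numpy as np

def fc_channelwise(x_q, x_scale, x_zp, w_q, w_scales, out_scale, out_zp):
    """Sketch of FULLY_CONNECTED with per-output-channel weight scales.

    x_q: [batch, in] uint8 activations
    w_q: [out, in] int8 weights, symmetric (zero point 0)
    w_scales: [out] one scale per output channel
    """
    # Integer accumulation: [batch, in] @ [in, out] -> [batch, out].
    acc = (x_q.astype(np.int32) - x_zp) @ w_q.astype(np.int32).T
    # Per-channel rescale (float here; fixed-point in a real kernel).
    out = np.round(acc * (x_scale * w_scales / out_scale)) + out_zp
    return np.clip(out, 0, 255).astype(np.uint8)
```

The per-channel multiplier `x_scale * w_scales[c] / out_scale` broadcasts across the output dimension, which is the only difference from the per-tensor case.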

Updated: QUANTIZE operator

This issue is deprecated: we will use hybrid quantization for the transformer instead.