Samsung/ONE

[q-implant] Quantization parameter importer


What

Let's develop q-implant, a tool to import quantization parameters generated by external toolchains (ex: PyTorch) into a circle model.

Why

  • ONE has its own quantization tool, but it has limitations in applying the latest quantization techniques implemented in various frameworks. In particular, QAT (Quantization-Aware Training) and advanced PTQ (Post-Training Quantization) techniques usually require a significant amount of GPU resources, which makes them difficult to integrate into ONE.
  • We will implement q-implant, which can import quantization parameters gathered by an external quantization framework (ex: PyTorch) into a circle model.
  • Expected benefits
    • The latest quantization techniques implemented in various frameworks can be applied to the circle model.
    • The circle model will have improved accuracy and performance.

Tasks

  • Define input format (quantization parameters + weights of the model).
  • Implement q-implant.
  • Test q-implant.

Usage

Example (assuming the input format is json (param.json))

q-implant --input input.circle --param param.json --output output.circle

The above command will generate a quantized circle model (output.circle) from an fp32 model (input.circle) using param.json.

Let me ask a (dumb) question. 😅
Can we expect to merge this functionality as an additional option of onecc quantizer?

Can we expect to merge this functionality as an additional option of onecc quantizer?

Very good question :) I think q-implant will be an option of onecc quantize. I can't think of another case for now.

Input format

I'm thinking of .json + .npy files (qparam) as the input to q-implant. The json file will describe the qparam info, and the npy files will hold the contents.

For example, let's assume a torch model with a single Op (Conv2d).

Torch code (permute is inserted here and there to support NHWC input/output)

import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Kernel in OHWI layout: shape [1, 1, 1, 2] (O=1, H=1, W=1, I=2)
        self.ker = torch.tensor([[[[0.311261, 0.920864]]]], dtype=torch.float32)
        self.bias = torch.tensor([0.502937], dtype=torch.float32)
        self.layer0 = nn.Conv2d(2, 1, (1, 1), stride=(1, 1), dilation=(1, 1))
        # nn.Conv2d expects OIHW weights, so permute from OHWI
        self.layer0.weight = nn.Parameter(self.ker.permute(0, 3, 1, 2))
        self.layer0.bias = nn.Parameter(self.bias)

    def forward(self, ifm):
        # ifm comes in as NHWC; permute to NCHW for Conv2d, then back to NHWC
        ofm = self.layer0(ifm.permute(0, 3, 1, 2))
        return ofm.permute(0, 2, 3, 1)
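
A quick sanity check of the NHWC behavior (a minimal sketch; the 4x4 input shape below is just an arbitrary example):

model = Model()
ifm = torch.randn(1, 4, 4, 2)   # NHWC input: N=1, H=4, W=4, C=2
ofm = model(ifm)
print(ofm.shape)                # torch.Size([1, 4, 4, 1]) -- still NHWC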

The model has a single layer (Conv2d), and there are four tensors with float32 dtype.

  • ifm: input of Conv2d
  • ker: kernel of Conv2d
  • bias: bias of Conv2d
  • ofm: output of Conv2d

After quantization, the four tensors will have a lower-precision dtype (ex: uint8) and quantization parameters. In this case, param.json would describe that information as below.

{
  "ifm": {
    "dtype": "uint8",
    "scale": "0.npy", # dtype: fp32, shape: [1]
    "zerop": "1.npy", # dtype: int64, shape: [1]
    "quantized_dimension": 0
  },
  "ker": {
    "dtype": "uint8",
    "scale": "2.npy", # dtype: fp32, shape: [1] (=output channel)
    "zerop": "3.npy", # dtype: int64, shape: [1] (=output channel)
    "value": "4.npy", # dtype: uint8, shape: [1, 1, 1, 2] (OHWI. compatible with NHWC format)
    "quantized_dimension": 0
  },
  "bias": {
    "dtype": "int32",
    "scale": "5.npy", # dtype: fp32, shape: [1] (=output channel)
    "zerop": "6.npy", # dtype: int64, shape: [1] (=output channel)
    "value": "7.npy", # dtype: int32, shape: [1] (=output channel)
    "quantized_dimension": 0
  },
  "ofm": {
    "dtype": "uint8",
    "scale": "8.npy", # dtype: fp32, shape: [1]
    "zerop": "9.npy", # dtype: int64, shape: [1]
    "quantized_dimension": 0
  }
}

+ 0~9.npy files
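
For reference, the scale/zerop pairs above follow the usual affine quantization scheme, real_value = scale * (quantized_value - zero_point). A minimal sketch with made-up numbers:

import numpy as np

scale = np.float32(0.0156)            # example scale (stored as a shape-[1] fp32 npy)
zerop = np.int64(128)                 # example zero point (stored as a shape-[1] int64 npy)
q = np.uint8(200)                     # a quantized activation value
real = scale * (np.int64(q) - zerop)  # dequantized value, ~1.12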

  • Reason for using json + npy: easy to use in cpp
  • Reason for using npy files, instead of writing values directly in the json file: it is inefficient to save data vectors in string format (a producer-side sketch is shown below)
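
As a rough producer-side sketch (not an agreed format or API; the qparam values here are made up), the json + npy files could be emitted with plain numpy and json:

import json
import numpy as np

# Hypothetical qparams gathered from an external framework (values are arbitrary)
qparams = {
    "ifm": {
        "dtype": "uint8",
        "scale": np.array([0.0156], dtype=np.float32),
        "zerop": np.array([128], dtype=np.int64),
        "quantized_dimension": 0,
    },
    # "ker", "bias" and "ofm" entries would follow the same pattern,
    # with an extra "value" array holding the quantized weight/bias data.
}

param = {}
npy_index = 0
for tensor_name, qp in qparams.items():
    entry = {"dtype": qp["dtype"], "quantized_dimension": qp["quantized_dimension"]}
    for key in ("scale", "zerop", "value"):
        if key in qp:
            fname = f"{npy_index}.npy"
            np.save(fname, qp[key])  # one npy file per array, referenced by name in the json
            entry[key] = fname
            npy_index += 1
    param[tensor_name] = entry

with open("param.json", "w") as f:
    json.dump(param, f, indent=2)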

Done.