[q-implant] Quantization parameter importer
What
Let's develop `q-implant`, a tool to import quantization parameters generated by external toolchains (ex: PyTorch) into a circle model.
Why
- ONE has its own quantization tool, but it has limitations in applying the latest quantization techniques implemented in various frameworks. In particular, QAT (Quantization-Aware Training) and advanced PTQ (Post-Training Quantization) techniques usually require a significant amount of GPU resources, which is difficult to integrate with ONE.
- We will implement `q-implant`, which can import quantization parameters gathered by an external quantization framework (ex: PyTorch) into a circle model.
- Expected benefits
  - The latest quantization techniques implemented in various frameworks can be applied to the circle model.
  - The circle model will have improved accuracy and performance.
Tasks
- Define input format (quantization parameters + weights of the model).
- Implement `q-implant`.
- Test `q-implant`.
Usage
Example (assuming the input format is json (`param.json`))

```
q-implant --input input.circle --param param.json --output output.circle
```

The above command will generate a quantized circle model (`output.circle`) from an fp32 model (`input.circle`) using `param.json`.
Let me ask a (dumb) question. 😅
Can we expect to merge this functionality as an additional option of onecc quantizer?
> Can we expect to merge this functionality as an additional option of onecc quantizer?
Very good question :) I think `q-implant` will be an option of `onecc quantize`. I can't think of another case for now.
Input format
I'm thinking of `.json` + `.npy` files (qparam) as the input to `q-implant`. The json file will describe the qparam info, and the npy files will hold the contents.
For example, let's assume a torch model with a single Op (Conv2d).
Torch code (`permute` is inserted here and there to support NHWC input/output)
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.ker = torch.tensor([[[[0.311261, 0.920864]]]], dtype=torch.float32)
        self.bias = torch.tensor([0.502937], dtype=torch.float32)
        self.layer0 = nn.Conv2d(2, 1, (1, 1), stride=(1, 1), dilation=(1, 1))
        self.layer0.weight = nn.Parameter(self.ker.permute(0, 3, 1, 2))
        self.layer0.bias = nn.Parameter(self.bias)

    def forward(self, ifm):
        ofm = self.layer0(ifm.permute(0, 3, 1, 2))
        return ofm.permute(0, 2, 3, 1)
```
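For clarity (not part of the original issue), a quick shape check with a dummy NHWC input; the input values here are hypothetical and only confirm that the model consumes and produces NHWC tensors:

```python
# Quick shape check (hypothetical input values; NHWC layout assumed)
m = Model()
ifm = torch.tensor([[[[1.0, 2.0]]]], dtype=torch.float32)  # shape [1, 1, 1, 2] = NHWC
ofm = m(ifm)
print(ofm.shape)  # torch.Size([1, 1, 1, 1]) -- NHWC output of the single Conv2d
```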
`Model` has a single layer (Conv2d), and there are four tensors with float32 dtype.

- `ifm`: input of Conv2d
- `ker`: kernel of Conv2d
- `bias`: bias of Conv2d
- `ofm`: output of Conv2d
After quantization, the four tensors will have a lower-precision dtype (ex: uint8) and quantization parameters. In this case, `param.json` would describe that information as below.
```
{
  "ifm": {
    "dtype": "uint8",
    "scale": "0.npy",  # dtype: fp32, shape: [1]
    "zerop": "1.npy",  # dtype: int64, shape: [1]
    "quantized_dimension": 0
  },
  "ker": {
    "dtype": "uint8",
    "scale": "2.npy",  # dtype: fp32, shape: [1] (= output channel)
    "zerop": "3.npy",  # dtype: int64, shape: [1] (= output channel)
    "value": "4.npy",  # dtype: uint8, shape: [1, 1, 1, 2] (OHWI, compatible with NHWC format)
    "quantized_dimension": 0
  },
  "bias": {
    "dtype": "int32",
    "scale": "5.npy",  # dtype: fp32, shape: [1] (= output channel)
    "zerop": "6.npy",  # dtype: int64, shape: [1] (= output channel)
    "value": "7.npy",  # dtype: int32, shape: [1] (= output channel)
    "quantized_dimension": 0
  },
  "ofm": {
    "dtype": "uint8",
    "scale": "8.npy",  # dtype: fp32, shape: [1]
    "zerop": "9.npy",  # dtype: int64, shape: [1]
    "quantized_dimension": 0
  }
}
```
+ 0~9.npy files
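To make it concrete where the npy contents could come from, below is a minimal sketch of per-tensor asymmetric uint8 quantization of `ker` using the standard affine formula; an external framework may of course compute its qparams differently:

```python
import numpy as np

# Kernel of the Conv2d above, in OHWI layout (same values as the torch model)
ker = np.array([[[[0.311261, 0.920864]]]], dtype=np.float32)

# Per-tensor asymmetric uint8 quantization (range extended to include 0.0)
lo, hi = min(ker.min(), 0.0), max(ker.max(), 0.0)
scale = np.array([(hi - lo) / 255.0], dtype=np.float32)        # -> contents of "2.npy"
zerop = np.array([round(-lo / scale[0])], dtype=np.int64)      # -> contents of "3.npy"
q_ker = np.clip(np.round(ker / scale[0]) + zerop[0], 0, 255).astype(np.uint8)  # -> "4.npy"
```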
- Reason for using json + npy: easy to use in cpp.
- Reason for using npy files instead of writing values in the json file: it is inefficient to save data vectors in a string format.
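As a rough sketch of how an external toolchain could emit these files with numpy, here is one possible helper; the function name `export_qparam` and the scale/zero-point values are assumptions for illustration only, not part of the proposal:

```python
import json
import numpy as np

def export_qparam(qparam, out_dir="."):
    """Write param.json plus one numbered .npy file per array entry in `qparam`."""
    param = {}
    npy_index = 0
    for tensor_name, info in qparam.items():
        entry = {"dtype": info["dtype"]}
        for key in ("scale", "zerop", "value"):
            if key in info:
                filename = f"{npy_index}.npy"
                np.save(f"{out_dir}/{filename}", info[key])
                entry[key] = filename
                npy_index += 1
        entry["quantized_dimension"] = info["quantized_dimension"]
        param[tensor_name] = entry
    with open(f"{out_dir}/param.json", "w") as f:
        json.dump(param, f, indent=2)

# Dummy qparams for two tensors of the Conv2d model above (values are illustrative)
export_qparam({
    "ifm": {"dtype": "uint8",
            "scale": np.array([0.0157], dtype=np.float32),
            "zerop": np.array([128], dtype=np.int64),
            "quantized_dimension": 0},
    "ker": {"dtype": "uint8",
            "scale": np.array([0.00361], dtype=np.float32),
            "zerop": np.array([0], dtype=np.int64),
            "value": np.array([[[[86, 255]]]], dtype=np.uint8),
            "quantized_dimension": 0},
})
```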
Done.