haibingtown/segment-matting

How did you quantize the ONNX model?

Opened this issue · 3 comments

Whenever I try to export a quantized ONNX model, it doesn't work in the web app. When I use your sam_quantized.onnx file, everything works great. I've tried running the model export on my Mac M1 and on a Windows machine, and I get the same result on both. The web app just says "Error: Can't create a session".

Thanks for any help you can provide. My goal is to be able to use a quantized vit_b model.

Edit to add: I get a lot of `Ignore MatMul due to non constant B` messages while the model is being quantized. I can't figure out whether that's an issue or not.

@black-spire
This is the script I'm using to export the model:
```
python scripts/export_onnx.py --checkpoint {ckp path} --model-type {model type} --output {onnx file path} --return-single-mask --quantize-out --output {quantized onnx file path}
```

Could you please provide detailed steps to reproduce the problem you encountered?

That's the same script I'm using, except that I changed `--quantize-out --output {quantized onnx file path}` to `--quantize-out {quantized onnx file path}`, because the command in your comment throws an error (`--quantize-out` expects exactly one argument).

```
python server/export_onnx.py --checkpoint server/model/sam_vit_h_4b8939.pth --model-type vit_h --output server/model/sam_h.onnx --return-single-mask --quantize-out server/model/sam_h_quantized.onnx
```

When I take the quantized file produced by that command, put it in the frontend's public folder, and change line 108 of App.tsx to point to the new file, an error is thrown when the inference session is created on line 112 of App.tsx.

[Screenshot: the error thrown in the browser when the inference session is created]

But with the quantized file packaged in the repo, everything works fine. I can't figure out where I'm going wrong...

Your script seems fine. I re-exported the model on my Intel Mac and it worked. The export process produced the following log output:

```
% python3 server/export_onnx.py --checkpoint server/model/sam_vit_h_4b8939.pth --model-type vit_h --output server/model/sam_h.onnx --return-single-mask --quantize-out server/model/sam_h_quantized.onnx
(some warnings)
Loading model...
Exporting onnx model to server/model/sam_h.onnx...
================ Diagnostic Run torch.onnx.export version 2.0.0 ================
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Model has successfully been run with ONNXRuntime.
Quantizing model and writing to server/model/sam_h_quantized.onnx...
Ignore MatMul due to non constant B: /[/transformer/layers.0/self_attn/MatMul]
Ignore MatMul due to non constant B: /[/transformer/layers.0/self_attn/MatMul_1]
Ignore MatMul due to non constant B: /[/transformer/layers.0/cross_attn_token_to_image/MatMul]
Ignore MatMul due to non constant B: /[/transformer/layers.0/cross_attn_token_to_image/MatMul_1]
Ignore MatMul due to non constant B: /[/transformer/layers.0/cross_attn_image_to_token/MatMul]
Ignore MatMul due to non constant B: /[/transformer/layers.0/cross_attn_image_to_token/MatMul_1]
Ignore MatMul due to non constant B: /[/transformer/layers.1/self_attn/MatMul]
Ignore MatMul due to non constant B: /[/transformer/layers.1/self_attn/MatMul_1]
Ignore MatMul due to non constant B: /[/transformer/layers.1/cross_attn_token_to_image/MatMul]
Ignore MatMul due to non constant B: /[/transformer/layers.1/cross_attn_token_to_image/MatMul_1]
Ignore MatMul due to non constant B: /[/transformer/layers.1/cross_attn_image_to_token/MatMul]
Ignore MatMul due to non constant B: /[/transformer/layers.1/cross_attn_image_to_token/MatMul_1]
Ignore MatMul due to non constant B: /[/transformer/final_attn_token_to_image/MatMul]
Ignore MatMul due to non constant B: /[/transformer/final_attn_token_to_image/MatMul_1]
Ignore MatMul due to non constant B: /[/MatMul_1]
Done!
```

The software versions in my Python environment are as follows:
python=3.9.0
torch=2.0.0
onnx=1.14.0

Later, I'll try it again on an M1 machine to see if it works there.