hustvl/SparseInst

Help wanted for quantized int8 model

Stephenzza opened this issue · 4 comments

Thanks for your fantastic work!
I quantized the model to INT8 and found that accuracy drops severely (mAP goes from 0.4 to 0.2). It looks like the matmul operator is causing the problem. Do you have any suggestions?

I found that after quantization the output range of the matmul operator is far too large, so the 0~255 INT8 range cannot represent all of its values.
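To make the problem concrete, here is a small hypothetical sketch (not from the SparseInst code) of per-tensor asymmetric 8-bit fake quantization: with a single scale covering the whole tensor, a very wide min/max range forces a large quantization step, so every value is reconstructed coarsely.

```python
import torch

def fake_quant_uint8(x: torch.Tensor) -> torch.Tensor:
    """Quantize-dequantize x with a single asymmetric uint8 (0..255) scale.

    Mimics per-tensor post-training quantization: the wider the min/max
    range, the larger the step size and the coarser the reconstruction.
    """
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp(min=1e-12) / 255.0
    q = ((x - lo) / scale).round().clamp(0, 255)  # uint8 codes
    return q * scale + lo                         # dequantized values

# Narrow-range tensor: small quantization error.
narrow = torch.randn(1000)
# Wide-range tensor: a few huge values stretch the scale for everything.
wide = torch.cat([torch.randn(1000), torch.tensor([5000.0, -5000.0])])

for name, t in [("narrow", narrow), ("wide", wide)]:
    err = (t - fake_quant_uint8(t)).abs().mean()
    print(f"{name}: range=({t.min():.1f}, {t.max():.1f}) mean abs error={err:.4f}")
```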

Awesome! Hi @Stephenzza, thanks for your interest in SparseInst! Could you specify exactly which operation causes this?

The operation is torch.bmm() at line 79 of decoder.py:
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
Assuming an input tensor of shape (B, 3, 512, 512), this multiplies a (B, 400, 4096) matrix by a (B, 4096, 256) matrix; after this matrix multiplication I get the exploded min/max values.
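A quick way to see why this product blows up: each output element sums 4096 products, so even moderate activations accumulate into large values. A minimal sketch with random tensors matching only the shapes reported above (the values themselves are made up):

```python
import torch

B, N, HW, C = 2, 400, 4096, 256  # shapes from the report above

# iam_prob is a sigmoid-like map in [0, 1]; features are real-valued activations.
iam_prob = torch.rand(B, N, HW)
features = torch.randn(B, C, HW)

inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))

print("iam_prob range:     ", iam_prob.min().item(), iam_prob.max().item())
print("features range:     ", features.min().item(), features.max().item())
# Each entry sums 4096 terms, so the output range is orders of magnitude wider.
print("inst_features range:", inst_features.min().item(), inst_features.max().item())
```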

I see. I have two suggestions to avoid the overflow (a combined, runnable sketch follows the list):

  1. Normalize iam_prob first and then matmul:
normalizer = iam_prob.sum(-1).float().clamp(min=1e-4)
iam_prob = iam_prob / normalizer[:, :, None]
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
  2. Use a large scaling factor to avoid the overflow (normalizer is the same quantity as above, computed from iam_prob):
features = features / 1000
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
inst_features = inst_features / normalizer[:, :, None]
inst_features = inst_features * 1000
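For reference, here is a self-contained sketch of suggestion 1 (my paraphrase, not the exact SparseInst decoder code), comparing the raw and normalized bmm outputs; the normalized variant stays within the range of features, which is much friendlier to an 8-bit quantizer:

```python
import torch

def raw_inst_features(iam_prob, features):
    B, C = features.shape[:2]
    return torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))

def normalized_inst_features(iam_prob, features):
    # Suggestion 1: normalize iam_prob along the spatial dimension before bmm,
    # so each output element is a weighted average instead of a 4096-term sum.
    normalizer = iam_prob.sum(-1).float().clamp(min=1e-4)
    iam_prob = iam_prob / normalizer[:, :, None]
    B, C = features.shape[:2]
    return torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))

B, N, C, H, W = 2, 400, 256, 64, 64    # decoder-level shapes (assumed)
iam_prob = torch.rand(B, N, H * W)     # dummy activation map in [0, 1]
features = torch.randn(B, C, H, W)     # dummy decoder features

for name, fn in [("raw", raw_inst_features), ("normalized", normalized_inst_features)]:
    out = fn(iam_prob, features)
    print(f"{name:10s} min={out.min().item():10.2f} max={out.max().item():10.2f}")
```

Suggestion 2 instead keeps the original computation and folds a fixed 1/1000 rescaling into the inputs and outputs; the quantizer then sees a much narrower range around the bmm, and the constant factor is undone afterwards.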

I'm glad to hear about the progress and hope my suggestions will work.