hustvl/SparseInst

Help wanted for quantized int8 model

Stephenzza opened this issue · 4 comments

Thanks for your fantastic work!
I quantized the model to INT8 and found that accuracy drops severely (mAP goes from 0.4 to 0.2). It looks like the matmul operator is causing the problem. Do you have any suggestions?

I found that after quantization the output range of the matmul operator is far too large, so the 0~255 INT8 range cannot represent all of its values.
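To make the problem concrete, here is a small hypothetical sketch (not from the SparseInst code) of per-tensor asymmetric 8-bit fake quantization: with a single scale covering the whole tensor, a very wide min/max range forces a large quantization step, so every value is reconstructed coarsely.

```python
import torch

def fake_quant_uint8(x: torch.Tensor) -> torch.Tensor:
    """Quantize-dequantize x with a single asymmetric uint8 (0..255) scale.

    Mimics per-tensor post-training quantization: the wider the min/max
    range, the larger the step size and the coarser the reconstruction.
    """
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp(min=1e-12) / 255.0
    q = ((x - lo) / scale).round().clamp(0, 255)  # uint8 codes
    return q * scale + lo                         # dequantized values

# Narrow-range tensor: small quantization error.
narrow = torch.randn(1000)
# Wide-range tensor: a few huge values stretch the scale for everything.
wide = torch.cat([torch.randn(1000), torch.tensor([5000.0, -5000.0])])

for name, t in [("narrow", narrow), ("wide", wide)]:
    err = (t - fake_quant_uint8(t)).abs().mean()
    print(f"{name}: range=({t.min():.1f}, {t.max():.1f}) mean abs error={err:.4f}")
```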

Awesome! Hi @Stephenzza, thanks for your interest in SparseInst! Could you specify exactly which operation causes this?

The operation is torch.bmm() at line 79 of decoder.py:
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
Assuming an input tensor of shape (B, 3, 512, 512), this multiplies a (B, 400, 4096) matrix by a (B, 4096, 256) matrix; after this matrix multiplication I get the exploded min/max values.
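A quick way to see why this product blows up: each output element sums 4096 products, so even moderate activations accumulate into large values. A minimal sketch with random tensors matching only the shapes reported above (the values themselves are made up):

```python
import torch

B, N, HW, C = 2, 400, 4096, 256  # shapes from the report above

# iam_prob is a sigmoid-like map in [0, 1]; features are real-valued activations.
iam_prob = torch.rand(B, N, HW)
features = torch.randn(B, C, HW)

inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))

print("iam_prob range:     ", iam_prob.min().item(), iam_prob.max().item())
print("features range:     ", features.min().item(), features.max().item())
# Each entry sums 4096 terms, so the output range is orders of magnitude wider.
print("inst_features range:", inst_features.min().item(), inst_features.max().item())
```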

I see. I have two suggestions to avoid the overflow (a combined, runnable sketch follows the list):

  1. Normalize iam_prob first and then matmul:
normalizer = iam_prob.sum(-1).float().clamp(min=1e-4)
iam_prob = iam_prob / normalizer[:, :, None]
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
  2. Use a large scaling factor to avoid the overflow (normalizer is the same quantity as above, computed from iam_prob):
features = features / 1000
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
inst_features = inst_features / normalizer[:, :, None]
inst_features = inst_features * 1000
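For reference, here is a self-contained sketch of suggestion 1 (my paraphrase, not the exact SparseInst decoder code), comparing the raw and normalized bmm outputs; the normalized variant stays within the range of features, which is much friendlier to an 8-bit quantizer:

```python
import torch

def raw_inst_features(iam_prob, features):
    B, C = features.shape[:2]
    return torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))

def normalized_inst_features(iam_prob, features):
    # Suggestion 1: normalize iam_prob along the spatial dimension before bmm,
    # so each output element is a weighted average instead of a 4096-term sum.
    normalizer = iam_prob.sum(-1).float().clamp(min=1e-4)
    iam_prob = iam_prob / normalizer[:, :, None]
    B, C = features.shape[:2]
    return torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))

B, N, C, H, W = 2, 400, 256, 64, 64    # decoder-level shapes (assumed)
iam_prob = torch.rand(B, N, H * W)     # dummy activation map in [0, 1]
features = torch.randn(B, C, H, W)     # dummy decoder features

for name, fn in [("raw", raw_inst_features), ("normalized", normalized_inst_features)]:
    out = fn(iam_prob, features)
    print(f"{name:10s} min={out.min().item():10.2f} max={out.max().item():10.2f}")
```

Suggestion 2 instead keeps the original computation and folds a fixed 1/1000 rescaling into the inputs and outputs; the quantizer then sees a much narrower range around the bmm, and the constant factor is undone afterwards.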

I'm glad to hear about the progress and hope my suggestions will work.