Help wanted for quantized int8 model
Stephenzza opened this issue · 4 comments
Thanks for your fantastic job!
I just quantized the model to INT8 and found that the performance dropped seriously (mAP from 0.4 to 0.2). I suspect the matmul operator is causing this problem. Do you have any good suggestions?
Awesome! Hi @Stephenzza, thanks for your interest in SparseInst! Could you specify this operation exactly?
The operation is torch.bmm() in decoder.py, line 79:
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
Assuming an input tensor of shape (B, 3, 512, 512), this computes inst_features as (B, 400, 4096) x (B, 4096, 256). After this matrix multiplication, the min/max values of the output explode.
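To make the range problem concrete, here is a rough back-of-the-envelope sketch. It assumes that iam_prob entries lie in [0, 1] (as the name suggests) and that the quantizer is a symmetric per-tensor int8 quantizer. The value of feat_max is a made-up illustration, not a number measured from the model.

```python
# Hypothetical numbers for illustration only; nothing here is measured from the model.
HW = 4096        # flattened spatial locations, from (B, 400, 4096) x (B, 4096, 256)
feat_max = 4.0   # assumed maximum |feature| value


def int8_step(max_abs):
    """Step size of a symmetric per-tensor int8 quantizer covering [-max_abs, max_abs]."""
    return max_abs / 127.0


# Each output element is a sum of HW terms. With weights in [0, 1], every term
# is bounded in magnitude by feat_max, so the sum is bounded by HW * feat_max.
unnormalized_bound = HW * feat_max

# For comparison: if the weights of each row summed to at most 1, the output
# could not exceed the largest feature magnitude.
normalized_bound = feat_max

print("worst-case range without normalization:", unnormalized_bound)
print("int8 step without normalization:", int8_step(unnormalized_bound))
print("int8 step with row-normalized weights:", int8_step(normalized_bound))
```

Under these assumptions the worst-case output range, and with it the int8 step size, can be up to 4096 times larger than that of the features themselves, which would explain a large accuracy drop.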
I see. I have some suggestions to avoid the overflow:
- Normalize iam_prob first, and then do the matmul:
normalizer = iam_prob.sum(-1).float().clamp(min=1e-4)
iam_prob = iam_prob / normalizer[:, :, None]
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
- Use a large scaling factor to avoid the overflow (normalizer is computed as in the first suggestion):
features = features / 1000
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
inst_features = inst_features / normalizer[:, :, None]
inst_features = inst_features * 1000
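For reference, here is a minimal floating-point sketch that runs both suggestions side by side on random stand-in tensors. The shapes follow the ones quoted in this thread. The random inputs, the seed, and the names raw, out_a, out_b, and scale are assumptions made for illustration and are not part of decoder.py.

```python
import torch

torch.manual_seed(0)

# N = 400 and H * W = 4096 match the shapes quoted in this thread; B is arbitrary.
B, N, C, H, W = 2, 400, 256, 64, 64
iam_prob = torch.rand(B, N, H * W)   # stand-in activation maps with values in [0, 1)
features = torch.randn(B, C, H, W)   # stand-in feature maps
feats = features.view(B, C, -1).permute(0, 2, 1)  # (B, H*W, C)

normalizer = iam_prob.sum(-1).float().clamp(min=1e-4)

# Baseline: plain matmul, whose output range grows with H * W.
raw = torch.bmm(iam_prob, feats)

# Suggestion 1: normalize iam_prob first, then matmul.
out_a = torch.bmm(iam_prob / normalizer[:, :, None], feats)

# Suggestion 2: scale the features down, matmul, normalize, then scale back up.
scale = 1000.0
out_b = torch.bmm(iam_prob, feats / scale)
out_b = out_b / normalizer[:, :, None]
out_b = out_b * scale

print("max |raw|:  ", raw.abs().max().item())
print("max |out_a|:", out_a.abs().max().item())
print("variants agree:", torch.allclose(out_a, out_b, atol=1e-4))
```

In floating point the two variants are mathematically equivalent, so they should differ only by rounding error. The practical difference is the range of the intermediate tensors that the quantizer observes, so it is worth checking the calibrated min/max of each variant on real data.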
I'm glad to hear about the progress and hope my suggestions will work.