mit-han-lab/qserve

activation quantization

hanhanpp opened this issue · 1 comment

I'm confused by equation (12): what does the outer product of sw and sx mean? Is the activation quantized per-token?

Hi @hanhanpp, thank you for your interest in our work. To answer your questions: the activations use per-token dynamic quantization, while the weights use per-channel/per-group static quantization.
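
To make the outer-product part of the question concrete, here is a minimal NumPy sketch. It is not taken from the QServe code base. It assumes plain symmetric int8 quantization for both operands (the actual weight scheme is lower-bit and may be per-group), and the shapes, the helper `quantize_rows`, and the variable names are made up for illustration. With one scale per token (sx) and one scale per output channel (sw), the outer product of the two scale vectors is a matrix holding exactly one dequantization factor for each element of the integer GEMM output.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 4, 16, 3  # tokens, hidden size, output channels (illustrative sizes)
X = rng.standard_normal((m, k)).astype(np.float32)  # activations
W = rng.standard_normal((n, k)).astype(np.float32)  # weights


def quantize_rows(a, qmax=127):
    """Symmetric quantization with one scale per row (hypothetical helper)."""
    scale = np.abs(a).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(a / scale), -qmax, qmax).astype(np.int8)
    return q, scale


# Per-token scales: one per row of X, computed at runtime (dynamic).
qx, sx = quantize_rows(X)  # sx has shape (m, 1)
# Per-channel scales: one per row of W, computed once offline (static).
qw, sw = quantize_rows(W)  # sw has shape (n, 1)

# Integer matrix multiply with a 32-bit accumulator.
acc = qx.astype(np.int32) @ qw.astype(np.int32).T  # shape (m, n)

# Outer product of the scale vectors: scales[i, j] = sx[i] * sw[j].
scales = np.outer(sx, sw)  # shape (m, n)

# Dequantize the whole output with one elementwise multiply.
Y = acc * scales
```

Here `Y` approximates `X @ W.T`. The output is laid out as tokens by channels, so the scale matrix is `np.outer(sx, sw)`; with the transposed layout the order of the two arguments swaps, which may be why the equation names sw first.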