hanhanpp opened this issue 10 months ago · 1 comments
I'm confused by equation (12): what does the outer product of sw and sx mean? Is the activation quantized per token?
Hi @hanhanpp, thank you for your interest in our work. To answer your questions: the activation uses per-token dynamic quantization, while the weight uses per-channel/per-group static quantization.
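
For anyone else reading, here is a rough NumPy sketch of why an outer product of the two scale vectors shows up when activations are quantized per token and weights per output channel. Equation (12) itself is not reproduced in this thread, so this assumes it has the usual form of an integer matmul followed by rescaling. The helper `quantize_per_row`, the shapes, and the symmetric 8-bit scheme are all illustrative assumptions, not the authors' code, and per-group weight quantization is not covered.

```python
import numpy as np

def quantize_per_row(m, n_bits=8):
    """Hypothetical helper: symmetric quantization with one scale per row of m."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(m).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(m / scale), -qmax, qmax).astype(np.int32)
    return q, scale[:, 0]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))  # activations: 4 tokens, 16 input features
w = rng.standard_normal((8, 16))  # weights: 8 output channels, 16 input features

# Per-token scales: one per row of x. "Dynamic" means they are computed at
# inference time from the actual activations.
x_q, s_x = quantize_per_row(x)    # s_x has shape (4,)

# Per-channel scales: one per output channel of w. "Static" means they would
# be computed once offline; here they are computed inline for brevity.
w_q, s_w = quantize_per_row(w)    # s_w has shape (8,)

# Integer matmul, then rescale. Output entry (i, j) needs s_x[i] * s_w[j],
# which is exactly entry (i, j) of the outer product of the two scale vectors.
acc = x_q @ w_q.T                 # shape (4, 8), integer accumulators
scale_grid = np.outer(s_x, s_w)   # shape (4, 8)
y_hat = acc * scale_grid

# Reference: dequantize both operands first, then multiply in floating point.
y_ref = (x_q * s_x[:, None]) @ (w_q * s_w[:, None]).T
```

Both paths give the same result up to floating-point rounding, because each row scale and each column scale factors out of the sum over the input dimension. That is what allows the matmul to run in integers with a single elementwise rescale afterwards.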