can you support static per-token activation quantization?
geqian-9192 opened this issue · 1 comment
geqian-9192 commented
Can you support static per-token activation quantization? Dynamic quantization is inefficient on hardware.
ys-2020 commented
Hi,
Thanks for your interest in QServe. We fuse the quantization ops into memory-bound ops such as layernorm and SiLU, so the activation quantization overhead is negligible. Please refer to our paper for more details.
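To illustrate the idea of fusing quantization into a memory-bound op, here is a minimal pure-Python sketch, not QServe's actual kernel. It assumes symmetric int8 quantization with one dynamic scale per token, and the function name `layernorm_quant_per_token` and its parameters are made up for this example. The point is that the per-token scale comes from values the layernorm has already computed, so no extra pass over memory is needed.

```python
import math


def layernorm_quant_per_token(x, gamma, beta, eps=1e-5, qmax=127):
    """Layernorm one token and quantize the result in the same pass.

    x, gamma, beta are equal-length lists of floats (one token's hidden vector).
    Returns (q, scale) such that q[i] * scale approximates the normalized output.
    Hypothetical helper for illustration only.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)

    # Normalized output; in a fused kernel these values stay on-chip.
    y = [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]

    # Dynamic per-token scale, taken from this token's own maximum magnitude.
    absmax = max(abs(v) for v in y)
    scale = absmax / qmax if absmax > 0 else 1.0

    # Symmetric rounding to the signed integer range [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in y]
    return q, scale


if __name__ == "__main__":
    q, scale = layernorm_quant_per_token(
        [1.0, 2.0, 3.0, 4.0], gamma=[1.0] * 4, beta=[0.0] * 4
    )
    print(q, scale)
```

A static scheme would replace the `absmax` computation with a scale calibrated offline. In this sketch the dynamic scale costs only a max-reduction over data that is already loaded, which is the kind of overhead the reply describes as negligible.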