AssertionError
I ran the W4A4 operation on the OPT-350M model and obtained the corresponding results. However, after switching to the 2.7B model, I hit a shape-mismatch assertion error at line 238 in quant.py. Printing the tensor showed a size of [32, 2048, 160], whereas for the 350M model it was [16, 2048, 128]. How should I resolve this error?
Hi @muzi0111,
Thanks for your interest in our project.
About the assertion error, I assume you are referring to L235 in quant.py. The KV-cache is quantized with group quantization at per-head granularity, so the assertion checks that the last dimension (which is the reduction dimension in group quantization) equals head_dim. The value 128 appears there because it is a widely used head_dim in recently released models, chosen for efficiency.
To resolve this error, I think replacing 128 with the model's head_dim would be a good choice.
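As a rough illustration of the idea (not the actual code in quant.py), here is a minimal dependency-free sketch in which nested Python lists stand in for the KV-cache tensor. The function names `quantize_group` and `quantize_kv_per_head`, the asymmetric min/max scheme, and the [num_heads, seq_len, last_dim] layout are all assumptions made for the example; the only point is that the check compares against a `head_dim` argument instead of a literal 128.

```python
def quantize_group(group, n_bits=4):
    """Fake-quantize one group (one head's vector for one token).

    Assumed scheme: asymmetric uniform quantization over the group's
    own min/max, then dequantization back to floats.
    """
    q_max = 2 ** n_bits - 1
    lo, hi = min(group), max(group)
    # Fall back to 1.0 so a constant group does not divide by zero.
    scale = (hi - lo) / q_max or 1.0
    return [round((x - lo) / scale) * scale + lo for x in group]


def quantize_kv_per_head(kv, head_dim, n_bits=4):
    """Quantize a nested list shaped [num_heads, seq_len, last_dim].

    Each token vector of each head is one quantization group, so the
    last dimension must match the head_dim passed in by the caller.
    """
    for head in kv:
        for token_vec in head:
            # Compare against the model's head_dim, not a hardcoded 128.
            assert len(token_vec) == head_dim, (
                f"expected last dim {head_dim}, got {len(token_vec)}"
            )
    return [[quantize_group(vec, n_bits) for vec in head] for head in kv]
```

With a tensor whose last dimension is 160, as in the printout above, calling `quantize_kv_per_head(kv, head_dim=160)` passes the check in this sketch, while `head_dim=128` raises the AssertionError.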