Does this framework support "what you serve is what you train" for weight-only quantization?
chenho74 opened this issue · 5 comments
Keeping activations as float while quantizing the weights to int8 or int4: from what I can tell, this is currently implemented with a float * float dot product. Is an int * float dot product available?
Yes, in the config you can select it for each operand individually.
Thanks for the speedy reply. Would you mind pointing me to the config field that toggles this?
https://github.com/google/aqt/blob/main/aqt/common/aqt_config.py#L362
As you can see, lhs and rhs have completely separate quantization configs.
Yes, but from what I see in the underlying dot product code, a float * float dot product is used when the activation is not quantized: https://github.com/google/aqt/blob/main/aqt/jax/aqt_dot_general.py#L91 Is this fake quantization, or is this also the arithmetic used at serving time?
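For anyone following along, here is a minimal NumPy sketch (not AQT's actual code; all function names are made up for illustration) of the distinction being asked about. With a per-tensor scale, "fake quant" (dequantize the int8 weights back to float, then do a float * float matmul) and integer-weight arithmetic (matmul against the raw int8 values, then rescale the output) are mathematically equivalent up to float rounding, which is what makes "what you serve is what you train" possible:

```python
import numpy as np

def quantize_int8(w):
    # Per-tensor symmetric quantization to int8 (assumes w is not all zeros).
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def fake_quant_dot(x, q, scale):
    # "Fake quant": dequantize weights to float, then float * float matmul.
    return x @ (q.astype(np.float32) * scale)

def int_weight_dot(x, q, scale):
    # Integer-weight path: matmul against the quantized values directly,
    # then apply the scale once to the accumulated output.
    return (x @ q.astype(np.float32)) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)   # float activations
w = rng.standard_normal((16, 8)).astype(np.float32)   # float weights
q, scale = quantize_int8(w)

# The two paths agree up to float rounding because the per-tensor scale
# factors out of the dot product.
assert np.allclose(fake_quant_dot(x, q, scale), int_weight_dot(x, q, scale),
                   rtol=1e-5, atol=1e-5)
```

Note this equivalence relies on the scale being per-tensor (or per output channel); a hardware int8 kernel would accumulate in int32 and apply the scale afterward, as the second function mimics in float.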