Does this framework support "what you serve is what you train" for weight-only quantization?
chenho74 opened this issue · 5 comments
Keeping activations as float while quantizing the weights to int8 or int4: from what I can tell, this is currently implemented with a float * float dot product. Is an int * float dot product available?
Yes, in the config you can select it for each operand individually.
Thanks for the speedy reply. Would you mind pointing me to the config field that toggles this?
https://github.com/google/aqt/blob/main/aqt/common/aqt_config.py#L362
As you can see, lhs and rhs have completely separate quantization configs.
Yes, but from what I see in the underlying dot product code, a float * float dot product is used when the activation is not quantized: https://github.com/google/aqt/blob/main/aqt/jax/aqt_dot_general.py#L91 Is this fake quantization, or is this also the arithmetic used at serving time?
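For anyone following along, here is a minimal NumPy sketch (not AQT's actual code; all function names are made up for illustration) of the distinction being asked about. With a per-tensor scale, "fake quant" (dequantize the int8 weights back to float, then do a float * float matmul) and integer-weight arithmetic (matmul against the raw int8 values, then rescale the output) are mathematically equivalent up to float rounding, which is what makes "what you serve is what you train" possible:

```python
import numpy as np

def quantize_int8(w):
    # Per-tensor symmetric quantization to int8 (assumes w is not all zeros).
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def fake_quant_dot(x, q, scale):
    # "Fake quant": dequantize weights to float, then float * float matmul.
    return x @ (q.astype(np.float32) * scale)

def int_weight_dot(x, q, scale):
    # Integer-weight path: matmul against the quantized values directly,
    # then apply the scale once to the accumulated output.
    return (x @ q.astype(np.float32)) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)   # float activations
w = rng.standard_normal((16, 8)).astype(np.float32)   # float weights
q, scale = quantize_int8(w)

# The two paths agree up to float rounding because the per-tensor scale
# factors out of the dot product.
assert np.allclose(fake_quant_dot(x, q, scale), int_weight_dot(x, q, scale),
                   rtol=1e-5, atol=1e-5)
```

Note this equivalence relies on the scale being per-tensor (or per output channel); a hardware int8 kernel would accumulate in int32 and apply the scale afterward, as the second function mimics in float.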