pytorch/ao

AO dtype composability tracker


As we start onboarding more dtypes, we ideally want them to work in as many different situations as possible, so I'm opening this tracker and will update the table as things change. If I should add more columns or rows, or if there are any cells you disagree with, please let me know!

The columns can also compose with each other, but to be explicit:

  1. Training with FSDP2 should compose with low-bit optimizers
  2. Inference quantization and KV cache quantization should compose
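As background for point 2: KV cache quantization typically stores keys and values as int8 with a per-row affine scale and zero point. The sketch below is a minimal pure-Python illustration of that generic affine scheme, not torchao's actual implementation; the function names and the per-row granularity are assumptions for illustration.

```python
def quantize_int8(row):
    """Affine-quantize a list of floats to int8 codes with one
    scale/zero-point per row (a generic sketch, not torchao's code)."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 255 or 1.0  # map [lo, hi] onto the 256 int8 codes
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(x / scale) + zero_point)) for x in row]
    return codes, scale, zero_point

def dequantize_int8(codes, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [(c - zero_point) * scale for c in codes]

kv_row = [0.1, -0.5, 0.25, 0.9]
codes, scale, zp = quantize_int8(kv_row)
recovered = dequantize_int8(codes, scale, zp)
# each recovered value is within one quantization step (scale) of the original
```

The same quantize/dequantize pair is what has to interoperate with weight/activation quantization for the two techniques to compose.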

And sparsity, IIUC, only works with int8 inference quantization right now.

| Dtype | Training with FSDP2 | Inference | Optimizer | QAT | KV cache | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Int8 | Experimental | Yes | LUT based | Yes | Yes | |
| Int4 | No | Yes | LUT based | Yes | No | |
| Fp8 | Yes | Yes | Yes | Not needed | No | |
| NF4 | Yes | Experimental | No | In progress | No | Does not use the quantize API |
| fp6 | No | Yes | No | No | No | |
| UintX/Fpx | In progress | Yes | No | No | No | Still requires more performance work |
| MX: fp8/6/4 with scales | Emulation only | Emulation only | No | Not needed (we can compute in this dtype) | No | Pending release of B100 GPUs for acceleration |
| Autoquant | N/A | Yes | N/A | N/A | N/A | Supports int8/int4; fp8 coming next |

TODO

  • Separate table where columns are weights, activations, optimizer, and gradients
  • Separate table where techniques are rows and columns are devices

Small correction: 8-bit and 4-bit optimizers are not exactly INT8 and INT4. They use LUT-based quantization, where the LUT values are defined by Tim Dettmers's "dynamic tree quantization" scheme. (To be even more specific, the second buffer of the INT4 optimizer actually uses affine quantization.)
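For readers unfamiliar with the distinction: in LUT-based quantization each low-bit code is an index into a small table of non-uniformly spaced float values, rather than an affine mapping. A minimal sketch with a made-up 2-bit table follows; the real 8-bit optimizer uses 256 values from the dynamic tree quantization scheme, so both the table values and function names here are illustrative assumptions.

```python
# Hypothetical 4-entry (2-bit) lookup table; real optimizer state
# quantization uses a 256-entry table from dynamic tree quantization.
LUT = [-1.0, -0.25, 0.25, 1.0]

def lut_quantize(values, lut=LUT):
    """Map each value to the index of the nearest table entry."""
    return [min(range(len(lut)), key=lambda i: abs(lut[i] - v)) for v in values]

def lut_dequantize(codes, lut=LUT):
    """Recover approximate values by table lookup."""
    return [lut[c] for c in codes]

codes = lut_quantize([0.9, -0.3, 0.1])   # -> [3, 1, 2]
approx = lut_dequantize(codes)           # -> [1.0, -0.25, 0.25]
```

Because the table values can be spaced densely near zero and sparsely at the extremes, this represents optimizer states more accurately than uniform INT8/INT4 at the same bit width.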