float8 upcoming feature tracker

vkuzo opened this issue 5 months ago · 0 comments


configurability

[planned] support rowwise/blockwise scaling granularity, configurable separately for each gemm
[planned] configure settings for each of the three gemms in linear fwd/bwd separately
[planned] support more fine-grained configuration of how to apply Float8Linear to individual modules
[planned] inference support (see pytorch-labs/float8_experimental#314)
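
For readers new to the scaling-granularity item above, here is a minimal sketch of how tensorwise and rowwise scales differ. It is not the library's implementation: plain Python lists stand in for tensors, the helper names and the `EPS` floor are hypothetical, and the only outside fact used is that 448 is the largest finite value of the float8 e4m3fn format.

```python
# Illustrative sketch only: lists of lists stand in for 2D tensors.
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn
EPS = 1e-12       # hypothetical floor so an all-zero row does not divide by zero


def tensorwise_scale(x):
    """Return a single scale computed from the abs-max of the whole matrix."""
    amax = max(abs(v) for row in x for v in row)
    return E4M3_MAX / max(amax, EPS)


def rowwise_scales(x):
    """Return one scale per row, each computed from that row's abs-max."""
    return [E4M3_MAX / max(max(abs(v) for v in row), EPS) for row in x]


x = [
    [0.5, -1.0, 0.25],      # small-magnitude row
    [100.0, 20.0, -400.0],  # large-magnitude row
]

print("tensorwise:", tensorwise_scale(x))
print("rowwise:   ", rowwise_scales(x))
```

With a single tensorwise scale, the small-magnitude row is scaled by the same factor as the large one and so uses only a small part of the float8 range. Rowwise scaling gives each row its own factor, at the cost of storing and applying more scales.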

performance

[in progress] torch._scaled_mm support for rowwise scaled float8 gemm
- [done] eager mode support
- [planned] torch.compile support, backed by triton/cutlass
[in progress] optimize torch.compile performance for float8 scaling/casting kernels
- [fixed behind a flag, off by default] pytorch/pytorch#130015
- [planned] pytorch/pytorch#133242
- [planned] pytorch/pytorch#128063
- [planned] pytorch/pytorch#136267
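
As a rough illustration of the arithmetic behind a rowwise scaled gemm, the sketch below scales the left operand per row and the right operand per column, multiplies, and then undoes both scales on the output. This is an assumption-laden sketch, not the `torch._scaled_mm` signature or kernel: the function names are hypothetical, the inputs are plain lists, and the actual cast to float8 is omitted, so the result matches an unscaled matmul up to floating-point rounding.

```python
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn
EPS = 1e-12       # hypothetical floor to avoid division by zero


def matmul(a, b):
    """Plain (M x K) @ (K x N) matrix multiply on lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def rowwise_scaled_mm(a, b):
    """Scale a per row and b per column, multiply, then rescale the output."""
    # One scale per row of a and one per column of b.
    sa = [E4M3_MAX / max(max(abs(v) for v in row), EPS) for row in a]
    sb = [E4M3_MAX / max(max(abs(v) for v in col), EPS) for col in zip(*b)]

    a_scaled = [[v * sa[i] for v in row] for i, row in enumerate(a)]
    b_scaled = [[v * sb[j] for j, v in enumerate(row)] for row in b]
    # A real kernel would cast a_scaled and b_scaled to float8 here (omitted).

    prod = matmul(a_scaled, b_scaled)
    # Output element (i, j) carries the factor sa[i] * sb[j]; divide it out.
    return [
        [prod[i][j] / (sa[i] * sb[j]) for j in range(len(sb))]
        for i in range(len(sa))
    ]


a = [[1.0, 2.0], [300.0, -400.0]]
b = [[0.5, 10.0], [-0.25, 20.0]]
print(rowwise_scaled_mm(a, b))
print(matmul(a, b))
```

Because the scales factor out along the output's rows and columns, they can be applied once to the gemm result instead of to every product term.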

distributed

[in progress] integrate with FSDP2, using a 16-bit or 8-bit all-gather, with delayed scaling for weights
- POC is done, performance optimizations are ongoing
[planned] verify integration with PP (pipeline parallelism)
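
For context on the delayed scaling mentioned above, here is a minimal sketch of the general idea: the scale used at a given step is derived from abs-max values recorded on earlier steps rather than from the current tensor. The class name, the history length, and the fallback scale of 1.0 are all assumptions made for illustration and are not taken from the float8 library.

```python
from collections import deque

E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn
EPS = 1e-12       # hypothetical floor to avoid division by zero


class DelayedScale:
    """Hypothetical helper that keeps a rolling window of past abs-max values."""

    def __init__(self, history_len=16):
        self.amax_history = deque(maxlen=history_len)

    def scale_for_step(self):
        """Return a scale based only on previously recorded amax values."""
        if not self.amax_history:
            return 1.0  # assumed fallback before any history exists
        return E4M3_MAX / max(max(self.amax_history), EPS)

    def record(self, values):
        """Record the current tensor's abs-max for use on later steps."""
        self.amax_history.append(max(abs(v) for v in values))


ds = DelayedScale(history_len=2)
for step, weights in enumerate([[1.0, -2.0], [4.0, 0.5], [0.5, 0.25]]):
    scale = ds.scale_for_step()  # uses history, not the current weights
    ds.record(weights)
    print(step, scale)
```

Since the scale does not depend on the current values, it is available before the tensor is inspected, which is the property that distinguishes delayed scaling from dynamic scaling.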

other

weight gradient accumulation in float32
add use_fast_accum (float8 accumulation of gemm) option to UX - pytorch-labs/float8_experimental#144
improve saturated casting performance
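
As a reminder of what saturated casting refers to, the sketch below clamps values into the finite float8 range before a cast would take place. It is only an illustration on Python lists with a hypothetical helper name. The clamp matters because the e4m3fn format has no infinity encoding, so out-of-range inputs have no finite float8 counterpart.

```python
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def saturate(values, max_val=E4M3_MAX):
    """Clamp each value into [-max_val, max_val] ahead of a float8 cast."""
    return [max(-max_val, min(max_val, v)) for v in values]


print(saturate([1000.0, -1000.0, 3.5]))
```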

copied from pytorch-labs/float8_experimental#187