End-to-end FP8 training
xrsrke opened this issue
Notes
- Write an `FP8Tensor` that inherits from `torch.Tensor` (just to support type hints).
- Write an `FP8Linear` that binds to TransformerEngine's FP8 kernel in the forward pass.
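The `FP8Linear` note relies on a kernel that consumes FP8 operands. To illustrate what that means numerically, the pure-Python sketch below simulates a linear layer. It scales the input and the weight per tensor, rounds both to the E4M3 format, multiplies with higher-precision accumulation, and then undoes the scales. It is not TransformerEngine's API or implementation. The format constants (E4M3, largest finite value 448) come from the OCP FP8 definition, and the helpers `round_to_e4m3`, `quantize_matrix`, and `fp8_linear` are hypothetical names.

```python
import math

# Assumed format: E4M3 as in the OCP/NVIDIA FP8 definition (exponent bias 7,
# 3 mantissa bits, no infinities, largest finite value 448). This only
# simulates the rounding in Python; it is not how TransformerEngine's kernels work.
E4M3_MAX = 448.0
E4M3_MANTISSA_BITS = 3
E4M3_MIN_NORMAL = 2.0 ** -6
E4M3_SUBNORMAL_STEP = 2.0 ** -9


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value, saturating at +/-E4M3_MAX."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    if a < E4M3_MIN_NORMAL:
        step = E4M3_SUBNORMAL_STEP  # subnormals are evenly spaced
    else:
        _, e = math.frexp(a)  # a = m * 2**e with 0.5 <= m < 1
        # spacing between neighbouring values in the binade [2**(e-1), 2**e)
        step = math.ldexp(1.0, e - 1 - E4M3_MANTISSA_BITS)
    q = round(a / step) * step  # Python's round() is round-half-to-even
    return sign * min(q, E4M3_MAX)


def quantize_matrix(m):
    """Per-tensor scaling: stretch the matrix so its abs-max lands on E4M3_MAX."""
    amax = max(abs(v) for row in m for v in row)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [[round_to_e4m3(v * scale) for v in row] for row in m], scale


def fp8_linear(x, weight):
    """y = x @ weight.T with both operands rounded to simulated E4M3.

    x is batch x in_features; weight is out_features x in_features (as in nn.Linear).
    Products are accumulated in full precision, then both scales are divided out.
    """
    qx, sx = quantize_matrix(x)
    qw, sw = quantize_matrix(weight)
    return [[sum(a * b for a, b in zip(xr, wr)) / (sx * sw) for wr in qw] for xr in qx]


if __name__ == "__main__":
    x = [[0.3, -1.7, 0.9]]
    w = [[1.1, 0.4, -2.5]]
    print(fp8_linear(x, w))  # close to the full-precision result, -2.6
```

Computing the scale from the current tensor's abs-max keeps the sketch simple. TransformerEngine's `DelayedScaling` recipe instead derives scales from a history of abs-max values recorded in earlier iterations.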
TODO
- `nn.Linear` in FP8
- Recursively convert all `nn.Linear` modules to `FP8Linear`
- `nn.Embedding` in FP8
- DP (data parallelism) in FP8
- TP (tensor parallelism) in FP8
- MoE (mixture of experts) in FP8
- ZeRO-1
- PP (pipeline parallelism) in FP8 (you get it for free)
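The "recursively convert all `nn.Linear`" item can be sketched as a walk over `named_children()` that swaps each `nn.Linear` in place with `setattr` and recurses into everything else. In this sketch `FP8Linear` is a hypothetical placeholder. It reuses the original parameters and still runs the ordinary matmul; the class described in the notes would call TransformerEngine's FP8 kernel in `forward` instead. `nn.Embedding` and other module types are left untouched, since they are separate items on the list.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FP8Linear(nn.Module):
    """HYPOTHETICAL placeholder for the FP8Linear described in this issue.

    It keeps the original nn.Linear's parameters and runs the ordinary
    full-precision matmul; a real version would call an FP8 kernel in forward().
    """

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.in_features = linear.in_features
        self.out_features = linear.out_features
        self.weight = linear.weight  # same Parameter object, re-registered here
        self.bias = linear.bias      # Parameter, or None if the layer had no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder: this is where the FP8 GEMM would be called.
        return F.linear(x, self.weight, self.bias)


def convert_linear_to_fp8(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.Linear inside `module` with FP8Linear."""
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear):
            setattr(module, name, FP8Linear(child))
        else:
            convert_linear_to_fp8(child)
    return module


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Sequential(nn.Linear(8, 2)))
    convert_linear_to_fp8(model)
    print(model)  # both Linear layers now show up as FP8Linear
```

Because the placeholder reuses the existing `Parameter` objects, an optimizer built before the conversion still points at the same tensors. A real FP8 layer might need its own weight storage, in which case the weights would have to be copied over and the optimizer rebuilt.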