xrsrke/pipegoose

End-to-end FP8 training

xrsrke opened this issue · 1 comments

xrsrke commented

Notes

  • Write an FP8Tensor that inherits from torch.Tensor (it only needs to support type hints).
  • Write an FP8Linear that binds to TransformerEngine's FP8 kernel in the forward pass.
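
A minimal sketch of the first note, assuming the subclass is purely a marker for type hints and does not change how the data is stored. The helper name `as_fp8_tensor` is hypothetical and not part of pipegoose; the binding to TransformerEngine's kernel is not shown here.

```python
import torch


class FP8Tensor(torch.Tensor):
    """Marker subclass of torch.Tensor, intended only for type hints.

    Assumption: no FP8 storage or scaling metadata is attached here; the
    underlying data keeps whatever dtype it was created with.
    """


def as_fp8_tensor(tensor: torch.Tensor) -> FP8Tensor:
    # Hypothetical helper: re-view an existing tensor as FP8Tensor
    # without copying its data.
    return tensor.as_subclass(FP8Tensor)


def describe(weight: FP8Tensor) -> str:
    # Example of a signature that uses the subclass as a type hint.
    return f"{type(weight).__name__} with shape {tuple(weight.shape)}"


w = as_fp8_tensor(torch.ones(2, 3))
print(describe(w))
```

Because `as_subclass` shares storage with the original tensor, the wrapper adds no memory overhead; a real implementation would presumably also carry the FP8 scaling information.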

TODO

  • nn.Linear but in FP8
  • Recursively convert all nn.Linear modules to FP8Linear
  • nn.Embedding in FP8
  • DP in FP8
  • TP in FP8
  • MoE in FP8
  • ZeRO-1
  • PP in FP8 (you get it for free)
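
The "recursively convert all nn.Linear" item could be sketched roughly as follows. This is an illustration under stated assumptions, not the pipegoose implementation: `FP8Linear` here is a hypothetical placeholder that still computes in the original precision (a real one would call an FP8 kernel in `forward`), and `convert_linear_to_fp8` is an invented name.

```python
import torch
from torch import nn


class FP8Linear(nn.Linear):
    """Hypothetical placeholder for an FP8 linear layer.

    Assumption: it behaves exactly like nn.Linear so the sketch runs on CPU;
    the actual FP8 forward pass is out of scope here.
    """


def convert_linear_to_fp8(module: nn.Module) -> nn.Module:
    """Replace every nn.Linear child, at any depth, with an FP8Linear.

    The root module itself is not replaced, only its descendants.
    """
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear) and not isinstance(child, FP8Linear):
            fp8_child = FP8Linear(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                device=child.weight.device,
                dtype=child.weight.dtype,
            )
            # Copy the trained weights (and bias, if any) into the new layer.
            fp8_child.load_state_dict(child.state_dict())
            setattr(module, name, fp8_child)
        else:
            # Descend into containers and other composite modules.
            convert_linear_to_fp8(child)
    return module


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Sequential(nn.Linear(8, 2)))
x = torch.randn(3, 4)
before = model(x)
convert_linear_to_fp8(model)
after = model(x)
```

Swapping modules through `setattr` on the parent keeps the original attribute names, so existing state-dict keys remain valid after conversion.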