Issues
- Support for loading fp8 checkpoint (#68, opened by wenscarl, 0 comments)
- Add activation checkpoint offloading (#63, opened by jon-chuang, 0 comments)
- TransformerLm docs say `start_time_step` should be `prefix_len` but LanguageModel uses `prefix_len-1` (#56, opened by DCtheTall, 3 comments)
- [Feature Request] Need a Matmul attention layer instead of Einsum to support running on GPU (#46, opened by MoFHeka, 0 comments)
- [Feature Request] Need ZeRO-1/2 to work together with PP+TP+DP, which may sometimes be faster than FSDP (#45, opened by MoFHeka, 1 comment)
- Support custom FP8 dtype in Pipelined Transformer (#44, opened by kaixih, 0 comments)
- Any publicly available document? (#7, opened by Lekja00160612)