Issues
- Support for loading fp8 checkpoint (#68, opened by wenscarl, 0 comments)
- Add activation checkpoint offloading (#63, opened by jon-chuang, 0 comments)
- TransformerLm docs say `start_time_step` should be `prefix_len` but LanguageModel uses `prefix_len-1` (#56, opened by DCtheTall, 3 comments)
- [Feature Request] Need a Matmul attention layer instead of Einsum to support running on GPU (#46, opened by MoFHeka, 0 comments)
- [Feature Request] Need ZeRO-1/2 to work together with PP+TP+DP, which may sometimes be faster than FSDP (#45, opened by MoFHeka, 1 comment)
- Support custom FP8 dtype in Pipelined Transformer (#44, opened by kaixih, 0 comments)
- Any publicly available document? (#7, opened by Lekja00160612)