google/paxml

Pax is a JAX-based machine learning framework for training large-scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry-leading model FLOPs utilization (MFU) rates.

Python · Apache-2.0

Issues

Remove tensorflow dependencies
#106 opened 8 months ago by jobs-git
0
Hard dependencies causing conflict with other packages
#105 opened 8 months ago by jobs-git
0
lingvo issue while installing paxml in vscode
#89 opened a year ago by juneedpk
1
How to do gradient accumulation?
#104 opened a year ago by gramesh-amd
0
Question about Integration Timeline for PR #99
#102 opened a year ago by Santiago-Castellano
8
LoRA support in transformers
#96 opened a year ago by tanmayshishodia
1
Error running tutorial notebook
#95 opened a year ago by ajikmr
0
DEADLINE_EXCEEDED on 1024 GPUs.
#77 opened 2 years ago by mhugues
0
Jax + tpu and AQT int8 train model loss is abnormal
#71 opened 2 years ago by Lisennlp
0
Use bfloat16 for eval
#66 opened 2 years ago by tbaker2
1
[Question] Very low MFU(30%~35%) when train bf16 Llama2 and GPT model with single SXM4 A100 machine.
#65 opened 2 years ago by MoFHeka
0
[Feature Request] Need ZeRo-1/2 to cooperate with PP+TP+DP. Which may more faster than FSDP sometimes.
#64 opened 2 years ago by MoFHeka
0
Remaining tutorials
#58 opened 2 years ago by rahulbatra85
0
How to continue training from a checkpoint?
#37 opened 2 years ago by lkm2835
0
Int8 checkpoint
#35 opened 2 years ago by wx-x
0
ARM64 Build
#30 opened 2 years ago by joker-eph
4
Installing paxml from source failed due to dependency problem
#25 opened 2 years ago by yhtang
5
Pipeline Parallelism: USE_REPEATED_LAYERS bug
#5 opened 3 years ago by abhinavgoel95
3
Pipeline Parallelism: F external/org_tensorflow/tensorflow/compiler/xla/array.h:446] Check failed: n < sizes_size Fatal Python error: Aborted
#4 opened 3 years ago by abhinavgoel95
3
Error running Common Crawl example
#11 opened 3 years ago by RobertLiJN
2
ERROR: error loading package 'paxml'
#1 opened 3 years ago by sharathts
7
Unexpected Overheads with Activation Checkpointing with Pipeline Parallelism
#17 opened 3 years ago by abhinavgoel95
1
What does USE_REPEATED_LAYER do?
#3 opened 3 years ago by abhinavgoel95
1