Issues
How to do gradient accumulation?
#104 opened by gramesh-amd - 8
LoRA support in transformers
#96 opened by tanmayshishodia - 0
Error running tutorial notebook
#95 opened by ajikmr - 0
lingvo issue while installing paxml in vscode
#89 opened by juneedpk - 0
DEADLINE_EXCEEDED on 1024 GPUs.
#77 opened by mhugues - 0
Use bfloat16 for eval
#66 opened by tbaker2 - 0
[Question] Very low MFU (30%~35%) when training bf16 Llama2 and GPT models on a single SXM4 A100 machine.
#65 opened by MoFHeka - 0
[Feature Request] Need ZeRO-1/2 to work together with PP+TP+DP, which may sometimes be faster than FSDP.
#64 opened by MoFHeka - 0
Remaining tutorials
#58 opened by rahulbatra85 - 0
How to continue training from a checkpoint?
#37 opened by lkm2835 - 0
Int8 checkpoint
#35 opened by wx-x - 4
ARM64 Build
#30 opened by joker-eph - 5
Pipeline Parallelism: F external/org_tensorflow/tensorflow/compiler/xla/array.h:446] Check failed: n < sizes_size Fatal Python error: Aborted
#4 opened by abhinavgoel95 - 2
Error running Common Crawl example
#11 opened by RobertLiJN - 7
ERROR: error loading package 'paxml'
#1 opened by sharathts - 1
Unexpected Overheads with Activation Checkpointing with Pipeline Parallelism
#17 opened by abhinavgoel95 - 1
What does USE_REPEATED_LAYER do?
#3 opened by abhinavgoel95