Issues
Timeout after saving step 5000 checkpoints
#8 opened by zengqg
Why does the program get stuck here?
#1 opened by zjq0455
Could you provide opt1.3b.sh?
#9 opened by zxbjushuai
Second-stage training (running DeepSpeed training with scripts/llama2_7b.sh) got stuck at step 5000. Is this expected behavior?
#6 opened by ericwudocomoi
Reproducing your code
#5 opened by hsb1995
Do you have any plans to support Qwen2 72B?
#4 opened by lihuibng
About GPU OOM
#3 opened by ericwudocomoi
Error in installation instructions
#2 opened by dveloper77