cloneofsimo/min-max-gpt
Minimal (400 LOC) implementation, Maximum (multi-node, FSDP) GPT training
Python
Issues
- 2
Taking wandb settings from env vars?
#8 opened by yaroslavvb - 2
flash_attention2
#2 opened by ScottishFold007