Objective: To create the best text model possible, trained from scratch on a single H100 (or potentially H200) within a 24-hour period.
Every day, we will train the latest code from the main branch on a single H100 GPU. Got an idea? Contribute by submitting a pull request.
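As a rough, illustrative sketch (not the repo's actual training script), the fixed budget simply means capping each run at 24 hours of wall-clock time; the `training_step` call below is a hypothetical stand-in for a real optimization step:

```python
import time

BUDGET_SECONDS = 24 * 60 * 60  # one H100, 24 hours of wall-clock time


def train_within_budget(model, optimizer, batches, budget_s=BUDGET_SECONDS):
    """Run ordinary training steps until the wall-clock budget is exhausted."""
    start = time.monotonic()
    steps = 0
    for batch in batches:
        if time.monotonic() - start >= budget_s:
            break  # out of budget: stop and evaluate whatever we have
        loss = model.training_step(batch)  # hypothetical step API
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        steps += 1
    return steps
```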
First, some exploratory runs:
Day 1: MMD-GPT (training run): https://wandb.ai/haitch/lightning_logs/reports/MMD-Vae--Vmlldzo2MDU5MjQ2
Commit: 29df4d64dd02ffbce17a60e4e5b712bf5fafb437
What we will do:
- Explore architectural improvements in text models.
- Run experiments with small models on a consistent dataset to find general computational efficiencies in architecture (see the sketch after these lists).
- Encourage novel and unconventional approaches.
- Test numerous small models for computational efficiency.
- Embrace experimental and unconventional ideas.
- Plan for a monthly training cycle if the daily runs prove successful.

What we will not do:
- Innovate in areas like quantization, efficient fine-tuning, or compute optimizations (e.g., flash attention). We will use these technologies, but they are not the focus.
- Alter datasets frequently. The initial dataset will remain static, with potential expansions as the project evolves.
- Train a big model.
- Follow popular trends without critical analysis.
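A sketch of what "many small models, one dataset" can look like in practice; the config fields and variant names below are illustrative placeholders, not the repo's actual configurations:

```python
from dataclasses import dataclass


@dataclass
class SmallGPTConfig:
    # Hypothetical config fields; real configs would come from Lit-GPT.
    name: str
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    rotary_pct: float = 1.0   # fraction of head dims using rotary embeddings
    mlp_ratio: float = 4.0    # MLP hidden size relative to n_embd


# Every variant sees the same dataset and the same 24-hour budget,
# so differences in test loss can be attributed to the architecture.
VARIANTS = [
    SmallGPTConfig(name="baseline"),
    SmallGPTConfig(name="wide-mlp", mlp_ratio=8.0),
    SmallGPTConfig(name="partial-rotary", rotary_pct=0.25),
]
```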
Action Point: Join our Discord for discussions and bounties.
Our key metric is compute efficiency: the compute required to reach a fixed test loss under a given set of design and hyper-parameter choices. A more efficient design reaches the target loss with less compute; a less efficient one needs more. By fixing the training budget and timeframe, we aim to rapidly identify and explore promising approaches.
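A minimal sketch of how that metric can be computed from a run's evaluation log; the log format and the per-token FLOPs estimate are assumptions, not the project's actual tooling:

```python
def compute_to_target_loss(run_log, flops_per_token, target_loss):
    """Return the training compute (FLOPs) spent before the test loss first
    reaches `target_loss`, or None if the run never got there.

    `run_log` is assumed to be a list of (tokens_seen, test_loss) pairs
    recorded at each evaluation point during training.
    """
    for tokens_seen, test_loss in run_log:
        if test_loss <= target_loss:
            return tokens_seen * flops_per_token
    return None


# A design or hyper-parameter choice is "more compute efficient" if this
# number is smaller at the same target loss.
```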
- Submit a Pull Request: Code > Ideas
- Self-Funding: I'll personally fund a dedicated H100 for continuous operation.
- Community Support: We're open to scaling our best ideas with community support. Contact us if you have available compute resources and are interested in our work.
Special thanks to:
- @Lightning-AI for Lit-GPT
- @karpathy for nanoGPT
- @EleutherAI for GPT-NeoX and the Evaluation Harness
- @TimDettmers for bitsandbytes
- @IST-DASLab for GPTQ
- @Microsoft for LoRA
- @tridao for Flash Attention 2
The Daily Train is released under the Apache 2.0 License.