
The Daily Train


Objective: To train the best text model possible, from scratch, on a single H100 (or potentially H200) GPU within a 24-hour period.

The Plan

Daily Training

Every day, we will train the latest code from the main branch on a single H100 GPU. Got an idea? Contribute by submitting a pull request.

First, we'll explore a few ideas.

Day 1: MMD-GPT (training). Report: https://wandb.ai/haitch/lightning_logs/reports/MMD-Vae--Vmlldzo2MDU5MjQ2

Commit: 29df4d64dd02ffbce17a60e4e5b712bf5fafb437
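For readers unfamiliar with the MMD objective behind the Day 1 run: maximum mean discrepancy (MMD) is a kernel-based distance between two sets of samples, commonly used in MMD-VAEs as a replacement for the KL term. The sketch below is a generic illustration with an RBF kernel, not code from this repo; the function names and the fixed bandwidth `sigma` are our own choices.

```python
import numpy as np

def rbf_kernel(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """RBF (Gaussian) kernel matrix between rows of x and rows of y."""
    # Pairwise squared Euclidean distances via broadcasting
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between samples x and y.

    Zero when the two sample sets coincide; grows as the two
    distributions move apart.
    """
    return float(
        rbf_kernel(x, x, sigma).mean()
        + rbf_kernel(y, y, sigma).mean()
        - 2.0 * rbf_kernel(x, y, sigma).mean()
    )
```

In an MMD-VAE-style setup, `x` would be latent codes from the encoder and `y` samples from the prior; the training loss adds `mmd2(x, y)` to the reconstruction term.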

Focus Areas

  • Exploring architectural improvements in text models.
  • Experiments with small models on a consistent dataset to find general computational efficiencies in architecture.
  • Encouraging novel and unconventional approaches.

What We Aren't Doing

  • Innovating in areas like quantization, efficient fine-tuning, or compute optimizations (e.g., flash attention). Note: we will still use these technologies; we just won't innovate on them.
  • Altering datasets frequently. The initial dataset will remain static, with potential expansions as the project evolves.
  • Training a big model.
  • Following popular trends without critical analysis.

What We Are Doing

  • Testing numerous small models for computational efficiency.
  • Embracing experimental and unconventional ideas.
  • Planning for a monthly training cycle if successful.

Action Point: Join our Discord for discussions and bounties.

Efficiency Metrics

Compute efficiency, defined as the compute required to reach a fixed test loss under a given set of design and hyper-parameter choices, is our key metric. A more efficient architecture reaches the target loss with less compute, so a less efficient one needs more GPUs to match it. By fixing the training budget and timeframe, we aim to rapidly identify and explore promising approaches.
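As a rough illustration of how this metric can be compared across runs (not code from this repo), one can estimate training compute with the common 6ND rule of thumb, roughly 6 FLOPs per parameter per training token, and then compare the FLOPs each design spends to reach the shared target loss. The function names and example numbers below are hypothetical.

```python
def approx_train_flops(n_params: float, n_tokens: float) -> float:
    """Rough training cost estimate: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

def flops_to_target_loss(n_params: float, tokens_needed: float) -> float:
    """Compute spent to reach the fixed test loss.

    tokens_needed is the number of training tokens a given design
    required to hit the target; lower output means a more
    compute-efficient design.
    """
    return approx_train_flops(n_params, tokens_needed)

# Hypothetical comparison: a design that needs 2B tokens to hit the
# target loss beats one that needs 4B tokens at the same size.
design_a = flops_to_target_loss(124e6, 2e9)
design_b = flops_to_target_loss(124e6, 4e9)
```

Under this scheme, `design_a` wins because it reaches the same loss with half the tokens, and therefore roughly half the compute.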

How to Contribute

  • Submit a Pull Request: Code > Ideas

Funding

  • Self-Funding: I'll personally fund a dedicated H100 for continuous operation.
  • Community Support: We're open to scaling our best ideas with community support. Contact us if you have available compute resources and are interested in our work.

Acknowledgements

Special thanks to:

License

The Daily Train is released under the Apache 2.0 License.