Pretraining-LLMs

Master the essential steps of pretraining large language models (LLMs). Learn to create high-quality datasets, configure model architectures, execute training runs, and assess model performance for efficient and effective LLM pretraining.

Welcome to the "Pretraining LLMs" course! πŸ§‘β€πŸ« The course dives into the essential steps of pretraining large language models (LLMs).

πŸ“˜ Course Summary

In this course, you’ll explore pretraining, the foundational step in training LLMs, which involves teaching an LLM to predict the next token using vast text datasets.

🧠 You'll learn the essential steps to pretrain an LLM, understand the associated costs, and discover cost-effective methods by leveraging smaller, existing open-source models.

Detailed Learning Outcomes (each is illustrated with a short code sketch after the list):

  1. 🧠 Pretraining Basics: Understand the scenarios where pretraining is the optimal choice for model performance. Compare text generation across different versions of the same model to grasp the performance differences between base, fine-tuned, and specialized pretrained models.
  2. πŸ—ƒοΈ Creating High-Quality Datasets: Learn how to create and clean a high-quality training dataset using web text and existing datasets, and how to package this data for use with the Hugging Face library.
  3. πŸ”§ Model Configuration: Explore ways to configure and initialize a model for training, including modifying Meta’s Llama models and initializing weights either randomly or from other models.
  4. πŸš€ Executing Training Runs: Learn how to configure and execute a training run to train your own model effectively.
  5. πŸ“Š Performance Assessment: Assess your trained model’s performance and explore common evaluation strategies for LLMs, including benchmark tasks used to compare different models’ performance.
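
The contrast in outcome 1 is easiest to see by prompting two versions of the same model side by side. Below is a minimal sketch assuming access to Meta's gated Llama 2 checkpoints on the Hugging Face Hub; any base/fine-tuned pair of the same architecture works, and the prompt is arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "I am an engineer. I love"

# A base checkpoint and its chat-tuned variant of the same architecture
# (both are gated behind Meta's license on the Hugging Face Hub).
for name in ["meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-7b-chat-hf"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    print(f"--- {name} ---")
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```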
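
For outcome 2, here is a minimal sketch of cleaning a handful of raw texts and packaging them with the Hugging Face `datasets` library. The cleaning rule and the output path are illustrative placeholders, not the course's exact pipeline.

```python
from datasets import Dataset

raw_texts = [
    "  An example paragraph scraped from the web.  ",
    "",  # an empty record that should be dropped
    "Another usable paragraph of training text.",
]

def clean(text: str) -> str:
    # Strip surrounding whitespace; a real pipeline also deduplicates,
    # filters by language, and removes boilerplate.
    return text.strip()

cleaned = [clean(t) for t in raw_texts if clean(t)]

# Package the cleaned text as a Dataset so it can be tokenized, shuffled,
# and saved to disk for the training run.
dataset = Dataset.from_dict({"text": cleaned})
dataset.save_to_disk("pretraining_corpus")  # placeholder output path
print(dataset)
```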
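
For outcome 3, a sketch of configuring a scaled-down Llama-style architecture and choosing between random initialization and reusing an existing checkpoint. The layer and hidden sizes are illustrative assumptions, not the course's exact configuration.

```python
from transformers import AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM

# Shrink the architecture so it is cheap enough to pretrain from scratch.
config = LlamaConfig(
    hidden_size=1024,
    intermediate_size=4096,
    num_hidden_layers=12,
    num_attention_heads=8,
    num_key_value_heads=8,
)

# Option 1: random initialization (pretraining truly from scratch).
model = LlamaForCausalLM(config)
print(f"parameters: {model.num_parameters():,}")

# Option 2: initialize from an existing open-source model and continue
# pretraining it on your own data (example checkpoint, not a requirement).
# model = AutoModelForCausalLM.from_pretrained("upstage/SOLAR-10.7B-v1.0")
```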
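
For outcome 4, a minimal sketch of a training run with the Hugging Face `Trainer`, reusing the packaged dataset and the small model configuration from the sketches above. The tokenizer ID and every hyperparameter are placeholders; a real pretraining run uses a much larger corpus and far more steps.

```python
from datasets import load_from_disk
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    LlamaConfig,
    LlamaForCausalLM,
    Trainer,
    TrainingArguments,
)

# Any Llama-family tokenizer works here; this one is just an example.
tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-v1.0")
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

dataset = load_from_disk("pretraining_corpus")  # packaged in the dataset sketch
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# A small, randomly initialized Llama-style model (see the configuration sketch).
model = LlamaForCausalLM(LlamaConfig(
    vocab_size=len(tokenizer),
    hidden_size=1024,
    intermediate_size=4096,
    num_hidden_layers=12,
    num_attention_heads=8,
    num_key_value_heads=8,
))

args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=3e-4,
    max_steps=1_000,  # a real pretraining run is orders of magnitude longer
    logging_steps=50,
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # mlm=False yields standard next-token (causal) language-modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("checkpoints/final")
```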
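
For outcome 5, benchmark suites such as EleutherAI's lm-evaluation-harness cover task-level comparisons between models; the sketch below only illustrates the simplest loss-based signal, perplexity on held-out text. The checkpoint path and tokenizer ID carry over from the training sketch and are placeholders.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "checkpoints/final"  # placeholder path to the trained model
tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-v1.0")
model = AutoModelForCausalLM.from_pretrained(checkpoint)
model.eval()

text = "A held-out paragraph used to probe the model's next-token predictions."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels returns the mean next-token cross-entropy loss;
    # perplexity is its exponential, so lower is better.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {math.exp(loss.item()):.2f}")
```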

πŸ”‘ Key Points

  • 🧩 Pretraining Process: Gain in-depth knowledge of the steps to pretrain an LLM, from data preparation to model configuration and performance assessment.
  • πŸ—οΈ Model Architecture Configuration: Explore various options for configuring your model’s architecture, including modifying Meta’s Llama models and innovative pretraining techniques like Depth Upscaling, which can reduce training costs by up to 70%.
  • πŸ› οΈ Practical Implementation: Learn how to pretrain a model from scratch and continue the pretraining process on your own data using existing pre-trained models.

πŸ‘©β€πŸ« About the Instructors

  • πŸ‘¨β€πŸ« Sung Kim: CEO of Upstage, bringing extensive expertise in LLM pretraining and optimization.
  • πŸ‘©β€πŸ”¬ Lucy Park: Chief Scientific Officer of Upstage, with a deep background in scientific research and LLM development.

πŸ”— To enroll in the course or for further information, visit πŸ“š deeplearning.ai.