Quantization-in-Depth

Dive into advanced quantization techniques. Learn to implement and customize linear quantization functions, measure quantization error, and compress model weights using PyTorch for efficient and accessible AI models.

Primary language: Jupyter Notebook

💡 Welcome to the "Quantization in Depth" course! This course delves into advanced quantization techniques to compress and optimize models, making them more accessible and efficient.

Course Summary

In this course, you'll explore quantization methods in depth to reduce the size of model weights while maintaining performance. Here's what you can expect to learn and experience:

  1. ⚙️ Linear Quantization: Build and customize linear quantization functions, exploring modes (asymmetric and symmetric) and granularities (per-tensor, per-channel, and per-group).
  2. 📏 Quantization Error Measurement: Measure the quantization error of each option to weigh model accuracy against memory savings.
  3. 🛠️ PyTorch Quantizer: Implement a general-purpose quantizer in PyTorch to compress model weights from 32 bits to 8 bits.
  4. 🧩 Advanced Techniques: Pack four 2-bit weights into one 8-bit integer, going beyond standard 8-bit quantization.
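
As a rough illustration of the first two items, here is a minimal sketch of per-tensor asymmetric quantization, symmetric quantization (per-tensor or per-channel), and a mean-squared-error check. The function names (`quantize_asymmetric`, `quantize_symmetric`, `dequantize`, `quantization_error`) are hypothetical and are not taken from the course notebooks; the per-channel branch assumes a 2D weight with one scale per row.

```python
import torch


def quantize_asymmetric(t: torch.Tensor, dtype=torch.int8):
    """Per-tensor asymmetric mode: map [t.min(), t.max()] onto the integer range."""
    q_min, q_max = torch.iinfo(dtype).min, torch.iinfo(dtype).max
    r_min, r_max = t.min().item(), t.max().item()
    scale = (r_max - r_min) / (q_max - q_min)
    if scale == 0.0:  # constant tensor: avoid dividing by zero
        scale = 1.0
    # The zero point is the integer that represents the real value 0.0.
    zero_point = int(round(q_min - r_min / scale))
    zero_point = max(q_min, min(q_max, zero_point))
    q = torch.clamp(torch.round(t / scale + zero_point), q_min, q_max).to(dtype)
    return q, scale, zero_point


def quantize_symmetric(t: torch.Tensor, dtype=torch.int8, per_channel=False):
    """Symmetric mode: the zero point is fixed at 0, so only a scale is stored."""
    q_max = torch.iinfo(dtype).max
    if per_channel:
        # Assumption: t is 2D and each row is one output channel.
        max_abs = t.abs().amax(dim=1, keepdim=True)
    else:
        max_abs = t.abs().max()
    scale = (max_abs / q_max).clamp(min=1e-12)  # guard against all-zero input
    q = torch.clamp(torch.round(t / scale), -q_max, q_max).to(dtype)
    return q, scale


def dequantize(q: torch.Tensor, scale, zero_point=0) -> torch.Tensor:
    """Recover an approximation of the original float values."""
    return scale * (q.to(torch.float32) - zero_point)


def quantization_error(original: torch.Tensor, restored: torch.Tensor) -> float:
    """Mean squared error between the original and dequantized tensors."""
    return (original - restored).square().mean().item()


if __name__ == "__main__":
    torch.manual_seed(0)
    weights = torch.randn(4, 16)

    q_a, s_a, z_a = quantize_asymmetric(weights)
    q_s, s_s = quantize_symmetric(weights)
    q_c, s_c = quantize_symmetric(weights, per_channel=True)

    print("asymmetric MSE :", quantization_error(weights, dequantize(q_a, s_a, z_a)))
    print("symmetric MSE  :", quantization_error(weights, dequantize(q_s, s_s)))
    print("per-channel MSE:", quantization_error(weights, dequantize(q_c, s_c)))
```

Comparing the three printed errors on your own weights is one way to see the trade-off: finer granularity stores more scales but usually tracks the original values more closely.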

Key Points

  • 🔄 Explore different variants of Linear Quantization, including symmetric vs. asymmetric modes and various granularities.
  • 🧠 Build a general-purpose quantizer in PyTorch for up to 4x compression on dense layers of any open-source model.
  • 📦 Implement weight packing to compress four 2-bit weights into a single 8-bit integer.
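
The weight-packing idea can be sketched as follows. This is an illustrative version, not the course's implementation: the helper names `pack_2bit` and `unpack_2bit` are hypothetical, and the bit order (first weight in the two lowest bits) is an assumption.

```python
import torch


def pack_2bit(values: torch.Tensor) -> torch.Tensor:
    """Pack a flat tensor of 2-bit values (0..3) into uint8, four per byte."""
    if values.numel() % 4 != 0:
        raise ValueError("number of values must be a multiple of 4")
    if values.min().item() < 0 or values.max().item() > 3:
        raise ValueError("values must fit in 2 bits (0..3)")
    v = values.to(torch.uint8).reshape(-1, 4)
    # Assumed layout: column 0 -> bits 0-1, column 1 -> bits 2-3, and so on.
    return v[:, 0] | (v[:, 1] << 2) | (v[:, 2] << 4) | (v[:, 3] << 6)


def unpack_2bit(packed: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_2bit: recover four 2-bit values from each byte."""
    parts = [(packed >> shift) & 0b11 for shift in (0, 2, 4, 6)]
    return torch.stack(parts, dim=1).reshape(-1)


if __name__ == "__main__":
    weights = torch.tensor([1, 0, 3, 2, 3, 3, 0, 1])
    packed = pack_2bit(weights)
    restored = unpack_2bit(packed)
    print(packed)    # two uint8 values hold all eight 2-bit weights
    print(restored)
```

Because each byte carries four weights, the packed tensor needs a quarter of the storage of an unpacked uint8 tensor, at the cost of an unpacking step before the weights can be used.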

About the Instructors

🌟 Marc Sun and Younes Belkada are Machine Learning Engineers at Hugging Face, bringing extensive expertise in model compression and optimization to guide you through this advanced course.

🔗 To enroll in the course or for further information, visit deeplearning.ai.