
Zero to GPT

This course will get you from no knowledge of deep learning to training a GPT model. We'll start with the basics, then build up to complex networks.

To use this course, go through each chapter from the beginning. Read the lessons, or watch the optional videos. Then look through the implementations to solidify your understanding. I also recommend implementing each algorithm on your own.

Course Outline

0. Introduction

Get an overview of the course and what we'll learn. Includes some math and NumPy fundamentals you'll need for deep learning.

1. Gradient Descent

Gradient descent is how neural networks adjust their parameters to fit the data. It's the "learning" part of deep learning.
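
To make that concrete, here's a minimal NumPy sketch of gradient descent fitting a one-parameter linear model; the toy data and learning rate are made up for illustration:

```python
import numpy as np

# Toy data: y is roughly 3x plus a little noise.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

w = 0.0    # parameter we want to learn
lr = 0.01  # learning rate (step size)

for epoch in range(200):
    pred = w * x                         # forward pass
    grad = 2 * np.mean((pred - y) * x)   # d(MSE)/dw
    w -= lr * grad                       # step against the gradient

print(w)  # ends up close to 3
```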

2. Dense networks

Dense networks are the basic form of a neural network, where every input unit is connected to every output unit. They're also called fully connected networks.
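
Under the hood, a dense layer is just a matrix multiply plus a bias, usually followed by a nonlinearity. A minimal NumPy sketch, with random weights for illustration:

```python
import numpy as np

def dense(x, weights, bias):
    """One fully connected layer: every input feeds every output."""
    return x @ weights + bias

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))         # batch of 2 examples, 4 features each
w = rng.normal(size=(4, 3)) * 0.1   # 4 inputs -> 3 outputs
b = np.zeros(3)

hidden = np.maximum(dense(x, w, b), 0)  # ReLU nonlinearity
print(hidden.shape)  # (2, 3)
```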

3. Classification with neural networks

In the last two lessons, we learned how to perform regression with neural networks. Now, we'll learn how to perform classification.
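
One key change for classification is turning raw network outputs into class probabilities and scoring them. Here's a minimal sketch of softmax and cross-entropy loss, a standard pairing (the lesson may build these up differently):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    # Penalize the network for assigning low probability to the true class.
    return -np.log(probs[label])

logits = np.array([2.0, 0.5, -1.0])  # raw network outputs for 3 classes
probs = softmax(logits)
print(probs, cross_entropy(probs, 0))
```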

4. Recurrent networks

Recurrent neural networks can process sequences of data. They're used for time series and natural language processing.
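
Here's a minimal NumPy sketch of a vanilla RNN step; the key idea is that the same weights are reused at every position in the sequence:

```python
import numpy as np

def rnn_step(x_t, h_prev, w_xh, w_hh, b):
    """One RNN step: mix the new input with the previous hidden state."""
    return np.tanh(x_t @ w_xh + h_prev @ w_hh + b)

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 4))         # 5 time steps, 4 features each
w_xh = rng.normal(size=(4, 8)) * 0.1  # input -> hidden
w_hh = rng.normal(size=(8, 8)) * 0.1  # hidden -> hidden
b = np.zeros(8)

h = np.zeros(8)
for x_t in seq:  # the same weights are applied at every step
    h = rnn_step(x_t, h, w_xh, w_hh, b)
print(h.shape)  # (8,)
```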

5. Regularization

Regularization prevents overfitting to the training set, which helps the network generalize to new data. One common technique is sketched below.

  • Lesson: Read the regularization tutorial (coming soon)
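
Until the tutorial is up, here's a minimal sketch of L2 regularization (weight decay); the gradient and hyperparameter values are stand-ins for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))          # network weights
grad_loss = rng.normal(size=(4, 3))  # stand-in for the data-loss gradient

lr = 0.01    # learning rate
lam = 1e-3   # regularization strength (a hyperparameter you tune)

# L2 regularization adds lam * sum(w ** 2) to the loss, so the update
# gains a 2 * lam * w term that shrinks weights toward zero.
w -= lr * (grad_loss + 2 * lam * w)
```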

6. PyTorch

PyTorch is a framework for deep learning that automates the backward pass of neural networks. This makes it simpler to implement complex networks.

  • Lesson: Read the PyTorch tutorial (coming soon)
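
As a preview, here's PyTorch's autograd computing a gradient that we'd otherwise have to derive by hand:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
w = torch.tensor(0.5, requires_grad=True)  # PyTorch tracks ops on this tensor

loss = ((w * x - y) ** 2).mean()
loss.backward()  # autograd runs the backward pass for us

print(w.grad)  # gradient of the loss with respect to w
```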

7. Data

If you want to train a deep learning model, you need data. Gigabytes of it. We'll discuss how you can get this data and process it.

  • Lesson: Read the data tutorial (coming soon)
  • Implementation: Notebook coming soon
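
As a preview, here's a sketch of one simple preprocessing step, character-level tokenization; the file name corpus.txt is a placeholder for your own corpus:

```python
# Read raw text and map it to integer token ids before training.
# "corpus.txt" is hypothetical; substitute your own data file.
with open("corpus.txt", encoding="utf-8") as f:
    text = f.read()

# Character-level tokenization: each unique character gets an integer id.
vocab = sorted(set(text))
char_to_id = {ch: i for i, ch in enumerate(vocab)}
ids = [char_to_id[ch] for ch in text]

print(len(vocab), "unique characters,", len(ids), "tokens")
```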

8. Encoders and Decoders

Encoder/decoder models are used for NLP tasks where the output isn't the same length as the input. For example, if you want to train on question/answer pairs, the answers may be a different length than the questions.

  • Lesson: Read the encoder/decoder tutorial (coming soon)
  • Implementation: Notebook
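
Here's a minimal PyTorch sketch of the idea, assuming a shared vocabulary and single GRU layers (the notebook's implementation may differ):

```python
import torch
from torch import nn

class Seq2Seq(nn.Module):
    """Minimal encoder/decoder: the encoder compresses the input sequence
    into a hidden state, which the decoder uses to generate the output."""
    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        _, state = self.encoder(self.embed(src))           # summarize the input
        dec_out, _ = self.decoder(self.embed(tgt), state)  # generate the output
        return self.out(dec_out)

model = Seq2Seq(vocab_size=100)
src = torch.randint(0, 100, (2, 7))  # input length 7
tgt = torch.randint(0, 100, (2, 5))  # output length 5: lengths can differ
print(model(src, tgt).shape)         # torch.Size([2, 5, 100])
```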

9. Transformers

Transformers avoid the vanishing/exploding gradient problems of RNNs by using attention. Attention lets the network look at the whole sequence at once, instead of processing it step by step.

  • Lesson: Read the transformer tutorial (coming soon)
  • Implementation: Notebook
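
Here's a minimal NumPy sketch of the core operation, scaled dot-product self-attention:

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention over a whole sequence at once."""
    scores = q @ k.T / np.sqrt(q.shape[-1])  # how strongly each position attends to each other
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ v

rng = np.random.default_rng(0)
seq = rng.normal(size=(6, 16))   # 6 tokens, 16-dim embeddings
out = attention(seq, seq, seq)   # self-attention: q, k, v from the same sequence
print(out.shape)  # (6, 16)
```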

10. Efficient Transformers

GPT models take a long time to train. We can reduce that time by using more GPUs, but we don't all have access to GPU clusters. To reduce training time, we'll incorporate some recent advances to make the transformer model more efficient.

  • Lesson: Read the efficient transformer tutorial (coming soon)
  • Implementation: Notebook

More Chapters Coming Soon

Optional Chapters

Convolutional networks

Convolutional neural networks are used for working with images and time series.

  • Lesson: Read the convolutional network tutorial (coming soon)
  • Implementation: Notebook and class
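
A minimal NumPy sketch of the core operation, a 1D convolution (technically cross-correlation, as in most deep learning libraries):

```python
import numpy as np

def conv1d(x, kernel):
    """Slide a small kernel over the input, reusing the same weights everywhere."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

signal = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
edge_detector = np.array([-1.0, 0.0, 1.0])  # responds to rising/falling values

print(conv1d(signal, edge_detector))  # [ 2.  2.  0. -2. -2.]
```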

Gated recurrent networks

Gated recurrent networks help RNNs process long sequences by learning to forget irrelevant information. LSTM and GRU are two popular gated architectures.

  • Lesson: Read the GRU tutorial (coming soon)
  • Implementation: Notebook
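
Here's a minimal NumPy sketch of a single GRU step, with biases omitted for brevity; the notebook's version may organize the weights differently:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU step: gates decide what to forget and what to keep."""
    w_z, w_r, w_h = params
    inputs = np.concatenate([x_t, h_prev])
    z = sigmoid(inputs @ w_z)  # update gate: keep old state vs. take new
    r = sigmoid(inputs @ w_r)  # reset gate: forget irrelevant history
    h_tilde = np.tanh(np.concatenate([x_t, r * h_prev]) @ w_h)
    return (1 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
dim_x, dim_h = 4, 8
params = [rng.normal(size=(dim_x + dim_h, dim_h)) * 0.1 for _ in range(3)]

h = np.zeros(dim_h)
for x_t in rng.normal(size=(5, dim_x)):  # 5 time steps
    h = gru_step(x_t, h, params)
print(h.shape)  # (8,)
```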

Installation

If you want to run these notebooks locally, you'll need to install some Python packages.

  • Make sure you have Python 3.8 or higher installed.
  • Clone this repository.
  • Run `pip install -r requirements.txt`

License

You can use and adapt this material for your own courses, but not commercially. You also must provide attribution.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.