The simple repository for training/finetuning medium-sized GPTs.

Primary language: Jupyter Notebook. License: MIT.

nanoGPT

This is one attempt at training and finetuning a nanoGPT model. For background, the popular ChatGPT is introduced below, as described on the OpenAI website:

  • The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.
  • ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.

What's the catch?

This whole notebook is based on the research paper "Attention Is All You Need", whose abstract reads:

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.

A message to the reader:

  • The model uses multi-head attention, but with self-attention layers only; there is no cross-attention to an encoder.

  • The paper targets translation, which requires both an encoder and a decoder.

  • Here, only next-token prediction (the decoder layers) is of concern, so the model is an extension of the Bigram Language Model from makemore. Check that out as well.
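
The decoder-only idea above can be sketched as a single masked ("causal") self-attention head, where each position attends only to itself and to earlier positions. This is a minimal pure-Python illustration, not the code in this repository: the function names and the toy input are assumptions, and a real model would derive q, k and v from learned linear projections and run several such heads in parallel.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(q, k, v):
    """Single-head masked self-attention (illustrative helper).

    q, k, v: lists of T vectors (lists of floats) derived from the same
    sequence. Position t may only attend to positions 0..t.
    """
    T, d = len(q), len(q[0])
    out, weights = [], []
    for t in range(T):
        # Scaled dot-product scores against the current and earlier positions.
        # Skipping s > t is equivalent to masking those scores with -inf.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        w = softmax(scores)
        # Pad with zeros so every row has length T (future positions get 0).
        weights.append(w + [0.0] * (T - t - 1))
        # Output is the attention-weighted average of the visible values.
        out.append([sum(w[s] * v[s][j] for s in range(t + 1))
                    for j in range(len(v[0]))])
    return out, weights

# Toy input: 3 positions, 2 dimensions; q = k = v = x for simplicity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = causal_self_attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

The printed weight matrix is lower-triangular: the first position can only look at itself, and no position puts any weight on a later one. With cross-attention, the keys and values would instead come from a separate encoder sequence, which is the part left out here.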

Methods

For reference, OpenAI describes how ChatGPT was trained as follows:

  • We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses. We mixed this new dialogue dataset with the InstructGPT dataset, which we transformed into a dialogue format.

  • To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process.
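
To make the comparison-data step more concrete, here is a rough sketch of how a ranking of responses can be turned into a training signal for a reward model. The text above does not give the loss; the pairwise form used here, -log(sigmoid(r_preferred - r_rejected)), follows the InstructGPT paper. The response labels and reward scores are made up, and no neural network is involved.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the better response comes first.
    return list(combinations(ranked, 2))

def pairwise_loss(reward, pairs):
    """Mean of -log(sigmoid(r_preferred - r_rejected)) over all pairs."""
    total = 0.0
    for better, worse in pairs:
        diff = reward[better] - reward[worse]
        total += math.log1p(math.exp(-diff))  # equals -log(sigmoid(diff))
    return total / len(pairs)

# Hypothetical ranking by a human trainer: A is best, C is worst.
pairs = ranking_to_pairs(["A", "B", "C"])

# Hypothetical reward-model scores: one agrees with the ranking, one does not.
agrees = {"A": 2.0, "B": 1.0, "C": 0.0}
disagrees = {"A": 0.0, "B": 1.0, "C": 2.0}

print(pairs)
print(pairwise_loss(agrees, pairs) < pairwise_loss(disagrees, pairs))
```

The loss is lower when the scores order the responses the same way the trainer did, which is what pushes a reward model towards human preferences. The Proximal Policy Optimization step that then uses the reward model is not shown.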

Install

Dependencies:

  • pytorch <3
  • numpy <3
  • pip install transformers for huggingface transformers <3 (to load GPT-2 checkpoints)
  • pip install datasets for huggingface datasets <3 (if you want to download + preprocess OpenWebText)
  • pip install tiktoken for OpenAI's fast BPE code <3
  • pip install wandb for optional logging <3
  • pip install tqdm
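
A quick way to see which of these are still missing is to check whether each package can be imported. The helper below is only a convenience sketch (the `missing` function and the two lists are not part of this repository); it relies on the standard library and on the usual import names, where pytorch is imported as `torch`.

```python
from importlib.util import find_spec

# Import names for the dependencies listed above.
REQUIRED = ["torch", "numpy", "transformers", "datasets", "tiktoken", "tqdm"]
OPTIONAL = ["wandb"]  # only needed for optional logging

def missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]

print("missing required:", missing(REQUIRED))
print("missing optional:", missing(OPTIONAL))
```

Anything reported as missing can be installed with the corresponding pip command from the list.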

Notebooks and code:

notebooks/gpt-dev.ipynb
codes/bigram.py
codes/v2.py

Dataset

data/input.txt

Result Sample

Sampling from the trained model generates text such as:

ANGELO:
And cowards it be strawn to my bed,
And thrust the gates of my threats,
Because he that ale away, and hang'd
An one with him.
DUKE VINCENTIO:
I thank your eyes against it.
DUKE VINCENTIO:
Then will answer him to save the malm:
And what have you tyrannous shall do this?
DUKE VINCENTIO:
If you have done evils of all disposition
To end his power, the day of thrust for a common men
That I leave, to fight with over-liking
Hasting in a roseman.
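
Text like this is produced one token at a time: the model predicts a distribution over the next token, one token is drawn from it, and the process repeats. The sketch below shows that loop with a simple count-based character bigram model. It is an illustration under assumptions, not the contents of codes/bigram.py: the function names are hypothetical, and the two-line corpus is a stand-in for data/input.txt.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Sample up to `length` further characters after `start`.

    Each character is drawn from the distribution of characters that
    followed the previous one in the training text.
    """
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:  # character was never seen with a successor
            break
        chars, freqs = zip(*followers.items())
        out.append(rng.choices(chars, weights=freqs, k=1)[0])
    return "".join(out)

# Tiny stand-in corpus taken from the sample above.
corpus = "DUKE VINCENTIO:\nI thank your eyes against it.\n"
model = train_bigram(corpus)
sample = generate(model, "D", 40)
print(sample)
```

A bigram model only looks at the previous character, so its output is mostly gibberish. The self-attention layers let each prediction depend on the whole preceding context, which is why the sample above has recognisable speaker names and line structure.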
