GPT-2 from Scratch and Backend Service on macOS

This repository contains an implementation of GPT-2 built from scratch. It also runs a backend service with a single endpoint that accepts a prefix sentence as user input and returns a completed sentence generated by the trained GPT-2 model.

Getting Started

Prerequisites

  • Python 3.10 or higher
  • PyTorch
  • Other dependencies as listed in requirements.txt

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/your-repo-name.git
    cd your-repo-name
  2. Install the dependencies:

    pip install -r requirements.txt
  3. Run the code:

    • To train a GPT-2 model (a minimal illustrative sketch of the model and training step follows this list):
      python main.py
    • To start a backend server on your local machine:
      python backend_server.py
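
For orientation, here is a minimal, self-contained sketch of what a GPT-2-style model and a single training step look like in PyTorch. It is illustrative only: the names (Config, TinyGPT, block_size, and so on) and hyperparameters are assumptions for this sketch, not the repository's actual code, and it leans on PyTorch's built-in Transformer layers rather than implementing the blocks from scratch as the repository does.

from dataclasses import dataclass
import torch
import torch.nn as nn
import torch.nn.functional as F

@dataclass
class Config:
    vocab_size: int = 50257   # GPT-2 BPE vocabulary size
    block_size: int = 64      # context length for this toy example
    n_layer: int = 2
    n_head: int = 4
    n_embd: int = 128

class TinyGPT(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg.vocab_size, cfg.n_embd)
        self.pos_emb = nn.Embedding(cfg.block_size, cfg.n_embd)
        layer = nn.TransformerEncoderLayer(
            d_model=cfg.n_embd, nhead=cfg.n_head,
            dim_feedforward=4 * cfg.n_embd, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=cfg.n_layer)
        self.lm_head = nn.Linear(cfg.n_embd, cfg.vocab_size, bias=False)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # causal mask: each position may only attend to itself and earlier positions
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=idx.device), diagonal=1)
        x = self.blocks(x, mask=causal)
        logits = self.lm_head(x)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss

if __name__ == "__main__":
    cfg = Config()
    model = TinyGPT(cfg)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    # one training step on random token ids, just to show the shape of the loop
    x = torch.randint(0, cfg.vocab_size, (8, cfg.block_size))
    y = torch.roll(x, shifts=-1, dims=1)   # next-token targets (wrap-around at the end ignored here)
    _, loss = model(x, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"toy training step loss: {loss.item():.3f}")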

Running in Docker

You can also run the project in a Docker container:

  1. Pull the Docker base image:

    docker pull robd003/python3.10:latest
  2. Build the Docker image:

    sudo docker build -t gpt-2 -f Dockerfile .
  3. Run a Docker container:

    • For training:
      sudo docker run -p 5001:5001 -it --name gpt-2 -v $(pwd)/log:/app/log gpt-2 training /bin/bash
    • For service:
      sudo docker run -p 5001:5001 -it --name gpt-2 -v $(pwd)/log:/app/log gpt-2 service /bin/bash

    Each of these commands creates a Docker container that maps the container's port 5001 to port 5001 on your local host. The container is named gpt-2 and is created from the gpt-2 image built in the previous step. The command also mounts your local log directory to the container's /app/log directory and opens a bash shell in the container.

Dataset

The dataset used for training is the Tiny Shakespeare dataset, stored in the input.txt file.
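
As a rough illustration, the snippet below shows one way to load and tokenize input.txt with the GPT-2 BPE tokenizer from tiktoken and draw a random training batch. The 90/10 train/validation split, batch shape, and function names are assumptions for this sketch, not necessarily what main.py does.

import tiktoken
import torch

enc = tiktoken.get_encoding("gpt2")          # GPT-2 BPE tokenizer

with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

tokens = torch.tensor(enc.encode(text), dtype=torch.long)
n = int(0.9 * len(tokens))                   # assumed 90/10 train/val split
train_tokens, val_tokens = tokens[:n], tokens[n:]

def get_batch(data, block_size=64, batch_size=8):
    """Sample a random batch of (input, next-token target) pairs."""
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y

xb, yb = get_batch(train_tokens)
print(xb.shape, yb.shape)                    # torch.Size([8, 64]) torch.Size([8, 64])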

Logs

Training logs can be found in the log.txt file under the log folder. The following figure shows how the training and validation loss change with the number of training steps:

(Figure: training and validation loss vs. training steps)

Testing loss: 5.865208148956299
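
If you want to re-plot the curves yourself, a parser along these lines could read log/log.txt; the assumed line format "<step> <train|val> <loss>" is a guess, so adapt it to whatever main.py actually writes.

import matplotlib.pyplot as plt

steps = {"train": [], "val": []}
losses = {"train": [], "val": []}

with open("log/log.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) != 3:
            continue                         # skip lines that don't match the assumed format
        step, split, loss = parts
        if split in losses:
            steps[split].append(int(step))
            losses[split].append(float(loss))

for split in ("train", "val"):
    plt.plot(steps[split], losses[split], label=f"{split} loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_curves.png")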

Backend Service

The backend service exposes an API that listens on port 5001 of your localhost. This service provides a single endpoint for generating text completions using the trained GPT-2 model.
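
As a rough sketch, a minimal Flask-style implementation of such an endpoint could look like the block below. The web framework, helper function, and generation details are assumptions; only the route, port, and response field names are taken from this README (see the curl example in the next section).

from flask import Flask, request, jsonify

app = Flask(__name__)

def generate_completion(prefix: str) -> str:
    """Placeholder for calling the trained GPT-2 model (hypothetical helper)."""
    return prefix + " ..."   # the real service would sample from the model here

@app.route("/complete-sentence", methods=["POST"])
def complete_sentence():
    payload = request.get_json(force=True)
    prefix = payload.get("text", "")
    return jsonify({"completed_sentence": generate_completion(prefix)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)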

API Endpoint

To interact with the service, you can use the following curl command to send a POST request:

curl -X POST "http://127.0.0.1:5001/complete-sentence" \
     -H "Content-Type: application/json" \
     -d '{"text": "The quick brown fox"}'

In this request, "text" is the prefix, and the service will generate a complete sentence based on the prefix using the trained GPT-2 model. An example response might look like this:

{
  "completed_sentence": "The quick brown foxil!\nIn that hath this one little thousand time,\nI call'd new rest of much need a word,\nTo take that you love it made great two world will be love for a enemy.\nAnd to the"
} 
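
The same request can be sent from Python with the requests library (assuming the server is running locally on port 5001, as described above):

import requests

resp = requests.post(
    "http://127.0.0.1:5001/complete-sentence",
    json={"text": "The quick brown fox"},
    timeout=30,
)
print(resp.json()["completed_sentence"])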

Model-Generated Samples

The following are sample outputs generated by the model given the input "The course of true love never did run smooth." with a maximum of 50 tokens (a sketch of a sampling loop in this style follows the samples):

The course of true love never did run smooth.

So love I should come to speak, your lord!

KING RICHARD II: Nay, my name and love me with me crown, I come, or heart, and

The course of true love never did run smooth.

KING RICHARD II: What that my sovereign-desets, The time shall fly on him that be your love the state upon yourlook'er tears I love thou

The course of true love never did run smooth.

BENVOLIO: My Lord of England's heart and thine eyes: So far I Lord'd good hands so graces onbrokely so love'er, I

The course of true love never did run smooth.

RICHARD: He would, he's a king, which did come; The king was a men I say that heart will be with youeech' woe, I

The course of true love never did run smooth.

KING RICHARD II: What say I see the king?

RATCLIFF: Your news, my lord: HowEN: go should lie: My
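
These samples come from autoregressive decoding: the model repeatedly predicts a distribution over the next token and one token is sampled from it. A generation loop of roughly this shape (top-k sampling, up to 50 new tokens) is sketched below; the top-k value, temperature handling, and the model/tokenizer objects are assumptions for illustration, and the assumed (logits, loss) model signature matches the toy TinyGPT sketch in the Installation section rather than the repository's actual code.

import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, enc, prompt, max_new_tokens=50, temperature=1.0, top_k=50):
    """Autoregressively extend `prompt` by up to `max_new_tokens` tokens."""
    idx = torch.tensor([enc.encode(prompt)], dtype=torch.long)
    for _ in range(max_new_tokens):
        logits, _ = model(idx)                   # (1, T, vocab_size); assumed (logits, loss) signature
        logits = logits[:, -1, :] / temperature  # keep only the last position
        probs = F.softmax(logits, dim=-1)
        topk_probs, topk_idx = torch.topk(probs, top_k, dim=-1)
        next_tok = topk_idx.gather(-1, torch.multinomial(topk_probs, 1))
        idx = torch.cat([idx, next_tok], dim=1)
    return enc.decode(idx[0].tolist())

# Example usage with the hypothetical sketches above:
# print(sample(TinyGPT(Config()), tiktoken.get_encoding("gpt2"),
#              "The course of true love never did run smooth."))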

Reference

This project is inspired by Andrej Karpathy's tutorial. You can watch the detailed explanation in his YouTube video.