LitGPT

Minimal GPT Implementation in PyTorch Lightning

⚡️ Lightning Minimal GPT

This repo trains a PyTorch implementation of minGPT using PyTorch Lightning. minGPT is a minimal GPT language model, as taught in Andrej Karpathy's Neural Networks: Zero to Hero course. This codebase is a 'playground' repository where I can practice writing (hopefully!) better deep learning code.
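At the heart of a GPT language model is causal (masked) self-attention, which lets each token gather information from earlier tokens but never from later ones. The sketch below illustrates that idea with a single attention head in dependency-free Python. It is not this repo's implementation, which uses PyTorch; the function names and the list-of-lists matrix representation are assumptions chosen only to keep the example self-contained.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability; -inf entries become exactly 0.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head causal self-attention over a sequence of T token vectors.

    x: (T x C) token embeddings; wq, wk, wv: (C x H) projection matrices.
    Position t may only attend to positions 0..t, so the model cannot
    peek at future tokens when predicting the next one.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    head_size = len(wq[0])
    scores = matmul(q, transpose(k))  # (T x T) query-key affinities
    T = len(x)
    for i in range(T):
        for j in range(T):
            if j > i:
                scores[i][j] = float("-inf")  # mask out future positions
            else:
                scores[i][j] /= math.sqrt(head_size)  # scaled dot product
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)  # weighted sum of value vectors
```

Because of the mask, changing a later token leaves the outputs for all earlier positions unchanged, which is what makes next-token training possible on whole sequences at once.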

🔧 Installation

To install dependencies and activate the conda environment:

conda env create -f env.yml
conda activate litgpt

If you are developing, install the pre-commit hooks:

pre-commit install

📈 Training

To train the model (whilst in the conda environment):

litgpt fit --config configs/default.yaml

You can override and extend the config file from the CLI. Arguments such as --optimizer and --lr_scheduler accept PyTorch classes. Run litgpt fit --help, or see the LightningCLI docs, for all options.
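Overrides address nested config entries by dotted path: for example, litgpt fit --config configs/default.yaml --trainer.max_epochs=20 replaces only the max_epochs entry under trainer and keeps the rest of the file. The sketch below is a simplified, hypothetical model of that merging behaviour, not LightningCLI's implementation: the apply_overrides helper is invented for illustration, and it keeps values as strings, whereas LightningCLI (via jsonargparse) parses them against the target class's type hints.

```python
import copy
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], args: List[str]) -> Dict[str, Any]:
    """Apply `--a.b.c=value` style overrides to a nested config dict.

    Hypothetical helper: a simplified model of dotted CLI overrides.
    Values stay as strings; no type parsing or validation is done.
    """
    result = copy.deepcopy(config)  # leave the base config untouched
    for arg in args:
        key, _, value = arg.removeprefix("--").partition("=")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections
        node[leaf] = value  # later flags win over the config file
    return result
```

Applied to a base config such as {"trainer": {"max_epochs": 10}}, the flag --trainer.max_epochs=20 changes only that leaf, while a top-level flag such as --seed_everything=42 adds a new key alongside the existing sections.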

👀 Logging

We provide config files for TensorBoard and Weights & Biases monitoring. Training with the default config (as above) logs to TensorBoard. You can monitor training by running:

tensorboard --logdir=checkpoints/

To log with Weights & Biases, use the default_wandb.yaml or ddp.yaml config file. The first time, you will need to authenticate by running wandb login.

🚀 HPC

A script for DDP training on Slurm-managed HPC clusters is provided. Update the shell script as required, make it executable (chmod +x scripts/slurm.sh), and run it:

scripts/slurm.sh

This script generates a Slurm job script and submits it with sbatch. Generating the job script dynamically means resource requests are set once at the top of the file and then passed to both Slurm (to allocate the resources) and Lightning (to utilise them).
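The "declare once, pass to both" idea can be sketched in Python. This is a hypothetical re-creation, not the contents of scripts/slurm.sh (which is a shell script): the resource values, the make_sbatch_script helper, and the configs/ddp.yaml path are assumptions. The #SBATCH options and the --trainer.num_nodes / --trainer.devices flags are standard Slurm and LightningCLI arguments.

```python
# Resource requests are declared once (hypothetical values; adjust for your cluster).
NODES = 2
GPUS_PER_NODE = 4
CPUS_PER_TASK = 8
TIME = "04:00:00"


def make_sbatch_script(nodes: int, gpus_per_node: int, cpus_per_task: int,
                       time: str, config: str = "configs/ddp.yaml") -> str:
    """Render a Slurm batch script whose #SBATCH requests and Lightning
    Trainer arguments are derived from the same numbers (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        # Lightning expects one Slurm task per GPU on each node.
        f"#SBATCH --ntasks-per-node={gpus_per_node}",
        f"#SBATCH --gres=gpu:{gpus_per_node}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --time={time}",
        "",
        # srun starts one process per task; the same counts go to the Trainer.
        f"srun litgpt fit --config {config} "
        f"--trainer.num_nodes={nodes} --trainer.devices={gpus_per_node}",
    ]
    return "\n".join(lines) + "\n"


SCRIPT = make_sbatch_script(NODES, GPUS_PER_NODE, CPUS_PER_TASK, TIME)
```

The rendered text can be piped to sbatch, which reads the batch script from standard input when no file name is given. Keeping the numbers in one place prevents the classic mismatch where Slurm allocates a different number of GPUs than Lightning tries to use.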