Recurser

A new model and implementation to reduce VRAM usage on transformer models.

Online demos

Recurser reduces the VRAM usage of GPT2-XL by 25%. You can run GPT2-XL (float32) with PyTorch on Colab or on your own GPU.

[Colab]
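
As a point of comparison for the 25% figure, here is a minimal baseline sketch that loads plain GPT2-XL in float32 with Hugging Face's transformers library (see Reference) and reports peak VRAM. The recurser-specific API is not documented in this README, so treat this only as a measurement harness for the baseline side of the comparison:

  import torch
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  device = "cuda"
  torch.cuda.reset_peak_memory_stats(device)

  # Load the plain GPT2-XL baseline in float32, as in the demo above
  tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
  model = GPT2LMHeadModel.from_pretrained("gpt2-xl", torch_dtype=torch.float32).to(device)

  # Run one short generation and report peak VRAM for the baseline
  inputs = tokenizer("Hello, my name is", return_tensors="pt").to(device)
  with torch.no_grad():
      model.generate(**inputs, max_new_tokens=20)

  peak_gib = torch.cuda.max_memory_allocated(device) / 1024**3
  print(f"Peak VRAM (baseline GPT2-XL, float32): {peak_gib:.2f} GiB")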

Installation

Install the library from PyPI:

  pip install recursers

Todos

  • Re-implement recurser for other models
  • Enable MPS acceleration on Mac (a device-selection sketch follows this list)
  • Retraining: training the recurser differs slightly from the usual transformer training procedure.
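
The MPS item above would hinge on PyTorch's standard device-availability checks. A minimal sketch of the fallback logic, using only stock PyTorch APIs (Recurser itself does not wire this up yet):

  import torch

  def pick_device() -> torch.device:
      # Prefer CUDA, then Apple's MPS backend, then CPU.
      if torch.cuda.is_available():
          return torch.device("cuda")
      if torch.backends.mps.is_available():
          return torch.device("mps")
      return torch.device("cpu")

  print(f"Running on: {pick_device()}")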

Reference

Karpathy's elegant GPT implementation
https://github.com/karpathy/nanoGPT

Hugging Face's transformers library
https://github.com/huggingface/transformers