
GPTJ Model Server

This repo provides the code for serving our finetuned gptj-title-teaser-10k model in production using a simple HTTP server.

NOTE: There is an even faster server for this model, built with the Potassium framework. Please use the Potassium Server for optimal performance.


Quickstart:

Curious to get your hands on our finetuned gpt-j model for title and teaser generation?

You can check it out with docker:

  1. Run docker build -t gptj-title-teaser-10k . && docker run -it gptj-title-teaser-10k to build and run the Docker container (on a GPU host you may also need to pass --gpus=all to docker run, depending on your setup).

Or you can check it out manually:

  1. Run pip3 install -r requirements.txt to download dependencies.
  2. Run python3 server.py to start the server.
  3. Run python3 test.py in a different terminal session to send a test request against it (see the example below).

Note: the model requires a GPU with ~12 GB of memory for generation!
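For reference, a request to the running server might look like the following. This is a minimal sketch: the host, port, route, and payload keys here are assumptions, and the authoritative interface is defined in server.py and test.py.

  import requests

  # Hypothetical example; check server.py / test.py for the real route and schema.
  response = requests.post(
      "http://localhost:8000/",
      json={"prompt": "A short article about solar-powered ferries."},
  )
  print(response.json())  # e.g. the generated title and teaser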

Overview:

  1. app.py contains the code to load and run the model for inference (a sketch of the typical loading pattern follows below).
  2. You can run a simple test with test.py!
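For orientation, loading a finetuned GPT-J checkpoint for inference with Hugging Face transformers typically looks something like this. The snippet is a minimal sketch, not a copy of app.py: the model path and dtype choice are assumptions.

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Minimal sketch; using "gptj-title-teaser-10k" as a local path is an assumption.
  tokenizer = AutoTokenizer.from_pretrained("gptj-title-teaser-10k")
  model = AutoModelForCausalLM.from_pretrained(
      "gptj-title-teaser-10k",
      torch_dtype=torch.float16,  # half precision keeps GPT-J within ~12 GB of GPU memory
  ).to("cuda")

  inputs = tokenizer(
      "A short article about solar-powered ferries.", return_tensors="pt"
  ).to("cuda")
  output_ids = model.generate(**inputs, max_new_tokens=64)
  print(tokenizer.decode(output_ids[0], skip_special_tokens=True))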

If deploying with Docker:

  1. download.py is a script to download our finetuned model weights at build time.
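A build-time download script in this style is usually just a warm-up of the model cache. Here is a minimal sketch, assuming the weights are hosted on the Hugging Face Hub; the repo id is hypothetical, and the real source of the weights is defined in download.py.

  from transformers import AutoModelForCausalLM, AutoTokenizer

  MODEL_ID = "your-org/gptj-title-teaser-10k"  # hypothetical Hub repo id

  # Calling from_pretrained at image build time caches the weights inside the
  # image, so containers start without a multi-gigabyte download at runtime.
  AutoTokenizer.from_pretrained(MODEL_ID)
  AutoModelForCausalLM.from_pretrained(MODEL_ID)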

Production:

This repo provides you with a functioning HTTP server for our finetuned gptj-title-teaser-10k model. You can use it as is, or package it up with the provided Dockerfile and deploy it to your favorite container hosting provider!

We are currently running this code on Banana, where you can get 1 hour of model hosting for free, but feel free to choose a different hosting provider. The following section provides instructions for deploying with Banana.

🍌

To deploy to Banana Serverless:

  • Fork this repo
  • Log in to the Banana App
  • Select your forked repo for deploy

It will then be built from the Dockerfile, optimized, and deployed on Banana's serverless GPU cluster.
You can monitor build-time and runtime logs by clicking the Logs button in the model view on the Banana Dashboard.

Demo Integration:

Once the build and optimization have finished successfully, you will find your credentials printed in the build logs.

Your model is now deployed, and it is runnable with the credentials printed there:

API_KEY=Your-Personal-Api-Key
MODEL_KEY=Your-Personal-Model-Key

You need these keys to hook up the web app to the model.
To set up the frontend, follow the instructions in the web-app repository.
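Besides the web app, you can also call the deployed model directly from Python. Below is a minimal sketch assuming the banana-dev SDK's banana.run(api_key, model_key, model_inputs) entry point; the model_inputs schema here is an assumption, since the real one is defined by app.py.

  import banana_dev as banana

  api_key = "Your-Personal-Api-Key"
  model_key = "Your-Personal-Model-Key"

  # The payload keys are an assumption; app.py defines the real schema.
  model_inputs = {"prompt": "A short article about solar-powered ferries."}

  output = banana.run(api_key, model_key, model_inputs)
  print(output)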