This repo provides a framework for serving ML models in production using simple HTTP servers.
To generalize this to deploy anything on Banana, see the guide here.
Choose one of three options:
- Fork this repo to create a public repo
- Click Use this Template, which creates a private or public repo
- Create your own repo and copy the template files from this repo into yours
Then clone the repo to a machine with a GPU to install and test it locally.
By default, the repo runs a HuggingFace BERT model.
- Run `pip3 install -r requirements.txt` to download dependencies.
- Run `python3 server.py` to start the server.
- Run `python3 test.py` in a different terminal session to test an inference against it (a sketch of such a request follows this list).
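For illustration, here is a minimal sketch of the kind of request `test.py` makes. The port, route, and payload shape are assumptions (check `test.py` itself for the exact values it uses); the default BERT fill-mask model expects a prompt containing a `[MASK]` token.

```python
# A minimal sketch of an inference call against the local server.
# The port (8000), route ("/"), and payload shape are assumptions;
# test.py in this repo has the exact request it sends.
import requests

# Example input for the default BERT fill-mask model
model_inputs = {"prompt": "Hello, I am a [MASK] model."}

res = requests.post("http://localhost:8000/", json=model_inputs)
print(res.json())
```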
- Edit `app.py` to load and run your model (see the sketch after this list).
- Make sure to test with `test.py`!
- When ready to deploy:
  - Edit `download.py` (or the `Dockerfile` itself) with scripts that download your custom model weights at build time.
  - Edit `requirements.txt` with your pip packages. Don't delete the "sanic" line, as it's a Banana dependency.
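As a reference for those edits, here is a hedged sketch of the shape `app.py` typically takes in a template like this: a one-time `init()` that loads the model into a global, and an `inference()` handler that runs per request. The function names and signatures are assumptions, so match whatever the template's `app.py` already defines.

```python
# A sketch of app.py's structure, assuming the server calls init() once
# at startup and inference(model_inputs) per request. Keep whatever
# names the template's app.py actually defines.
from transformers import pipeline
import torch

model = None  # loaded once by init(), reused across requests

def init():
    global model
    device = 0 if torch.cuda.is_available() else -1  # GPU if available
    model = pipeline("fill-mask", model="bert-base-uncased", device=device)

def inference(model_inputs: dict) -> dict:
    prompt = model_inputs.get("prompt")
    if prompt is None:
        return {"message": "No prompt provided"}
    return {"output": model(prompt)}  # run the fill-mask pipeline
```

In the same spirit, `download.py` usually just needs to populate the model cache inside the Docker image at build time so cold starts don't re-download weights. A sketch, assuming a HuggingFace model:

```python
# A sketch of download.py: fetch the weights at build time so they're
# baked into the image. Swap in whatever loads your custom model.
from transformers import pipeline

def download_model():
    pipeline("fill-mask", model="bert-base-uncased")  # downloads and caches weights

if __name__ == "__main__":
    download_model()
```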
You now have a functioning HTTP server that should run in Docker and complete inferences on a GPU.
To deploy on Banana:
- Log in to the Banana App
- Connect your GitHub
- Select this repository
It'll then be built from the Dockerfile, optimized, and deployed on our Serverless GPU cluster! You can then call it with any of our SDKs.
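For example, a call through the Python SDK might look like the sketch below. The `banana_dev` package name and `run()` signature reflect the SDK as I understand it, so verify against the current SDK docs; the keys are placeholders.

```python
# A hedged sketch of calling a deployed model via Banana's Python SDK
# (pip install banana-dev). Keys are placeholders from your dashboard;
# verify the exact run() signature against the current SDK docs.
import banana_dev as banana

api_key = "YOUR_API_KEY"      # from the Banana dashboard
model_key = "YOUR_MODEL_KEY"  # assigned once the deploy completes

out = banana.run(api_key, model_key, {"prompt": "Hello, I am a [MASK] model."})
print(out)
```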