This project allows easy deployment of any built-in LLM using xinference and docker.
- Install docker and docker-compose (now also 'docker compose') on your machine.
- Install the nvidia-docker libraries (find details about the NVIDIA Container Toolkit here)
- Run `docker-compose pull` to use a pre-built image, or `docker-compose build` to build it locally (see the command sketch after this list).
- Run `docker-compose up -d`. This should start a container in the background that downloads and runs the vicuna-16k (13b) model.
- Optional: There are two example environment files that can be commented and un-commented in the docker-compose.yml. The llama-2-chat file shows how to use models that require a Hugging Face access token (if the token is placed in the .env file).
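The full sequence, assuming the default configuration, could look roughly like the sketch below:

```bash
# Fetch the pre-built image (use `docker-compose build` instead to build locally)
docker-compose pull

# Start the container in the background; on first start it downloads and then
# serves the configured model (vicuna-16k by default)
docker-compose up -d

# Optionally follow the logs to watch the download and startup progress
docker-compose logs -f
```

For gated models such as llama-2-chat, the access token goes into the .env file. The variable name below is only a placeholder; check the llama-2-chat example environment file for the name the project actually expects:

```bash
# .env (hypothetical variable name; see the provided example env files)
HUGGINGFACE_TOKEN=hf_your_token_here
```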
You can find a list of the available LLMs in two ways:
- Set the environment variable `LIST=1` in the active .env file, then run `docker-compose up`, which will run the container attached until it prints a list of all available LLMs (see the sketch below).
- Find a possibly outdated list in the xinference documentation here.
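A sketch of the first approach, assuming `LIST=1` is read from the same .env file used by docker-compose.yml:

```bash
# Enable listing mode in the active .env file
echo "LIST=1" >> .env

# Run attached; the container prints the available built-in LLMs,
# then it can be stopped with Ctrl+C
docker-compose up
```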