ExllamaV2 Worker on Runpod Serverless

This repository contains a worker that runs ExllamaV2 inference on Runpod Serverless.

🌟 How to use

Clone this repository.
Build the Docker image.
Push the Docker image to your Docker registry.
Deploy to Runpod Serverless.

🏗️ build docker image

docker build -t <your docker registry>/<your docker image name>:<your docker image tag> .

The image accepts the build arguments listed in the table below.

Note: The model is not downloaded at build time; it is downloaded the first time the worker runs. Attach a network volume to the serverless endpoint so the download happens only once and later runs reuse the cached files.

key	value	optional
HUGGING_FACE_HUB_TOKEN	your huggingface token	true
MODEL_NAME	your model name	false
MODEL_REVISION	your model revision	true
MODEL_BASE_PATH	your model base path	true
LORA_ADAPTER_NAME	your lora adapter name	true
LORA_ADAPTER_REVISION	your lora adapter revision	true
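
As a rough sketch of how the arguments in the table above could be passed, the helper below assembles a `docker build` command using Docker's standard `--build-arg` flag. The function name `build_cmd` is hypothetical and not part of this repository. It only prints the command (a dry run), requires `MODEL_NAME`, and adds each optional argument only when the matching environment variable is set. Values containing spaces would need extra quoting before the printed command is run.

```shell
# Print a `docker build` command built from the environment.
# build_cmd is a hypothetical helper, not something this repository ships.
build_cmd() {
  image="$1"
  # MODEL_NAME is the only required build argument.
  : "${MODEL_NAME:?MODEL_NAME is required}"
  cmd="docker build -t $image --build-arg MODEL_NAME=$MODEL_NAME"
  # Optional arguments are included only when set and non-empty.
  for key in HUGGING_FACE_HUB_TOKEN MODEL_REVISION MODEL_BASE_PATH \
             LORA_ADAPTER_NAME LORA_ADAPTER_REVISION; do
    eval "value=\${$key:-}"
    if [ -n "$value" ]; then
      cmd="$cmd --build-arg $key=$value"
    fi
  done
  # The trailing dot is the build context (the repository root).
  printf '%s .\n' "$cmd"
}

# Usage, with placeholder values:
#   MODEL_NAME=<your model name> build_cmd <your docker registry>/<your docker image name>:<your docker image tag>
```

Review the printed command before running it, since any value passed with `--build-arg` can be visible in the image history.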

⏫ push docker image to your docker registry

docker push <your docker registry>/<your docker image name>:<your docker image tag>
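
One way to keep the build and push steps in sync is to compose the image reference once and reuse it. The sketch below assumes hypothetical values for the registry, image name, and tag; `image_ref` is an illustrative helper, not part of this repository.

```shell
# Hypothetical placeholder values; substitute your own registry, image name, and tag.
REGISTRY="docker.io/your-user"
IMAGE_NAME="runpod-worker-exllamav2"
IMAGE_TAG="latest"

# Compose the reference once so `docker build -t` and `docker push` agree.
image_ref() {
  printf '%s/%s:%s\n' "$REGISTRY" "$IMAGE_NAME" "$IMAGE_TAG"
}

# Usage (requires Docker and a prior `docker login` to the registry):
#   docker build -t "$(image_ref)" .
#   docker push "$(image_ref)"
```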

🚀 deploy to Runpod Serverless

Once the Docker image is in your registry, you can deploy it to Runpod Serverless.

StableFluffy/runpod-worker-exllamav2
