Ollama

Kubernetes Helm chart to deploy Large Language Models with Ollama.

How to use this chart

Set up the Helm chart repository:

helm repo add ollama https://feisky.xyz/ollama-kubernetes
helm repo update
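
To verify the repository was added and to see the chart versions it serves (assuming a Helm v3 client):

helm search repo ollama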

Deploy Ollama with the default Lobe Chat UI:

helm upgrade --install ollama ollama/ollama \
    --namespace=ollama \
    --create-namespace
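
Then wait for the pods to become ready; the initial model download can take several minutes depending on the models configured in llm.models:

kubectl -n ollama get pods --watch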

Deploy Ollama with Open WebUI:

helm upgrade --install ollama ollama/ollama \
    --namespace=ollama \
    --create-namespace \
    --set ui.type=open-webui \
    --set ui.image.repository=ghcr.io/open-webui/open-webui
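
To confirm that the models listed in llm.models were pulled successfully, you can run Ollama's list command inside the pod. This sketch assumes the chart creates a Deployment named ollama in the release namespace; adjust the name if your workload differs:

kubectl -n ollama exec deploy/ollama -- ollama list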

After the deployment completes, you can access the web UI by port-forwarding its service:

kubectl -n ollama port-forward service/ollama-webui 8080:80

Then open your browser and go to http://localhost:8080.
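
The Ollama API itself can be reached the same way. The example below assumes the chart exposes the API through a service named ollama on Ollama's default port 11434; adjust the service name to match your release:

kubectl -n ollama port-forward service/ollama 11434:11434 &
curl http://localhost:11434/api/generate -d '{"model": "phi3", "prompt": "Why is the sky blue?", "stream": false}'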

Configuration

The following table lists the configurable parameters of the Ollama chart and their default values.

| Parameter | Description | Default |
|-----------|-------------|---------|
| image.repository | Image repository of Ollama | "ollama/ollama" |
| image.tag | Image tag of Ollama | 0.2.3 |
| replicaCount | Number of replicas; requires a storage class that supports ReadWriteMany when the persistent volume is enabled and replicaCount > 1 | 1 |
| llm.models | List of models to load | ["phi3", "llama3"] |
| persistentVolume.enabled | Whether to enable a persistent volume for Ollama | true |
| persistentVolume.storageClass | Storage class for the Ollama persistent volume | "default" |
| persistentVolume.accessModes | Access modes for the Ollama persistent volume | ["ReadWriteOnce"] |
| persistentVolume.size | Storage size for the Ollama persistent volume | "30Gi" |
| persistentVolume.claimName | Set to a non-empty value to use an existing PVC for the Ollama persistent volume | "" |
| resources.limits.cpu | CPU limit for the Ollama container | 4 |
| resources.limits.memory | Memory limit for the Ollama container | "4Gi" |
| resources.limits.nvidia.com/gpu | GPU limit for the Ollama container | "1" |
| resources.requests.cpu | CPU request for the Ollama container | "100m" |
| resources.requests.memory | Memory request for the Ollama container | "128Mi" |
| resources.requests.nvidia.com/gpu | GPU request for the Ollama container | "1" |
| nodeSelector | Node selector for the Ollama Pod | {} |
| tolerations | Tolerations for the Ollama Pod | [{"key": "kubernetes.azure.com/scalesetpriority", "operator": "Exists"}] |
| affinity | Affinity for the Ollama Pod | {} |
| ui.enabled | Whether to enable the WebUI | true |
| ui.type | UI type; supported values are "open-webui" and "lobe-chat" | lobe-chat |
| ui.replicaCount | Replica count for the WebUI Pod | 1 |
| ui.image.repository | Image repository of the WebUI | "ghcr.io/open-webui/open-webui" |
| ui.image.tag | Image tag of the WebUI | "latest" |
| ui.service.type | Service type of the WebUI | "ClusterIP" |
| ui.service.port | Service port of the WebUI | 80 |
| ui.nodeSelector | Node selector for the WebUI | {} |
| ui.tolerations | Tolerations for the WebUI | {} |
| ui.affinity | Affinity for the WebUI | {} |
| ui.ingress.enabled | Whether to enable Ingress for the WebUI | false |
| ui.ingress.className | Ingress class name for the WebUI | "" |
| ui.ingress.hosts | Ingress hosts for the WebUI | [{"host": "chart-example.local", "paths": [{"path": "/", "pathType": "ImplementationSpecific"}]}] |
| ui.ingress.tls | Ingress TLS for the WebUI | [] |
| ui.persistentVolume.enabled | Whether to enable a persistent volume for the WebUI | true |
| ui.persistentVolume.storageClass | Storage class for the WebUI persistent volume | "default" |
| ui.persistentVolume.accessModes | Access modes for the WebUI persistent volume | ["ReadWriteOnce"] |
| ui.persistentVolume.size | Storage size for the WebUI persistent volume | "10Gi" |
| ui.persistentVolume.claimName | Set to a non-empty value to use an existing PVC for the WebUI persistent volume | "" |
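
For more involved setups it is usually cleaner to collect overrides in a values file instead of repeating --set flags. The sketch below is illustrative only: every key comes from the table above, but the specific values (models, storage class, ingress class, and host) are placeholders you should adapt to your cluster:

# Write an example values file; adjust the placeholder values before use.
cat <<EOF > my-values.yaml
llm:
  models:          # models to pull on startup
    - phi3
    - llama3
persistentVolume:
  enabled: true
  storageClass: default
  size: 30Gi
resources:
  limits:
    nvidia.com/gpu: "1"   # schedule onto a GPU node
ui:
  type: open-webui
  image:
    repository: ghcr.io/open-webui/open-webui
    tag: latest
  ingress:
    enabled: true
    className: nginx              # placeholder: your ingress class
    hosts:
      - host: chat.example.com    # placeholder: your hostname
        paths:
          - path: /
            pathType: ImplementationSpecific
EOF

helm upgrade --install ollama ollama/ollama \
    --namespace=ollama \
    --create-namespace \
    -f my-values.yaml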