small-hack/smol-k8s-lab

Feature Request: Update GPU operator invocation

cloudymax opened this issue · 5 comments

Looks like Rancher is doing it like this: https://gist.github.com/bgulla/5ea0e7fd310b5db4f9b66036d1cdb3d3

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
helm install --wait nvidiagpu \
    -n gpu-operator --create-namespace \
    --set toolkit.env[0].name=CONTAINERD_CONFIG \
    --set toolkit.env[0].value=/var/lib/rancher/k3s/agent/etc/containerd/config.toml \
    --set toolkit.env[1].name=CONTAINERD_SOCKET \
    --set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
    --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
    --set toolkit.env[2].value=nvidia \
    --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
    --set-string toolkit.env[3].value=true \
    nvidia/gpu-operator
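The same overrides could also live in a Helm values file instead of a chain of `--set` flags, which may be easier to review or to reuse in an Argo CD app later. A minimal sketch, assuming Bash or a POSIX shell; the function name `write_gpu_operator_values` and the default file name `gpu-operator-values.yaml` are hypothetical, and the keys simply mirror the `--set` flags above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes the k3s containerd overrides from the --set flags
# above into a Helm values file and prints the path it wrote.
write_gpu_operator_values() {
  local out="${1:-gpu-operator-values.yaml}"  # illustrative default path
  cat > "$out" <<'EOF'
toolkit:
  env:
    - name: CONTAINERD_CONFIG
      value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml
    - name: CONTAINERD_SOCKET
      value: /run/k3s/containerd/containerd.sock
    - name: CONTAINERD_RUNTIME_CLASS
      value: nvidia
    - name: CONTAINERD_SET_AS_DEFAULT
      # quoted so it stays a string, matching --set-string in the command above
      value: "true"
EOF
  echo "$out"
}
```

Usage would then look like `helm install --wait nvidiagpu -n gpu-operator --create-namespace -f "$(write_gpu_operator_values)" nvidia/gpu-operator`. Quoting `"true"` in YAML plays the same role as `--set-string`: it keeps the value from being parsed as a boolean.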

delete:
helm uninstall -n gpu-operator nvidiagpu
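`helm uninstall` removes the release but leaves behind the `gpu-operator` namespace that `--create-namespace` created. A sketch of a slightly fuller teardown, assuming `helm` and `kubectl` already point at the k3s cluster; the function name `uninstall_gpu_operator` is hypothetical, and it does not try to clean up any cluster-scoped resources the chart may leave behind:

```shell
#!/usr/bin/env bash
# Hypothetical teardown helper; defaults match the release and namespace used above.
uninstall_gpu_operator() {
  local release="${1:-nvidiagpu}"
  local namespace="${2:-gpu-operator}"
  # Remove the Helm release first; stop if that fails.
  helm uninstall -n "$namespace" "$release" || return 1
  # Helm does not own a namespace made via --create-namespace, so delete it explicitly.
  kubectl delete namespace "$namespace" --ignore-not-found
}
```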

cluster-info:
kubectl get nodes -o wide
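Once the operator's pods are up, GPU nodes should advertise an allocatable `nvidia.com/gpu` resource (the NVIDIA device plugin's default resource name). A quick check, sketched as a hypothetical helper named `show_node_gpus`; the column names are arbitrary, and nodes without GPUs should show `<none>`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists each node with its allocatable NVIDIA GPU count.
# The dot inside "nvidia.com" is escaped so kubectl does not read it as a field separator.
show_node_gpus() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}
```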

I'll accept this as a feature if you can open the PR.

@cloudymax this should first be an app in https://github.com/small-hack/argocd-apps/tree/main and then we can add it to the default config for smol k8s :3

Now that v1.0.0 is officially out, it's much easier to add this to the default applications. Some notes for that:

@cloudymax I'm marking this as blocked based on your work on this helm chart, but feel free to unblock it when you're ready

Closing based on #58 (comment)