This repo contains the setup scripts and configs needed to meet the minimum requirements of a Kubernetes cluster network, with optional GPU support.
- Prepare a kubeadm init config file (we provide a template at `k8s_configs/kubeadm_init_config.yaml`; make sure to change `local_ip` in `advertiseAddress` and `certSANs`, while `podSubnet` can be left as is; see the sketch after this list).
- Change the `IP_AUTODETECTION_METHOD` field in `k8s_configs/calico.yaml` to use the correct NIC. You can detect the NIC used by your system by running `ifconfig`.
- Run `k8s_configs/master_setup.sh`.
- Save the `kubeadm join` command after a successful setup.
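For orientation, the fields to edit in the template look roughly like this (a sketch assuming a standard kubeadm `v1beta3` config; `10.0.0.1` is a hypothetical local IP, and the exact template in `k8s_configs/kubeadm_init_config.yaml` may differ):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.0.0.1      # replace with your local_ip
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  certSANs:
    - 10.0.0.1                    # replace with your local_ip
networking:
  podSubnet: 192.168.0.0/16       # can be left as is
```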
To set up the metrics server for the Kubernetes cluster, run the following commands:
```bash
kubectl apply -f metrics_server.yaml
kubectl edit deployments.apps -n kube-system metrics-server
# in the editor, add `hostNetwork: true` after `dnsPolicy: ClusterFirst`
kubectl rollout restart deployment metrics-server -n kube-system
# verify the metrics server is set up correctly
kubectl top nodes
```
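If you prefer a non-interactive alternative to the manual `kubectl edit` step, the same change can be applied with a patch (a sketch; it assumes the deployment's pod template is otherwise left untouched):

```bash
# merge hostNetwork: true into the metrics-server pod template
kubectl patch deployment metrics-server -n kube-system \
    -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'
```

Since patching the pod template already triggers a rollout, the explicit `kubectl rollout restart` becomes optional in this variant.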
To set up a worker node for the Kubernetes cluster, run the following commands on each worker node:
```bash
./k8s_configs/worker_setup.sh
sudo su -
# use the join command output from the master init
kubeadm join 10.0.0.1:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:79b0ee3d035eb825274aa716a1e15cbfe486dab87da431b1781a7e1677213308
```
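If you no longer have the join command from the master init, you can regenerate one on the master node:

```bash
# prints a fresh `kubeadm join ...` command with a new token
kubeadm token create --print-join-command
```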
If you have limited space in the default root directory of containerd (i.e. `/var/lib/containerd`), consider changing the root directory by running the following commands:
```bash
# IMPORTANT: stop kubelet first
sudo systemctl stop kubelet
sudo vim /etc/containerd/config.toml
```
Add the following to the config:
```toml
version = 2

# persistent data location
root = "NEW_CONTAINERD_ROOT_DIR"  # your_free_path_here
```
Reload the services:

```bash
sudo systemctl daemon-reload
sudo systemctl restart containerd
sudo systemctl restart kubelet
```
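To confirm that containerd picked up the new root directory, you can dump its effective config and check free space on the new path (`NEW_CONTAINERD_ROOT_DIR` is the placeholder from the config above):

```bash
# the effective config should show the new root
sudo containerd config dump | grep '^root'
# make sure the new location has enough free space
df -h NEW_CONTAINERD_ROOT_DIR
```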
If you ever find that kubelet is unavailable (after a reboot, etc.), remember to turn off swap by running the following commands:
```bash
sudo -i
swapoff -a
exit
```
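Note that `swapoff -a` only disables swap until the next reboot. To make the change persistent, which is what the reboot scenario above usually calls for, you can comment out the swap entries in `/etc/fstab`, e.g.:

```bash
# comment out every fstab line that mounts swap
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab
```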
To use GPUs in your Kubernetes cluster, follow these steps to set up the CUDA plugin.
On the master node, run:

```bash
./cuda_configs/gpu_master_setup.sh
```
On each worker node, run:

```bash
./cuda_configs/gpu_worker_setup.sh
```
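Once both scripts have run, a quick sanity check (assuming a standard NVIDIA driver install and the `nvidia.com/gpu` resource name; `<worker-node-name>` is a placeholder) is:

```bash
# on the worker: confirm the driver sees the GPU
nvidia-smi
# on the master: confirm the node advertises GPUs to Kubernetes
kubectl describe node <worker-node-name> | grep nvidia.com/gpu
```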
Time-slicing is a feature of recent NVIDIA GPUs that enables time-sharing of a GPU. When enabled, multiple GPU workloads can run concurrently on a single GPU, interleaved in time. (References: Time-Slicing GPUs in Kubernetes and Install NVIDIA GPU Operator.)
To enable time-sharing, follow these steps:
Add the NVIDIA Helm repository:
```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update
```
Update the containerd config `/etc/containerd/config.toml` with the following (an example is provided in our repo: `./cuda_configs/gpu_operator_containerd_config.toml`):
```toml
version = 2

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
```
Restart services:

```bash
sudo systemctl stop kubelet
sudo systemctl daemon-reload
sudo systemctl restart containerd
sudo systemctl restart kubelet
```
Install the GPU operator with the following options:
```bash
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set driver.enabled=false \
    --set toolkit.enabled=false
```
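The `--set driver.enabled=false --set toolkit.enabled=false` flags tell the operator not to manage the NVIDIA driver and container toolkit, on the assumption that the setup scripts above already installed them on the hosts. You can watch the operator pods come up with:

```bash
kubectl get pods -n gpu-operator --watch
```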
Prepare a time-slicing configuration (an example is provided in our repo: `./cuda_configs/time_slicing.yaml`):
```yaml
version: v1
sharing:
  timeSlicing:
    renameByDefault: ...
    failRequestsGreaterThanOne: ...
    resources:
      - name: ...
        replicas: ...
...
```
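For illustration only (the values here are hypothetical and may differ from the file shipped in this repo), a filled-in configuration following the format in NVIDIA's time-slicing documentation could look like the ConfigMap below. With `replicas: 4`, a node with 2 physical GPUs would advertise `nvidia.com/gpu: 8`, matching the verification output further down; the data key (`any` here) is what the `default` field in the later `kubectl patch` step should reference:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 4
```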
Create this configuration in the GPU operator namespace:

```bash
kubectl create -f cuda_configs/time_slicing.yaml
```
Apply the default configuration across the cluster:

```bash
kubectl patch clusterpolicy/cluster-policy \
    -n gpu-operator --type merge \
    -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "your_default_gpu"}}}}'
```
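Here `time-slicing-config` must match the name of the ConfigMap created in the previous step, and `your_default_gpu` is a placeholder for the key in that ConfigMap's `data` section that should be applied by default.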
Verify that the time-slicing configuration is applied successfully to all GPU nodes in the cluster:
```bash
kubectl describe node ...
```

Expected output (excerpt):

```
...
Capacity:
  nvidia.com/gpu: 8
...
Allocatable:
  nvidia.com/gpu: 8
...
```
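To check all nodes at once instead of describing them one by one, a one-liner like this can help (assuming the standard `nvidia.com/gpu` resource name):

```bash
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```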
If for any reason you find Kubernetes in an inconsistent state after setting up the GPU time-slicing feature, follow these steps to roll back the changes.
Remove the GPU operator:
```bash
# the release name was auto-generated by --generate-name; find it first
helm list -n gpu-operator
helm uninstall <release-name> -n gpu-operator
```
Restore the original containerd config to its state before the time-slicing feature was installed:
```toml
version = 2

# persistent data location
root = "NEW_CONTAINERD_ROOT_DIR"

ignore_image_defined_volumes = false

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  default_runtime_name = "nvidia"
  no_pivot = false
  disable_snapshot_annotations = true
  discard_unpacked_layers = false
  privileged_without_host_devices = false
  base_runtime_spec = ""

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  privileged_without_host_devices = false
  runtime_engine = ""
  runtime_root = ""
  runtime_type = "io.containerd.runc.v1"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"
  SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/opt/cni/bin"
  conf_dir = "/etc/cni/net.d"
```
Restart services:

```bash
sudo systemctl stop kubelet
sudo systemctl daemon-reload
sudo systemctl restart containerd
sudo systemctl restart kubelet
```
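After the restart, it is worth confirming that the cluster came back to a healthy state:

```bash
# nodes should report Ready and system pods should be Running
kubectl get nodes
kubectl get pods -n kube-system
```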