/k8s_setup

Kubernetes Minimum Requirements Setup

This repo contains the setup scripts and configs needed to meet the minimum requirements of a Kubernetes cluster network, with optional GPU support.

Kubernetes Infra Setup

Master Node Setup

  1. Prepare a kubeadm init config file. A template is provided at k8s_configs/kubeadm_init_config.yaml; replace local_ip in advertiseAddress and certSANs with this node's IP address. podSubnet can be left as it is.
  2. Change the IP_AUTODETECTION_METHOD field in k8s_configs/calico.yaml to use the correct NIC. You can list the NICs on your system by running ifconfig.
  3. Run k8s_configs/master_setup.sh.
  4. Save the kubeadm join command printed after a successful setup.
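
As a rough aid for step 2, the sketch below picks the interface name out of `ip route show default` output and formats it as an `interface=<name>` value, which is one of the forms Calico accepts for IP_AUTODETECTION_METHOD. Both function names are hypothetical and not part of this repo, and the sketch assumes the default-route NIC is the one the cluster should use.

```shell
# Hypothetical helper: extract the interface name that follows "dev"
# in the first matching line of `ip route show default` output.
default_route_iface() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}

# Hypothetical helper: format a value for Calico's IP_AUTODETECTION_METHOD.
calico_autodetect_value() {
    printf 'interface=%s\n' "$1"
}
```

For example, `calico_autodetect_value "$(default_route_iface "$(ip route show default)")"` prints the value to paste into k8s_configs/calico.yaml. Check the result against ifconfig before using it.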

(Optional) Set Up Metrics Server

To set up the metrics server for the Kubernetes cluster, run the following commands:

kubectl apply -f metrics_server.yaml
kubectl edit deployments.apps -n kube-system metrics-server
# in the editor, add "hostNetwork: true" after "dnsPolicy: ClusterFirst"
kubectl rollout restart deployment metrics-server -n kube-system
# verify that the metrics server is set up correctly
kubectl top nodes
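
If you prefer not to edit the deployment interactively, the same hostNetwork change can probably be made with a merge patch. This is a sketch under that assumption, not something the repo scripts do; the function name is hypothetical.

```shell
# Hypothetical helper: set hostNetwork on the metrics-server pod template
# with a merge patch instead of `kubectl edit`.
patch_metrics_server() {
    kubectl patch deployment metrics-server -n kube-system \
        --type merge \
        -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'
}
```

Call `patch_metrics_server` after applying metrics_server.yaml. Changing the pod template starts a new rollout, so afterwards `kubectl top nodes` can be used to verify as before.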

Worker Node Setup

To set up a worker node for the Kubernetes cluster, run the following commands on each worker node:

./k8s_configs/worker_setup.sh
sudo su -
# use the join command printed by the master setup
kubeadm join 10.0.0.1:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:79b0ee3d035eb825274aa716a1e15cbfe486dab87da431b1781a7e1677213308 
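
Before pasting a saved join command, it can help to check that the token and hash were copied intact. The sketch below checks only their shape (a kubeadm bootstrap token is 6 lowercase alphanumerics, a dot, then 16 more; the hash is sha256: followed by 64 hex digits). The function names are hypothetical.

```shell
# Hypothetical helper: succeed only if the argument looks like a
# kubeadm bootstrap token.
is_bootstrap_token() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# Hypothetical helper: succeed only if the argument looks like a
# sha256 CA certificate hash as printed by kubeadm.
is_ca_cert_hash() {
    printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}
```

A well-formed token can still have expired. In that case, running `kubeadm token create --print-join-command` on the master node prints a fresh join command.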

(Optional) Change Root Directory of containerd

If the default containerd root directory (/var/lib/containerd) is short on space, consider changing the root directory with the following commands:

# IMPORTANT: stop kubelet first
sudo systemctl stop kubelet
sudo vim /etc/containerd/config.toml

Add the following to the config:

version = 2

# persistent data location
root = "NEW_CONTAINERD_ROOT_DIR" #your_free_path_here

Reload the services

sudo systemctl daemon-reload
sudo systemctl restart containerd
sudo systemctl restart kubelet
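
The manual edit of config.toml could also be scripted. The sketch below is one possible approach, assuming `root` appears as an unindented top-level key and that the new path contains no `|` or `&` characters; the function name is hypothetical.

```shell
# Hypothetical helper: set the top-level `root` key in a containerd config.
# Usage: set_containerd_root <config_file> <new_root_dir>
set_containerd_root() {
    cfg=$1
    new_root=$2
    tmp=$(mktemp) || return 1
    if grep -Eq '^root[[:space:]]*=' "$cfg"; then
        # Replace the existing top-level root line.
        sed "s|^root[[:space:]]*=.*|root = \"$new_root\"|" "$cfg" > "$tmp"
    else
        # Top-level keys must come before any [table] header, so prepend.
        { printf 'root = "%s"\n' "$new_root"; cat "$cfg"; } > "$tmp"
    fi
    # Write back through cat so the file keeps its owner and mode.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}
```

Run it as root against /etc/containerd/config.toml after stopping kubelet, then restart the services as described above. Keep a backup of the original file.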

Tips

Make sure swap is always turned off

If kubelet is unavailable, for example after a reboot, turn off swap by running the following commands:

sudo -i
swapoff -a
exit
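
swapoff -a only lasts until the next reboot. One common way to keep swap off is to comment out the swap entries in /etc/fstab. The sketch below does that for an fstab-style file, assuming swap entries have "swap" as their third field; the function name is hypothetical, and systems that enable swap by other means (for example systemd swap units) are not covered.

```shell
# Hypothetical helper: comment out active swap entries in an fstab-style file.
# Usage: disable_swap_entries <fstab_file>
disable_swap_entries() {
    fstab=$1
    tmp=$(mktemp) || return 1
    # Skip lines that are already comments; prefix swap entries with "#".
    awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
        "$fstab" > "$tmp"
    # Write back through cat so the file keeps its owner and mode.
    cat "$tmp" > "$fstab" && rm -f "$tmp"
}
```

Back up /etc/fstab first, then run `disable_swap_entries /etc/fstab` as root.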

Kubernetes CUDA Plugin Setup

To use GPUs in your Kubernetes cluster, follow these steps to set up the CUDA plugin.

On the master node, run:

./cuda_configs/gpu_master_setup.sh

On each worker node, run:

./cuda_configs/gpu_worker_setup.sh

(Optional) Enable time-slicing feature

Time-slicing enables time-sharing of NVIDIA GPUs. When enabled, multiple GPU workloads can be scheduled on a single GPU and take turns running on it. (References: Time-Slicing GPUs in Kubernetes; Install NVIDIA GPU Operator)

To enable time-slicing, follow these steps:

Add the NVIDIA Helm repository

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
   && helm repo update

Update containerd config /etc/containerd/config.toml with the following (an example is provided in our repo: ./cuda_configs/gpu_operator_containerd_config.toml)

version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"

Restart services

sudo systemctl stop kubelet
sudo systemctl daemon-reload
sudo systemctl restart containerd
sudo systemctl restart kubelet

Install the GPU operator with the following options:

helm install --wait --generate-name \
     -n gpu-operator --create-namespace \
      nvidia/gpu-operator \
      --set driver.enabled=false \
      --set toolkit.enabled=false

Prepare a time-slicing configuration (an example is provided in our repo: ./cuda_configs/time_slicing.yaml)

version: v1
sharing:
  timeSlicing:
    renameByDefault: 
    failRequestsGreaterThanOne: 
    resources:
    - name: 
      replicas: 
    ...
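
To make the template concrete, the sketch below prints a filled-in configuration for a single resource. The values are illustrative assumptions, not the contents of ./cuda_configs/time_slicing.yaml: renameByDefault and failRequestsGreaterThanOne are set to false, and the resource name and replica count are passed in. The function name is hypothetical.

```shell
# Hypothetical helper: print a time-slicing config for one resource.
# Usage: time_slicing_config <resource_name> <replicas>
time_slicing_config() {
    cat <<EOF
version: v1
sharing:
  timeSlicing:
    renameByDefault: false
    failRequestsGreaterThanOne: false
    resources:
    - name: $1
      replicas: $2
EOF
}
```

For example, `time_slicing_config nvidia.com/gpu 4` prints a config that would advertise each physical GPU as 4 replicas.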

Create this configuration in the GPU operator namespace

kubectl create -f cuda_configs/time_slicing.yaml

Apply the default configuration across the cluster

kubectl patch clusterpolicy/cluster-policy \
   -n gpu-operator --type merge \
   -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "your_default_gpu"}}}}'

Verify that the time-slicing configuration is applied successfully to all GPU nodes in the cluster:

kubectl describe node <node-name>
...
Capacity:
  nvidia.com/gpu: 8
...
Allocatable:
  nvidia.com/gpu: 8
...
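
With time-slicing applied, the advertised GPU count should be the number of physical GPUs times the configured replicas (for example, 2 GPUs with 4 replicas give 8). The sketch below computes that expectation and reads the Capacity value from `kubectl describe node` output. It assumes renameByDefault is false, so the resource keeps the name nvidia.com/gpu; both function names are hypothetical.

```shell
# Hypothetical helper: expected advertised GPU count under time-slicing.
# Usage: expected_gpu_count <physical_gpus> <replicas>
expected_gpu_count() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: read the first nvidia.com/gpu value (the Capacity
# entry) from `kubectl describe node` output supplied on stdin.
capacity_gpu_count() {
    awk '$1 == "nvidia.com/gpu:" { print $2; exit }'
}
```

For example, compare `kubectl describe node <node-name> | capacity_gpu_count` with `expected_gpu_count 2 4`.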

Tips

Uninstall time-slicing feature

If Kubernetes ends up in an inconsistent state after setting up GPU time-slicing, follow these steps to roll back the changes.

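Remove the GPU operator. Because it was installed with --generate-name, look up the generated release name first:

helm list -n gpu-operator
helm uninstall <release-name> -n gpu-operator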

Restore the containerd config to its state before the time-slicing feature was installed, for example:

version = 2

# persistent data location
root = "NEW_CONTAINERD_ROOT_DIR"
ignore_image_defined_volumes = false
[plugins."io.containerd.grpc.v1.cri".containerd]
        snapshotter = "overlayfs"
        default_runtime_name = "nvidia"
        no_pivot = false
        disable_snapshot_annotations = true
        discard_unpacked_layers = false
        privileged_without_host_devices = false
        base_runtime_spec = ""
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_root = ""
        runtime_type = "io.containerd.runc.v1"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
        BinaryName = "/usr/bin/nvidia-container-runtime"
        SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".cni]
        bin_dir = "/opt/cni/bin"
        conf_dir = "/etc/cni/net.d"

Restart services

sudo systemctl stop kubelet
sudo systemctl daemon-reload
sudo systemctl restart containerd
sudo systemctl restart kubelet