
Reference Solution - EdgeAI running on AzS HCI using AKS and Arc

This reference solution gives customers and partners an example of how to deploy and manage an Edge AI workload on certified AzSHCI hardware using AKS and ARC.


Major sections of this E2E tutorial:

  • Prerequisites
  • Preparing AzSHCI - 2 node cluster
  • Configuring ARC and AKS on AzSHCI
  • Creating AI Workload AKS Cluster
  • Integrating with GitHub
  • Deploy AI Workload
  • Validate E2E Solution Working
  • Cleanup Resources

Prerequisites

For this E2E reference solution you will need the following prerequisites:

Preparing AzSHCI - 2 node cluster

Follow the Microsoft Learn documentation to set up Windows Admin Center (WAC): QuickStart setup AzSHCI with WAC

Follow the Microsoft Learn documents to configure your two-node cluster: Deploy a 2-node cluster on AzSHCI
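Before moving on, it is worth a quick sanity check that both nodes formed a healthy cluster. Here is a minimal sketch from PowerShell on one of the nodes, assuming the FailoverClusters module that ships with the OS:

# Both nodes should report a State of Up.
Get-ClusterNode

# Re-run cluster validation and review any warnings in the resulting report.
Test-Cluster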

Configuring ARC and AKS on AzSHCI

When setting up AKS, you will first set up the AKS management cluster and reserve IPs for all of the workload clusters; you will then proceed to the Creating AI Workload AKS Cluster step below. Work with your networking engineers to reserve a block of IP addresses, and ensure you have a vSwitch created. The gateway and DNS servers can be found by looking at the settings of the vSwitch in WAC.

Here is the engineering plan used for our E2E demo:

  • Subnet prefix: 172.23.30.0/24
  • Gateway: 172.23.30.1
  • DNS servers: 172.22.1.9, 172.22.3.9
  • Cloud agent IP: 172.23.30.151
  • Virtual IP address pool: 172.23.30.152 – 172.23.30.172
  • Kubernetes node IP pool: 172.23.30.173 – 172.23.30.193

  1. Prepare the 2-node cluster by installing AKS; follow this PowerShell QuickStart Guide. A sketch of the PowerShell flow, mapped onto the engineering plan above, follows this list.
  2. Alternatively, you can set up with WAC; the demo was created with static IPs from the engineering plan above. AKS using WAC
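Here is a minimal sketch of that PowerShell flow with the engineering plan above filled in, assuming the AksHci PowerShell module from the QuickStart; the vnet name, vSwitch name, and cluster-storage paths are placeholders to adjust for your environment:

# Run Initialize-AksHciNode on every node; run the remaining commands on one node.
Initialize-AksHciNode

# Map the reserved IP block onto a virtual network. "extSwitch" is a placeholder
# vSwitch name - use the vSwitch you created earlier.
$vnet = New-AksHciNetworkSetting -name "aks-vnet" -vSwitchName "extSwitch" `
    -ipAddressPrefix "172.23.30.0/24" -gateway "172.23.30.1" `
    -dnsServers "172.22.1.9","172.22.3.9" `
    -vipPoolStart "172.23.30.152" -vipPoolEnd "172.23.30.172" `
    -k8sNodeIpPoolStart "172.23.30.173" -k8sNodeIpPoolEnd "172.23.30.193"

# Point AKS at cluster storage and the cloud agent IP, then install the management cluster.
Set-AksHciConfig -imageDir "C:\ClusterStorage\Volume1\Images" `
    -workingDir "C:\ClusterStorage\Volume1\ImageStore" `
    -cloudConfigLocation "C:\ClusterStorage\Volume1\Config" `
    -vnet $vnet -cloudServiceCidr "172.23.30.151/24"

Install-AksHci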

Creating AI Workload AKS Cluster

Now that you have AKS and ARC installed in your management cluster, you need to create an AI workload cluster and prime the nodes to leverage the AI accelerator hardware.

Create AI Workload Cluster

Follow the instructions to create a cluster named AI Workload. We stood up a 3-node AKS cluster; a minimal PowerShell sketch follows.
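As a rough illustration, assuming the same AksHci PowerShell module, the workload cluster can be created and targeted like this; the cluster name, pool name, and node count are illustrative:

# Create the workload cluster (names and sizing here are illustrative).
New-AksHciCluster -name "ai-workload" -nodePoolName "linuxnodepool" -nodeCount 3 -osType Linux

# Fetch kubeconfig credentials so kubectl targets the new cluster.
Get-AksHciCredential -name "ai-workload"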

Create a GPU Pool and attach GPUs to AI Workload Nodes

Once your AI Workload cluster is created, go to the WAC Cluster Manager and look at the VM list. Take note of the VM names for the AI Workload cluster.

Follow these steps to create a GPU Pool in WAC and assign the VMs from the AI Workload Cluster.

Preparing node for AI workload

Now that we have the GPUs assigned, we need to install Docker and the NVIDIA device plugin.

  1. Go to the Docker page and find your respective binary. For this example, we use x86_64 docker-20.10.9.tgz: Docker binaries

  2. Get the Workload AI node IP address and connect using your RSA key. When using WAC, the key is placed in your cluster storage under Volumes, then AksHCI. You can run this from your dev machine command prompt, but ensure you are in the same folder as the rsa file. For the command below we copied the rsa file to the dev machine and renamed it to akshci_rsa.xml. Learn more at Connect with SSH to Linux or Windows worker nodes
ssh -i akshci_rsa.xml clouduser@172.23.30.157
  3. Once on the Workload AI node, download the Docker binary.
sudo curl https://download.docker.com/linux/static/stable/x86_64/docker-20.10.9.tgz -o docker-20.10.9.tgz
  4. Extract the Docker binaries.
sudo tar xzvf docker-20.10.9.tgz
  5. Remove the existing containerd binaries so they can be replaced.
sudo rm -rf '/usr/bin/containerd' 
sudo rm -rf '/usr/bin/containerd-shim-runc-v2'
  6. Copy the Docker binaries into /usr/bin.
sudo cp docker/* /usr/bin/ 
  7. Run the Docker daemon in the background.
sudo dockerd & 
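Optionally, confirm the daemon is responding before moving on; both commands should return without errors:

# Client and daemon versions should both be reported.
sudo docker version

# An empty container list confirms the daemon is reachable.
sudo docker ps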
  8. Install the NVIDIA GPU device plugin. Go to the NVIDIA page for the full set of instructions: GitHub - NVIDIA/k8s-device-plugin: NVIDIA device plugin for Kubernetes

  9. Set nvidia as the default Docker runtime by creating daemon.json:

sudo vim /etc/docker/daemon.json
  10. Paste the following into the newly created daemon.json file:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
  11. Check to ensure the changes took effect.
sudo cat /etc/docker/daemon.json
  12. Remove the Docker pid file and stale volumes, then restart Docker.
sudo rm /var/run/docker.pid 
sudo rm -rf /var/lib/docker/volumes/*
sudo dockerd &
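To verify that nvidia is now the default runtime, inspect the daemon's self-report (this assumes the nvidia-container-runtime binary is present on the node):

# Should print: Default Runtime: nvidia
sudo docker info | grep -i "default runtime"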
  13. Configure containerd. Open the config.toml file and paste in the modification shown in the next step.
sudo vim /etc/containerd/config.toml
  14. Paste the following into the file:
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
  15. Check to ensure the changes took effect.
sudo cat /etc/containerd/config.toml
  16. Restart containerd.
sudo systemctl restart containerd
  17. Optional troubleshooting:
sudo systemctl stop containerd
sudo systemctl start containerd
sudo containerd
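If containerd refuses to start, its logs usually point at a typo in config.toml; assuming a systemd-based node, the tail of the unit journal is the quickest check:

# Show the last 50 log lines for the containerd unit.
sudo journalctl -u containerd -n 50 --no-pager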
  18. From PowerShell, use the kubectl command line to enable GPU support in Kubernetes.

Run the deployment:

kubectl apply -f edge-ai1.yaml

Run the NVIDIA plugin:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.3/nvidia-device-plugin.yml 
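Once the device plugin pod is running, the node should advertise nvidia.com/gpu as a schedulable resource. Here is a quick confirmation, followed by a hypothetical one-off test pod that runs nvidia-smi (the pod name and CUDA image tag are illustrative; pick an image compatible with your driver):

# The node capacity should now include nvidia.com/gpu.
kubectl describe nodes | grep -i "nvidia.com/gpu"

# Hypothetical smoke-test pod: request one GPU and print the driver status.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.4.2-base-ubuntu20.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# After the pod completes, its logs should show the nvidia-smi table.
kubectl logs gpu-smoke-test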

Integrating with GitHub

  1. Follow the QuickStart to configure your Arc-enabled AKS cluster with GitHub using Flux.

Remember to have the Kubernetes default namespace identified in your deployment YAML.
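For reference, the Flux configuration from the QuickStart boils down to a single Azure CLI call (this requires the k8s-configuration CLI extension); the resource group, cluster name, repository URL, and kustomization path below are placeholders for your own values:

# Attach the Arc-connected cluster to a GitHub repo via Flux.
az k8s-configuration flux create \
  --resource-group <your-resource-group> \
  --cluster-name <your-arc-connected-cluster> \
  --cluster-type connectedClusters \
  --name edge-ai-gitops \
  --namespace default \
  --url https://github.com/<org>/<repo> \
  --branch main \
  --kustomization name=workload path=./manifests prune=true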

Deploy AI Workload

Validate E2E Solution Working

Go to VLC and view the inferencing results from the RTSP stream:

rtsp://172.23.30.162:30007/ds-test
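Assuming VLC is installed on your dev machine, you can also open the stream from the command line; the IP comes from the virtual IP pool and the port from the service definition, so adjust both to match your deployment:

# Open the inference output stream in VLC.
vlc rtsp://172.23.30.162:30007/ds-test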

Cleanup Resources


Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.