This reference solution gives customers and partners an example of how to deploy and manage an Edge AI workload on certified AzSHCI hardware using AKS and Azure Arc.
- Prerequisites
- Preparing AzSHCI - 2 node cluster
- Configuring ARC and AKS on AzSHCI
- Creating AI Workload AKS Cluster
- Integrating with GitHub
- Deploy AI Workload
- Validate E2E Solution Working
- Cleanup Resources
For this E2E reference solution, you will need the following prerequisites:
- 2-node cluster: Deploy a 2-node cluster on AzSHCI
- Azure subscription
- Windows Admin Center
Follow the Microsoft Learn documentation to set up Windows Admin Center (WAC): QuickStart setup AzSHCI with WAC
Follow the Microsoft Learn documents to configure your two-node cluster: Deploy a 2-node cluster on AzSHCI
When setting up AKS, you will first set up the AKS management cluster and reserve IPs for all the workload clusters; then proceed to the Creating AI Workload AKS Cluster step below. Work with your networking engineers to reserve a block of IP addresses, and ensure you have a vSwitch created. The gateway and DNS servers can be found in the vSwitch settings in WAC. The engineering plan used for this demo:
- Subnet prefix: 172.23.30.0/24
- Gateway: 172.23.30.1
- DNS servers: 172.22.1.9, 172.22.3.9
- Cloud agent IP: 172.23.30.151
- Virtual IP address pool: 172.23.30.152 to 172.23.30.172
- Kubernetes node IP pool: 172.23.30.173 to 172.23.30.193
- Prepare the 2-node cluster by installing AKS; follow this PowerShell QuickStart Guide
- Alternatively, you can set up AKS with WAC: AKS using WAC. The demo was created with static IPs from the engineering plan above; a PowerShell sketch based on that plan follows below.
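As a rough sketch, the static-IP plan above maps onto the AksHci PowerShell module as follows. The vnet name, vSwitch name, and cluster storage paths are placeholders for this example; substitute your own values.
# Run from an elevated PowerShell session on a cluster node, after Initialize-AksHciNode.
# "aksvnet", "ComputeSwitch", and the C:\ClusterStorage paths are placeholder values.
$vnet = New-AksHciNetworkSetting -name aksvnet -vSwitchName "ComputeSwitch" `
    -ipAddressPrefix "172.23.30.0/24" -gateway "172.23.30.1" `
    -dnsServers "172.22.1.9", "172.22.3.9" `
    -vipPoolStart "172.23.30.152" -vipPoolEnd "172.23.30.172" `
    -k8sNodeIpPoolStart "172.23.30.173" -k8sNodeIpPoolEnd "172.23.30.193"
Set-AksHciConfig -vnet $vnet `
    -imageDir "C:\ClusterStorage\Volume1\Images" `
    -workingDir "C:\ClusterStorage\Volume1\WorkDir" `
    -cloudConfigLocation "C:\ClusterStorage\Volume1\Config" `
    -cloudServiceCidr "172.23.30.151/24"
Install-AksHci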
Now that you have AKS and Arc installed on your management cluster, you need to create an AI Workload cluster and prime the nodes to use the AI accelerator hardware.
Follow the instructions to create a cluster named AI Workload; for this demo we stood up a 3-node AKS cluster, as sketched below.
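As a sketch with the AksHci PowerShell module (the cluster and node pool names are illustrative):
# Create a 3-node Linux workload cluster, then pull its kubeconfig for kubectl.
New-AksHciCluster -name ai-workload -nodePoolName ai-nodepool -nodeCount 3 -osType Linux
Get-AksHciCredential -name ai-workload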
Once your AI Workload cluster is created, go to WAC Cluster Manager and look at the VM list. Take note of the VM names for the AI Workload cluster.
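You can also list the VMs from PowerShell on a cluster node using the Hyper-V module:
# The AI Workload node VMs carry the workload cluster name in their VM names.
Get-VM | Select-Object Name, State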
Follow these steps to create a GPU Pool in WAC and assign the VMs from the AI Workload Cluster.
Now that we have the GPUs assigned, we need to install Docker and the Nvidia plug-in.
- Go to the Docker page and find the binary for your architecture. For this example, we use x86_64 docker-20.10.9.tgz. Docker binaries
- Get the AI Workload node IP address and connect using your RSA key. When using WAC, the keys are placed in your cluster storage under Volumes, then AksHCI. You can run this from your dev machine's command prompt, but ensure you are in the same folder as the key file. For the command below, we copied the key file to the dev machine and renamed it akshci_rsa.xml. Learn more at Connect with SSH to Linux or Windows worker nodes
ssh -i akshci_rsa.xml clouduser@172.23.30.157
- Once on the AI Workload node, download the Docker binaries.
sudo curl https://download.docker.com/linux/static/stable/x86_64/docker-20.10.9.tgz -o docker-20.10.9.tgz
- Inflate docker binaries.
sudo tar xzvf docker-20.10.9.tgz
- Remove any existing containerd binaries so the bundled versions can replace them.
sudo rm -rf '/usr/bin/containerd'
sudo rm -rf '/usr/bin/containerd-shim-runc-v2'
- Copy the binaries into /usr/bin/.
sudo cp docker/* /usr/bin/
- Run the Docker daemon in the background.
sudo dockerd &
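As an optional sanity check (our addition, not part of the original steps), confirm the daemon responds before continuing:
# Both the client and server sections should report version 20.10.9.
sudo docker version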
- Install the NVIDIA GPU device plugin. See the NVIDIA page for the full set of instructions: GitHub - NVIDIA/k8s-device-plugin: NVIDIA device plugin for Kubernetes
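The configuration below expects /usr/bin/nvidia-container-runtime to be present on the node. If it is not, here is a minimal install sketch for the NVIDIA Container Toolkit, assuming an Ubuntu-based node image (adjust for your distribution per the NVIDIA instructions):
# Add NVIDIA's package repository for this distribution, then install the toolkit.
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list \
    | sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit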
- Set nvidia as the default Docker runtime by creating daemon.json.
sudo vim /etc/docker/daemon.json
- Paste the following into the newly created daemon.json file:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
- Check to ensure the changes took.
sudo cat /etc/docker/daemon.json
- Remove the stale pid file and leftover volumes, then restart Docker.
sudo rm /var/run/docker.pid
sudo rm -rf /var/lib/docker/volumes/*
sudo dockerd &
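As an optional check (our addition), verify that containers now see the GPU through the nvidia runtime; the CUDA image tag is illustrative:
# With "default-runtime": "nvidia", this should print the nvidia-smi GPU table.
sudo docker run --rm nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi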
- Configure containerd. Open the config.toml file and paste in the modification below.
sudo vim /etc/containerd/config.toml
- Paste the following into the file:
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
- Check to ensure the changes took.
sudo cat /etc/containerd/config.toml
- Restart containerd
sudo systemctl restart containerd
- Optional troubleshooting:
sudo systemctl stop containerd
sudo systemctl start containerd
sudo containerd
- From PowerShell on your dev machine, use the kubectl command line to enable GPU support in Kubernetes.
Run the deployment:
kubectl apply -f edge-ai1.yaml
Run the NVIDIA device plugin:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.3/nvidia-device-plugin.yml
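To confirm the device plugin registered the GPUs (a suggested check, not part of the original steps):
# The plugin pod should be Running, and nodes should advertise nvidia.com/gpu capacity.
kubectl get pods -n kube-system | grep nvidia-device-plugin
kubectl describe nodes | grep "nvidia.com/gpu"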
- Follow the QuickStart to configure your Arc-enabled AKS cluster with GitHub using Flux.
Remember to specify the Kubernetes default namespace in your deployment YAML, as sketched below.
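As a minimal sketch of such a manifest (the names and container image are illustrative, not the contents of the actual edge-ai1.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-ai            # illustrative name
  namespace: default       # pin the deployment to the default namespace, per the note above
spec:
  replicas: 1
  selector:
    matchLabels:
      app: edge-ai
  template:
    metadata:
      labels:
        app: edge-ai
    spec:
      containers:
      - name: inference
        image: nvcr.io/nvidia/deepstream:6.1-samples   # illustrative image
        resources:
          limits:
            nvidia.com/gpu: 1    # scheduled onto a GPU node via the device plugin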
To validate the E2E solution is working, open the RTSP stream exposed by the AI workload (for example, in a media player):
rtsp://172.23.30.162:30007/ds-test
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.