Autoscaling Azure DevOps Agents running on AKS
Repo structure
- docker-agent: a Dockerfile to build a DevOps agent image, taken from the Azure docs
- docker-scaler: the source code for the container that does the scaling, i.e. checks whether a new agent is needed and triggers a pod deployment within AKS
- helm-batchjob-agent: the Helm chart used to deploy a new agent; configure your agents (memory, CPU, disks) in this chart
- helm-cronjob-scaler: the master chart that needs to be deployed once to start the autoscaler within your cluster; here you configure your Azure DevOps organization and secrets
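The scaler's job, as described in the list above, is to check whether a new agent is needed and then trigger a pod deployment. A minimal sketch of such a decision rule could look like the following. The function name, its parameters, and the idea of a cap on the number of agents are assumptions made for illustration; they are not taken from docker-scaler/scaler.py.

```python
def agents_to_start(pending_jobs: int, starting_agents: int,
                    running_agents: int, max_agents: int) -> int:
    """Return how many new agent pods to deploy (hypothetical rule)."""
    # Jobs that are not already covered by an agent that is still starting up.
    uncovered = max(pending_jobs - starting_agents, 0)
    # Never exceed the (assumed) cap on agent pods that exist at the same time.
    free_slots = max(max_agents - running_agents - starting_agents, 0)
    return min(uncovered, free_slots)


# 3 pending jobs, 1 agent already starting, 2 busy agents, cap of 5 agents
print(agents_to_start(pending_jobs=3, starting_agents=1,
                      running_agents=2, max_agents=5))  # prints 2
```

A rule of this shape keeps the scaler idempotent between CRON runs: agents that are still starting count against the pending jobs, so the same job does not trigger a second pod.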
How to set up autoscaling DevOps agents in AKS
- Create Kubernetes environment for build agents
- Prepare DevOps project for this agent pool
- Deploy autoscaler script as CRON job to cluster
step 1: create Azure Kubernetes Service (AKS) cluster for the build agents
Create the cluster, e.g. by following the official Azure docs to Deploy an Azure Kubernetes Service cluster using the Azure CLI.
Once the cluster exists, it needs pull access to the Docker registry (ACR) so that the agent images can be downloaded:
#!/bin/bash
# grant AKS service principal access to docker registry
# details see https://docs.microsoft.com/en-us/azure/container-registry/container-registry-auth-aks
AKS_RESOURCE_GROUP=myClusterResourceGroup
AKS_CLUSTER_NAME=myClusterName
ACR_RESOURCE_GROUP=myContainerResourceGroup
ACR_NAME=myACRname
# Get the id of the service principal configured for AKS
CLIENT_ID=$(az aks show --resource-group "$AKS_RESOURCE_GROUP" --name "$AKS_CLUSTER_NAME" --query "servicePrincipalProfile.clientId" --output tsv)
# Get the ACR registry resource id
ACR_ID=$(az acr show --name "$ACR_NAME" --resource-group "$ACR_RESOURCE_GROUP" --query "id" --output tsv)
# Create role assignment
az role assignment create --assignee "$CLIENT_ID" --role acrpull --scope "$ACR_ID"
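One caveat with the script above: on clusters that use a managed identity instead of a service principal, the servicePrincipalProfile.clientId query is commonly reported to return the literal string msi, and image pulls are then performed by the kubelet identity. The sketch below picks the assignee from the JSON printed by az aks show --output json. The function name is hypothetical, and the identityProfile.kubeletidentity path is an assumption that should be checked against the actual output for your cluster.

```python
import json


def acr_assignee(aks_show_json: str) -> str:
    """Pick the identity that should receive the acrpull role (sketch)."""
    cluster = json.loads(aks_show_json)
    client_id = (cluster.get("servicePrincipalProfile") or {}).get("clientId")
    # A real service principal id can be used directly, as in the script above.
    if client_id and client_id != "msi":
        return client_id
    # Assumed location of the kubelet identity on managed-identity clusters.
    kubelet = (cluster.get("identityProfile") or {}).get("kubeletidentity") or {}
    if "clientId" in kubelet:
        return kubelet["clientId"]
    raise ValueError("no service principal or kubelet identity found")
```

The returned value would then be passed to az role assignment create --assignee in place of CLIENT_ID.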
step 2: prepare DevOps project for agent pool
- Prerequisite: Install the Azure CLI
  - get the base az CLI:
    curl -sL https://aka.ms/InstallAzureCLIDeb | sudo -E bash
    (WARNING: this is the yolo version where you trust that the shell script is sane; do not run this in production, follow the manual setup instructions instead)
  - install the devops extension:
    az extension add --name azure-devops
  - optional: configure the defaults:
    az devops configure --defaults organization=https://dev.azure.com/<my devops org>/ project="<my devops project>"
- Create an Agent Pool
  - Open Project -> Settings -> Agent Pools
  - add a new pool and give it a meaningful name, as it will be used within the pipelines, e.g. ubuntu18-aks
  - 🚨 Pools without agents cannot be assigned any jobs (the jobs fail immediately), so at least one dummy agent must be registered in the agent pool before jobs can be queued
  - Create a dummy agent (named dummy) by following the instructions to create a dedicated agent
  - Stop the agent and feel free to remove any local files; once the agent has been registered, the running process is no longer needed
  - DO NOT REMOVE OR DISABLE THE DUMMY AGENT
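Because the whole setup depends on the dummy agent staying registered and enabled, it can be worth checking for it programmatically. The sketch below inspects a response shaped like the one the Azure DevOps REST API returns when listing the agents of a pool (a JSON object with a value list whose entries carry name and enabled fields). The function name is hypothetical, and the response shape is an assumption to verify against the API version you use.

```python
def dummy_agent_ok(agents_response: dict, dummy_name: str = "dummy") -> bool:
    """Return True if the dummy agent is registered and enabled (sketch)."""
    for agent in agents_response.get("value", []):
        # The agent may be offline; it only has to exist and be enabled.
        if agent.get("name") == dummy_name and agent.get("enabled", False):
            return True
    return False


# Example payload, hand-written to mirror the assumed response shape.
sample = {
    "count": 1,
    "value": [{"id": 7, "name": "dummy", "enabled": True, "status": "offline"}],
}
print(dummy_agent_ok(sample))  # prints True
```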
step 3: Deploy autoscaler script
Required tools:
- Helm
- kubectl
- az (Azure CLI)
- Make sure the pipelines-scaler and pipelines-agent Docker images are available in the container registry
  - for Dockerfiles see docker-scaler and docker-agent
- Activate kubectl for the correct AKS cluster, e.g.
  az aks get-credentials -g <resource group> -n <aks cluster name>
- Create the namespace for agent pods to run in; the name needs to match the definition in docker-scaler/scaler.py
  kubectl create namespace agents
- Put a valid Personal Access Token with the Agent Pools: Read & Manage permission into the environment variable AZP_TOKEN
- Within this repository, execute the following commands to deploy the Helm chart
  cd helm-cronjob-scaler
  helm upgrade --install \
    --set devops.token=`echo ${AZP_TOKEN} | base64` \
    scaler .
- Done! Agents will now automatically be created if a job is pending
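A detail worth knowing about the token encoding in the helm command: echo appends a trailing newline, so the base64 value passed to the chart encodes the token plus a newline character. Whether the chart or the scaler strips that newline is not stated here, so treat this as something to verify rather than a confirmed bug. The sketch below shows the difference using a placeholder token; the helper name is hypothetical.

```python
import base64


def encode_token(token: str) -> str:
    """Base64-encode a token the way a pipe into `base64` would."""
    return base64.b64encode(token.encode("utf-8")).decode("ascii")


token = "example-token"  # placeholder, not a real PAT

as_echo = encode_token(token + "\n")  # what `echo ${AZP_TOKEN} | base64` encodes
clean = encode_token(token)           # what `printf '%s' "${AZP_TOKEN}" | base64` encodes

print(as_echo == clean)           # prints False
print(base64.b64decode(as_echo))  # prints b'example-token\n'
```

If authentication fails even though the PAT is valid, decoding the deployed value and looking for a trailing newline is a quick first check.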
Local development / Debugging
Running the scaler manually
Preparations
- place a .env file containing AZP_TOKEN=<PAT token> into the docker-scaler/ directory; the PAT needs the Agent Pools: Read & Manage permission
Run the commands
cd docker-scaler/
source .env # to load the AZP_TOKEN variable
./scaler.py -o <devops org> -p "<devops project>" -t $AZP_TOKEN --helm-chart helm-batchjob-agent --pool-name <pipelines pool name> info
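For orientation, the command line above could be parsed along the following lines. This is only a sketch that mirrors the flags visible in the invocation; the real argument handling lives in docker-scaler/scaler.py and is not shown here, so the dest names, the required settings, and the assumption that info is one of several subcommands are all illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags used in the README (sketch)."""
    parser = argparse.ArgumentParser(prog="scaler.py")
    parser.add_argument("-o", dest="organization", required=True,
                        help="Azure DevOps organization")
    parser.add_argument("-p", dest="project", required=True,
                        help="Azure DevOps project")
    parser.add_argument("-t", dest="token", required=True,
                        help="PAT with Agent Pools: Read & Manage")
    parser.add_argument("--helm-chart", required=True,
                        help="chart used to deploy a new agent")
    parser.add_argument("--pool-name", required=True,
                        help="agent pool to watch")
    # Only `info` appears in the README; other commands are unknown here.
    parser.add_argument("command", choices=["info"])
    return parser


args = build_parser().parse_args([
    "-o", "my-org", "-p", "my project", "-t", "secret",
    "--helm-chart", "helm-batchjob-agent",
    "--pool-name", "ubuntu18-aks", "info",
])
print(args.pool_name, args.command)  # prints ubuntu18-aks info
```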
### Azure Kubernetes
Debugging Kubernetes
```sh
# deploy the dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta8/aio/deploy/recommended.yaml
# HINT FOR WSL
# the Azure CLI opens the (very long) URL automatically; for this to work WSL needs to know which browser to open
export BROWSER='/c/Program Files (x86)/Mozilla Firefox/firefox.exe'
# browse the dashboard
az aks browse -g myClusterResourceGroup -n myClusterName
```
Deploying AKS based agents manually
helm upgrade --install --namespace agents --set devops.token=`echo ${AZP_TOKEN} | base64` rick ./helm-batchjob-agent