This guide walks through several ways to create an AKS cluster and explores the networking and policy options offered in AKS. The examples include a cluster provisioned with the Azure CLI and a cluster defined via an ARM template.
For more details about AKS networking options, refer to the "Everything you need to know about Kubernetes networking on Azure" video recording.
Calico open source and Calico Enterprise versions are used in this guide. Refer to the Calico documentation for more information on how one compares to the other.
For more information on how Azure policies compare to Calico policies, refer to the official AKS documentation.
AKS offers several networking options, with and without a network policy engine.
- Using the `kubenet` network plugin. It provides a very basic network configuration with Host-Local IPAM and `/24` routes in the VNET associated with the host. On its own, `kubenet` doesn't configure Node-to-Node communications. However, if an AKS cluster provisioned with this option has multiple nodes, the routes for Node-to-Node communications are automatically configured by the AKS provisioner. `kubenet` leverages a Linux bridge called `cbr0` to facilitate POD-to-POD communications. Calico is not involved in this cluster configuration.

  ```bash
  az aks create --network-plugin kubenet ...
  ```
- Using the `kubenet + Calico` network plugin and network policy. This option is a bit misleading in its naming, as it suggests that `kubenet` is used, while in reality the cluster is configured with the Calico CNI with Host-Local IPAM and the Calico network policy engine. Similar to the pure `kubenet` option, you get `/24` routes for PODs in the POD-network VNET. For example, in a cluster with node subnet `10.240.0.0/16`, each node gets an IP from that subnet (e.g. node1 `10.240.0.4`, node2 `10.240.0.5`, etc.). Each POD gets an IP from a POD-network (e.g. `10.244.0.0/16`), but from a CIDR that is associated with a node. Each node gets a `/24` route configured for its PODs in the route table: PODs on node1 get `10.244.0.0/24`, PODs on node2 get `10.244.1.0/24`, and so on. Note that the Calico version in this configuration is managed by the AKS controller and cannot be replaced; any attempt to change the installed Calico version will be rolled back by the AKS controller.

  ```bash
  az aks create --network-plugin kubenet --network-policy calico ...
  ```
- Using `azure-cni` with the `azure` network policy. This option configures the cluster with `azure-cni` and the `azure` network policy engine, which implements basic Kubernetes network policies. A bridge is used on each host to facilitate POD-to-POD communications.

  ```bash
  az aks create --network-plugin azure --network-policy azure ...
  ```
- Using `azure-cni` with Calico network policy. In this configuration, nodes and PODs use IPs from the same underlying VNET, and the POD IPs are configured as secondary IPs on the VMs' Azure network interfaces. In this case the underlying VNET is fully aware of the POD IP addresses and can route POD traffic without needing User Defined Routes. In other words, the POD-network is routable. One way to tell that the POD network is routable is that there is no `routetable` resource in your cluster's auxiliary resource group (see the route-table check after this list). It is important to size the VNET properly for the AKS cluster, as IP exhaustion is a common issue for AKS clusters with a routable POD-network. Such a configuration allows other Azure services to communicate with cluster PODs directly by using their IPs. Calico is used for network policy enforcement. The installed Calico version is maintained by the AKS controller and cannot be changed.

  ```bash
  az aks create --network-plugin azure --network-policy calico ...
  ```
- Using `azure-cni` with `transparent` network mode and Calico network policy. This configuration uses `azure-cni` as described in the previous two options, and configures the `transparent` network mode, which is compatible with Calico Enterprise and is not managed by the AKS controller. To get an AKS cluster configured with `transparent` network mode, you can either use an ARM template, or use the `az aks create` command with Azure CLI v2.17+, which provisions the AKS cluster with an Azure CNI that has `transparent` mode as the default network configuration. The ARM template deployment is discussed later in this guide.

  ```bash
  az aks create --network-plugin azure ...
  ```
- Using Calico networking with private IPAM and Calico network policy. This configuration allows users to have a private Calico network for the POD network and to use advanced Calico network policy features. It takes advantage of the bring-your-own-CNI feature of AKS: the cluster is created without any CNI, which allows the user to install the Calico CNI for AKS POD networking.

  ```bash
  az aks create --network-plugin none ...
  ```
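To check whether a given cluster's POD network is routable (as mentioned in the `azure-cni` with Calico network policy option above), you can look for a route table in the cluster's auto-generated node resource group. A minimal sketch, assuming the `RG` and `CLUSTER_NAME` variables defined in the examples below:

```bash
# find the auto-generated node resource group of the cluster
NODE_RG=$(az aks show -g $RG -n $CLUSTER_NAME --query nodeResourceGroup -o tsv)
# kubenet-style (non-routable) POD networks show a route table here;
# routable POD networks typically show none
az network route-table list -g $NODE_RG -o table
```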
An AKS cluster requires a service principal to set up access to the necessary cluster resources (e.g. `vnet`, `routetable`, etc.).
Configure a service principal to be used with the AKS cluster:
- Check if a service principal with a specific name already exists.

  ```bash
  ## Org or default Azure domain, e.g. contoso.com or calico.onmicrosoft.com
  #DOMAIN="ORG_DOMAIN"
  ## define var for service principal name
  #SP="calico-aks-sp.$DOMAIN"
  SP="calico-aks-sp"
  # list service principal
  az ad sp list --spn "http://$SP" --query "[].{id:appId,tenant:appOwnerTenantId,displayName:displayName,appDisplayName:appDisplayName,homepage:homepage,spNames:servicePrincipalNames}"
  ```
- Create the service principal and capture its password.

  Note, the password may contain special characters that may need to be escaped. Alternatively, you can recreate the service principal account to avoid character escaping.

  ```bash
  SP_PASSWORD=$(az ad sp create-for-rbac --name "http://$SP" | jq '.password' | sed -e 's/^"//' -e 's/"$//')
  ```
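Before proceeding, it may be worth a quick sanity check that the password was actually captured (a minimal sketch; it avoids echoing the secret itself):

```bash
# verify the password variable is non-empty without printing it
[ -n "$SP_PASSWORD" ] && echo "SP_PASSWORD captured" || echo "SP_PASSWORD is empty, re-run the create command"
```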
This example uses the `az aks` command of the Azure CLI to create an AKS cluster with the Calico CNI, Host-Local IPAM, and the Calico network policy engine managed by the AKS controller.
If you want to use any AKS preview features, configure the `aks-preview` extension first.
This example refers to the `SP` and `SP_PASSWORD` variables defined in the service principal section.
Create AKS cluster:
- Log into Azure and set helper variables.

  ```bash
  # login into az-cli
  az login

  ### set vars
  RG='calico-wbnr'
  LOCATION='centralus'
  CLUSTER_NAME='calico-cni'
  ## Org or default Azure domain, e.g. contoso.com or calico.onmicrosoft.com
  #DOMAIN="ORG_DOMAIN"
  #SP="calico-aks-sp.$DOMAIN"
  SP="calico-aks-sp"
  ROLE='Contributor'
  NET_ROLE='Network Contributor'
  K8S_VERSION=1.24.3
  ```
- Check supported k8s versions for the region.

  ```bash
  # list supported k8s versions
  az aks get-versions --location $LOCATION --output table
  ```
- Create the resource group and configure service principal roles on it.

  ```bash
  # create resource group
  az group create --name $RG --location $LOCATION
  # get resource group ID
  RG_ID=$(az group show -n $RG --query 'id' -o tsv)
  # get service principal client/app Id
  CLIENT_ID=$(az ad sp list --display-name "http://$SP" --query '[].appId' -o tsv)
  # set service principal Contributor role on resource group
  az role assignment create --role $ROLE --assignee $CLIENT_ID --scope $RG_ID
  # [optional] if Contributor role cannot be used, use 'Network Contributor' role which provides minimum required permissions for AKS resources
  az role assignment create --role "$NET_ROLE" --assignee $CLIENT_ID --scope $RG_ID
  ```
- Deploy the AKS cluster.

  By default, an AKS cluster uses `VirtualMachineScaleSets` for its nodes. You can change this via the `--vm-set-type` parameter. See `az aks create --help` for details.

  ```bash
  # var to use existing SSH key
  SSH_KEY='/path/to/ssh_key.pub'
  # deploy AKS cluster using Calico CNI w/ Host-Local IPAM and Calico net policy
  az aks create \
    --resource-group $RG \
    --name $CLUSTER_NAME \
    --kubernetes-version $K8S_VERSION \
    --nodepool-name 'nix' \
    --node-count 2 \
    --network-plugin kubenet \
    --network-policy calico \
    --service-cidr 10.0.0.0/16 \
    --dns-service-ip 10.0.0.10 \
    --docker-bridge-address 172.17.0.1/16 \
    --service-principal $CLIENT_ID \
    --client-secret "$SP_PASSWORD" \
    --node-osdisk-size 50 \
    --node-vm-size Standard_D2s_v3 \
    --max-pods 70 \
    --output table \
    --ssh-key-value $SSH_KEY
  ```
- View cluster state.

  ```bash
  # list aks clusters
  az aks list --resource-group $RG --output table
  ```
- Once the cluster is provisioned, retrieve the `kubeconfig` to communicate with the cluster, and install `kubectl` if needed.

  ```bash
  # if needed install kubectl
  az aks install-cli
  # retrieve kubeconfig
  az aks get-credentials --resource-group $RG --name $CLUSTER_NAME --file ./kubeconfig
  ```
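Optionally, verify that the AKS-managed Calico components are running before deploying workloads (a minimal sketch; depending on the AKS version, the Calico pods may live in `kube-system` or `calico-system`):

```bash
# confirm nodes are Ready
kubectl --kubeconfig ./kubeconfig get nodes
# confirm calico-node pods are Running in any namespace
kubectl --kubeconfig ./kubeconfig get pods -A -l k8s-app=calico-node
```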
At this point the cluster should be ready for use. See the demo section for an example app and policies to deploy onto the cluster.
This example assumes Azure CLI v2.17+, which provisions the AKS cluster with Azure CNI v1.2+ where the `transparent` network mode is the default network configuration option.
This example refers to the `SP` and `SP_PASSWORD` variables defined in the service principal section.
Create AKS cluster:
- Log into Azure and set helper variables.

  ```bash
  # login into az-cli
  az login

  ### set vars
  RG='calico-wbnr'
  LOCATION='centralus'
  CLUSTER_NAME='calient-azcni'
  SP="calico-aks-sp"
  ROLE='Contributor'
  NET_ROLE='Network Contributor'
  K8S_VERSION=1.24.3
  ```
- Check supported k8s versions for the region.

  ```bash
  # list supported k8s versions
  az aks get-versions --location $LOCATION --output table
  ```
- Create the resource group and configure service principal roles on it.

  ```bash
  # create resource group
  az group create --name $RG --location $LOCATION
  # get resource group ID
  RG_ID=$(az group show -n $RG --query 'id' -o tsv)
  # get service principal client/app Id
  CLIENT_ID=$(az ad sp list --display-name "http://$SP" --query '[].appId' -o tsv)
  # set service principal Contributor role on resource group
  az role assignment create --role $ROLE --assignee $CLIENT_ID --scope $RG_ID
  # [optional] if Contributor role cannot be used, use 'Network Contributor' role which provides minimum required permissions for AKS resources
  az role assignment create --role "$NET_ROLE" --assignee $CLIENT_ID --scope $RG_ID
  ```
- Deploy the AKS cluster.

  By default, an AKS cluster uses `VirtualMachineScaleSets` for its nodes. You can change this via the `--vm-set-type` parameter. See `az aks create --help` for details.

  ```bash
  # var to use existing SSH key
  SSH_KEY='/path/to/ssh_key.pub'
  # deploy AKS cluster using Azure CNI (transparent network mode is the default on Azure CLI v2.17+)
  az aks create \
    --resource-group $RG \
    --name $CLUSTER_NAME \
    --kubernetes-version $K8S_VERSION \
    --nodepool-name 'nix' \
    --node-count 3 \
    --network-plugin azure \
    --service-cidr 10.0.0.0/16 \
    --dns-service-ip 10.0.0.10 \
    --docker-bridge-address 172.17.0.1/16 \
    --service-principal $CLIENT_ID \
    --client-secret "$SP_PASSWORD" \
    --node-osdisk-size 50 \
    --node-vm-size Standard_D2s_v3 \
    --max-pods 70 \
    --output table \
    --ssh-key-value $SSH_KEY
  ```
- View cluster state.

  ```bash
  # list aks clusters
  az aks list --resource-group $RG --output table
  ```
- Once the cluster is provisioned, retrieve the `kubeconfig` to communicate with the cluster, and install `kubectl` if needed.

  ```bash
  # if needed install kubectl
  az aks install-cli
  # retrieve kubeconfig
  az aks get-credentials --resource-group $RG --name $CLUSTER_NAME --file ./kubeconfig
  ```
- Refer to the Tigera official documentation to install Calico Enterprise on AKS.
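You can confirm that the cluster actually came up in `transparent` network mode with a quick query (a minimal sketch; the same check is used in the ARM template example later in this guide):

```bash
# should print "transparent" for clusters provisioned with Azure CLI v2.17+
az aks show -g $RG -n $CLUSTER_NAME --query 'networkProfile.networkMode' -o tsv
```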
At this point the cluster should be ready for use. See the demo section for an example app and policies to deploy onto the cluster.
This example assumes Azure CLI v2.39+, which can provision an AKS cluster with `none` for the network plugin, allowing you to install the Calico CNI after cluster provisioning.
This example refers to the `SP` and `SP_PASSWORD` variables defined in the service principal section and provisions an AKS cluster ready for a bring-your-own-CNI configuration.
Create AKS cluster:
- Log into Azure and set helper variables.

  ```bash
  # login into az-cli
  az login

  ### set vars
  RG='calico-wbnr'
  LOCATION='westcentralus'
  CLUSTER_NAME='calient-byo-cni'
  SP="calico-aks-sp"
  ROLE='Contributor'
  NET_ROLE='Network Contributor'
  K8S_VERSION=1.24.3
  ```
- Check supported k8s versions for the region.

  ```bash
  # list supported k8s versions
  az aks get-versions --location $LOCATION --output table
  ```
- Create the resource group and configure service principal roles on it.

  ```bash
  # create resource group
  az group create --name $RG --location $LOCATION
  # get resource group ID
  RG_ID=$(az group show -n $RG --query 'id' -o tsv)
  # get service principal client/app Id
  CLIENT_ID=$(az ad sp list --display-name "http://$SP" --query '[].appId' -o tsv)
  # set service principal Contributor role on resource group
  az role assignment create --role $ROLE --assignee $CLIENT_ID --scope $RG_ID
  # [optional] if Contributor role cannot be used, use 'Network Contributor' role which provides minimum required permissions for AKS resources
  az role assignment create --role "$NET_ROLE" --assignee $CLIENT_ID --scope $RG_ID
  ```
- Deploy the AKS cluster.

  By default, an AKS cluster uses `VirtualMachineScaleSets` for its nodes. You can change this via the `--vm-set-type` parameter. See `az aks create --help` for details.

  ```bash
  # var to use existing SSH key
  SSH_KEY='/path/to/ssh_key.pub'
  # deploy AKS cluster without a CNI plugin (BYO CNI)
  az aks create \
    --resource-group $RG \
    --name $CLUSTER_NAME \
    --kubernetes-version $K8S_VERSION \
    --nodepool-name 'nix' \
    --node-count 3 \
    --network-plugin none \
    --pod-cidr 192.168.0.0/16 \
    --service-principal $CLIENT_ID \
    --client-secret "$SP_PASSWORD" \
    --node-osdisk-size 50 \
    --node-vm-size Standard_D2s_v3 \
    --max-pods 70 \
    --output table \
    --ssh-key-value $SSH_KEY
  ```
- View cluster state.

  ```bash
  # list aks clusters
  az aks list --resource-group $RG --output table
  ```
- Once the cluster is provisioned, retrieve the `kubeconfig` to communicate with the cluster, and install `kubectl` if needed.

  ```bash
  # if needed install kubectl
  az aks install-cli
  # retrieve kubeconfig
  az aks get-credentials --resource-group $RG --name $CLUSTER_NAME --file ./kubeconfig
  ```
- Refer to the Tigera official documentation to install Calico Enterprise on AKS, or the Project Calico documentation to install open source Calico on AKS.

  Note that the default Calico CNI configuration assumes the AKS cluster is provisioned with POD CIDR `192.168.0.0/16`, i.e. `--pod-cidr 192.168.0.0/16`. If you use a different range for the AKS cluster, download and adjust the `custom-resources-calico-cni.yaml` manifest before applying it.
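Keep in mind that with `--network-plugin none` the nodes report `NotReady` until a CNI is installed. A quick way to observe the transition (a minimal sketch):

```bash
# nodes stay NotReady until the Calico CNI is installed
kubectl --kubeconfig ./kubeconfig get nodes -o wide
# after installing Calico, watch the nodes become Ready
kubectl --kubeconfig ./kubeconfig get nodes -w
```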
At this point the cluster should be ready for use. See the demo section for an example app and policies to deploy onto the cluster.
Using an ARM template to install the `az-cni` network plugin with Calico Enterprise for network policy on AKS
This example refers to the `SP` and `SP_PASSWORD` variables defined in the service principal section.
This example uses the ARM template and its parameters file located in the `arm` folder of this repo. Before you can deploy the template, set the required parameters `servicePrincipalClientId`, `servicePrincipalClientSecret`, and `sshRSAPublicKey` in the `aks.parameters.json` file, and adjust others if needed.
Make sure the `AKSNetworkModePreview` feature is registered in your Azure subscription before deploying the cluster. Refer to the register a feature section for more details; a minimal sketch of the flow follows below.
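As a minimal sketch of that registration flow (namespace and feature name as referenced above; confirm against the register a feature section):

```bash
# register the preview feature in the subscription
az feature register --namespace Microsoft.ContainerService --name AKSNetworkModePreview
# poll until the state reads "Registered"
az feature show --namespace Microsoft.ContainerService --name AKSNetworkModePreview --query properties.state -o tsv
# propagate the registration to the resource provider
az provider register --namespace Microsoft.ContainerService
```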
- Retrieve the `servicePrincipalClientId` value.

  ```bash
  CLIENT_ID=$(az ad sp list --display-name "http://$SP" --query '[].appId' -o tsv)
  ```
- Set `servicePrincipalClientSecret` using the value from the `SP_PASSWORD` variable defined in the service principal section.
- Set the `sshRSAPublicKey` value that represents your SSH public key. It starts with `ssh-rsa ...`.
- Log into Azure and set helper variables.

  ```bash
  # login into az-cli
  az login

  ### set vars
  RG='calient-wbnr'
  LOCATION='centralus'
  # Org or default Azure domain, e.g. contoso.com or calico.onmicrosoft.com
  DOMAIN="ORG_DOMAIN"
  SP="calico-aks-sp.$DOMAIN"
  ROLE='Contributor'
  NET_ROLE='Network Contributor'
  CLUSTER_NAME='calient-azcni'
  K8S_VERSION=1.24.3
  ```
- Check supported k8s versions for the region.

  ```bash
  # list supported k8s versions
  az aks get-versions --location $LOCATION --output table
  ```
- Create the resource group and set the service principal role on it.

  ```bash
  # create resource group
  az group create --name $RG --location $LOCATION
  # get resource group ID
  RG_ID=$(az group show -n $RG --query 'id' -o tsv)
  # get service principal client/app Id
  CLIENT_ID=$(az ad sp list --display-name "http://$SP" --query '[].appId' -o tsv)
  # set service principal Contributor role on resource group
  az role assignment create --role $ROLE --assignee $CLIENT_ID --scope $RG_ID
  # [optional] if Contributor role cannot be used, use 'Network Contributor' role which provides minimum required permissions for AKS resources
  az role assignment create --role "$NET_ROLE" --assignee $CLIENT_ID --scope $RG_ID
  ```
- Deploy the cluster.

  ```bash
  # validate the template and parameters
  az deployment group validate --resource-group $RG --template-file arm/aks-vmss.json --parameters @arm/aks.parameters.json clusterName=$CLUSTER_NAME servicePrincipalClientId=$CLIENT_ID servicePrincipalClientSecret="$SP_PASSWORD" kubernetesVersion=$K8S_VERSION sshRSAPublicKey="$(cat $SSH_KEY)"
  # deploy the template with parameters
  az deployment group create --resource-group $RG --template-file arm/aks-vmss.json --parameters @arm/aks.parameters.json clusterName=$CLUSTER_NAME servicePrincipalClientId=$CLIENT_ID servicePrincipalClientSecret="$SP_PASSWORD" kubernetesVersion=$K8S_VERSION sshRSAPublicKey="$(cat $SSH_KEY)"
  ```
- View cluster state.

  ```bash
  # list aks clusters
  az aks list --resource-group $RG --output table
  ```
- View the cluster `networkProfile` configuration and make sure the `"networkMode": "transparent"` option is set.

  ```bash
  az aks show -g $RG -n $CLUSTER_NAME --query 'networkProfile' --output json
  ```
- Once the cluster is provisioned, retrieve the `kubeconfig` to communicate with the cluster, and install `kubectl` if needed.

  ```bash
  # if needed install kubectl
  az aks install-cli
  # retrieve kubeconfig
  az aks get-credentials --resource-group $RG --name $CLUSTER_NAME --file ./kubeconfig
  ```
- Refer to the Tigera official documentation to install Calico Enterprise on AKS.
At this point the cluster should be ready for use. See the demo section for an example app and policies to deploy onto the cluster.
When done using the cluster, remove it and clean up related resources.
Remove the cluster using the `az aks` command.

```bash
# delete AKS cluster
az aks delete -n $CLUSTER_NAME -g $RG --yes #--no-wait
```
Remove the cluster using an ARM template.

```bash
# remove cluster using ARM template
az deployment group create --resource-group $RG --template-file arm/aks.cleanup.json --mode Complete
```
Remove the resource group and service principal account.

```bash
# delete resource group
az group delete -n $RG --yes
# delete SP account
az ad sp delete --id $(az ad sp list --display-name "http://$SP" --query '[].appId' -o tsv)
```
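A quick way to confirm the cleanup completed (a minimal sketch):

```bash
# prints "false" once the resource group is fully deleted
az group exists -n $RG
# prints nothing once the service principal is removed
az ad sp list --display-name "http://$SP" --query '[].appId' -o tsv
```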
The demo scenario uses several PODs and a few policy examples. The folders `30-*` and `35-*` are intended for Calico Enterprise as they use policy tiers.
Use `calicoctl` to work with policies in a Calico OSS cluster. You can use `kubectl` for a Calico Enterprise cluster.
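For example, with the Kubernetes datastore used by AKS clusters, `calicoctl` invocations look like this (a minimal sketch; assumes `calicoctl` is installed and the kubeconfig retrieved earlier is used):

```bash
# point calicoctl at the cluster's Kubernetes datastore
export KUBECONFIG=./kubeconfig
export DATASTORE_TYPE=kubernetes
# list Calico network policies in all namespaces
calicoctl get networkpolicy --all-namespaces
```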
The Calico OSS demo scenario can be used in both Calico OSS and Calico Enterprise clusters.
- Deploy the sample application.

  ```bash
  # deploy app components
  kubectl apply -f demo/10-app/
  # view deployed components
  kubectl -n demo get pod
  ```
  The sample app consists of an `nginx` deployment, a `centos` standalone POD, and a `netshoot` standalone POD. The `nginx` instances serve a static HTML page. The `centos` instance queries an external resource, `www.google.com`. The `netshoot` instance queries the `nginx` service named `open-nginx`.

- Attach to the log streams of the `centos` and `netshoot` PODs to confirm that both processes can get an HTTP 200 response. Then deploy the Kubernetes default deny policy for the `demo` namespace.

  ```bash
  # attach to PODs log streams
  kubectl -n demo logs -f centos --tail 1
  kubectl -n demo logs -f netshoot --tail 1
  # deploy k8s default deny policy
  kubectl apply -f demo/20-default-deny/policy-default-deny-k8s.yaml
  ```
  After the policy is applied, neither the `centos` nor the `netshoot` process should be able to query the targeted resources.

- Allow `centos` to query the external resource.

  ```bash
  # deploy centos allow policy
  DATASTORE_TYPE=kubernetes calicoctl apply -f demo/25-sample-policy/allow-centos-egress.yaml
  ```
  Once the policy is applied, the `centos` POD should be able to access the `www.google.com` resource.

- Allow DNS lookups for the entire `demo` namespace and `nginx` service access for the `netshoot` component.

  ```bash
  # allow cluster DNS lookups
  DATASTORE_TYPE=kubernetes calicoctl apply -f demo/25-sample-policy/allow-cluster-dns.yaml
  # allow nginx ingress
  DATASTORE_TYPE=kubernetes calicoctl apply -f demo/25-sample-policy/allow-nginx-ingress.yaml
  # allow netshoot egress
  DATASTORE_TYPE=kubernetes calicoctl apply -f demo/25-sample-policy/allow-port80-egress.yaml
  ```
  Once all three policies are applied, the `netshoot` POD should be able to get a response from the `open-nginx` cluster service.
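To double-check the end state (a minimal sketch using the POD and service names from this demo):

```bash
# expect an HTTP 200 response from the open-nginx service once all three policies are in place
kubectl -n demo exec -t netshoot -- curl -m3 -sI http://open-nginx | grep -i http
```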
The Calico Enterprise demo scenario can be used only in Calico Enterprise clusters as it uses enterprise features.
- Deploy the sample application.

  ```bash
  # deploy app components
  kubectl apply -f demo/10-app/
  # view deployed components
  kubectl -n demo get pod
  ```
  The sample app consists of an `nginx` deployment, a `centos` standalone POD, and a `netshoot` standalone POD. The `nginx` instances serve a static HTML page. The `centos` instance queries an external resource, `www.google.com`. The `netshoot` instance queries the `nginx` service named `open-nginx`.

- Deploy a new policy tier.

  ```bash
  # deploy security tier
  kubectl apply -f demo/30-calient-tier/
  ```
- Attach to the log streams of the `centos` and `netshoot` PODs to confirm that both processes can get an HTTP 200 response. Then deploy the Calico default deny policy for the `demo` namespace.

  ```bash
  # attach to PODs log streams
  kubectl -n demo logs -f centos
  kubectl -n demo logs -f netshoot
  # deploy staged default-deny policy
  kubectl apply -f demo/35-dns-policy/policy-staged-default-deny-calico.yaml
  # deploy Calico default deny policy which applies the same default deny rules to demo namespace as was used in a standard K8s policy in Calico OSS demo scenario
  kubectl apply -f demo/35-dns-policy/policy-default-deny-calico.yaml
  ```
  Once the default deny policy takes effect, neither the `centos` nor the `netshoot` instance should be able to reach the targeted resources.

- Deploy the DNS policy to allow access to the external resource.

  ```bash
  # deploy policy to allow kube-dns access
  kubectl apply -f demo/35-dns-policy/policy-allow-kube-dns.yaml
  # deploy network sets resources
  kubectl apply -f demo/35-dns-policy/global-netset.yaml
  kubectl apply -f demo/35-dns-policy/public-ip-netset.yaml
  # deploy DNS policy
  kubectl apply -f demo/35-dns-policy/policy-allow-external-dns-egress.yaml
  ```
  Once the DNS policy is deployed, the `netshoot` POD will not be able to communicate with the `nginx` POD because the DNS policy does not explicitly define a rule to allow this communication. To fix this, either use the Calico Enterprise Manager UI to add a `Pass` rule to the DNS policy, or uncomment the `Pass` action in the `demo/35-dns-policy/policy-allow-external-dns-egress.yaml` file and re-deploy the policy.

- Test `www.google.com` access.

  ```bash
  # curl www.google.com from centos POD
  kubectl -n demo exec -t $(kubectl get pod -l app=centos -n demo -o jsonpath='{.items[*].metadata.name}') -- curl -m3 -ILs http://www.google.com | grep -i http
  # curl www.google.com from netshoot POD
  kubectl -n demo exec -t $(kubectl get pod -l app=netshoot -n demo -o jsonpath='{.items[*].metadata.name}') -- curl -m3 -ILs http://www.google.com | grep -i http
  ```
  Try to `curl` any other external resource. You should not be able to access it, as the `allow-external-dns-egress` policy explicitly denies access to the public IP ranges listed in the `public-ip-range` global network set.
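To review the tiered policies created by this scenario (a minimal sketch; Calico Enterprise exposes tiers and tiered policies through the `projectcalico.org` API group, so the exact resource names may vary with the installed version):

```bash
# list policy tiers
kubectl get tiers.projectcalico.org
# list Calico network policies in the demo namespace
kubectl get networkpolicies.projectcalico.org -n demo
```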
For more details on how to configure SSH access to the nodes, refer to the official AKS documentation.
```bash
# create ssh helper pod (kubectl run creates a standalone pod on current kubectl versions)
kubectl run -it --rm aks-ssh --image=debian
# if cluster is mixed, use this to pin the pod to a Linux node
kubectl run -it --rm aks-ssh --image=debian --overrides='{"spec":{"nodeSelector":{"kubernetes.io/os":"linux"}}}'

# install ssh client in the helper pod
apt-get update && apt-get install openssh-client -y

# set vars
SSH_KEY='/path/to/ssh_key'
# open a new terminal and run
kubectl cp $SSH_KEY $(kubectl get pod -l run=aks-ssh -o jsonpath='{.items[0].metadata.name}'):/id_rsa

# get node IP to use within aks-ssh POD
NODE_NAME='aks_node_name'
echo "NODE_IP=$(kubectl get node $NODE_NAME -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')"

# run these commands inside of the aks-ssh POD session
chmod 0600 id_rsa
ssh -i id_rsa azureuser@$NODE_IP
```