Bug: Reconciling VirtualNetworksSubnet fails with "Request entity too large: limit is 3145728"
Closed this issue · 10 comments
Describe the bug
The bug manifests on our cluster created with the following networking parameters:
az aks show --subscription ExampleSubscription -n example-cluster-name -g example-cluster-name-rg -o table --query networkProfile
NetworkPlugin NetworkPolicy NetworkDataplane ServiceCidr DnsServiceIp OutboundType LoadBalancerSku PodLinkLocalAccess
--------------- --------------- ------------------ ------------- -------------- -------------- ----------------- --------------------
azure azure azure 10.100.0.0/16 10.100.0.10 loadBalancer standard IMDS
And it has 20 Agent Pools, with the following sizes:
az aks show --subscription ExampleSubscription -n example-cluster-name -g example-cluster-name-rg -o table --query "agentPoolProfiles[].{Count: count, maxCount: maxCount, maxPods: maxPods}"
Count MaxCount MaxPods
------- ---------- ---------
0 3 20
5 7 150
0 3 80
2 5 110
0 50 100
36 60 100
27 100 100
12 20 100
11 33 110
1 4 110
3 8 80
0 0 100
5 10 100
2 7 100
4 30 100
5 30 100
0 3 20
15 30 100
3 3 20
2 7 80
CAPZ created a VirtualNetworksSubnet ASO CR for that cluster with the following configuration:
az network vnet subnet show --ids "example/subnet/id" -o table --query "{addressPrefix: addressPrefix, privateEndpointNetworkPolicies: privateEndpointNetworkPolicies, privateLinkServiceNetworkPolicies: privateLinkServiceNetworkPolicies}"
AddressPrefix PrivateEndpointNetworkPolicies PrivateLinkServiceNetworkPolicies
--------------- -------------------------------- -----------------------------------
10.0.0.0/16 Disabled Enabled
When the Agent Pools reach somewhere close to the counts above, the VirtualNetworksSubnet object in Azure grows to around 5.6 MB; it fills up with thousands of entries in the ipConfigurations field:
az network vnet subnet show --ids /subscriptions/.../subnets/example-cluster-subnet > example-cluster-subnet.json
ls -lh example-cluster-subnet.json
-rw-r--r--@ 1 danilo.uipath staff 5.6M Nov 4 12:37 example-cluster-subnet.json
cat example-cluster-subnet.json| jq '.ipConfigurations | length'
14006
cat example-cluster-subnet.json| jq '.ipConfigurations[0].id | length'
305
cat example-cluster-subnet.json| jq '.ipConfigurations[0].resourceGroup | length'
60
ASO then tries to persist the ipConfigurations into the VirtualNetworksSubnet CR's status, and this causes the API server to return:
E1107 10:21:00.621890 1 generic_reconciler.go:143] "msg"="Failed to commit object to etcd" "error"="updating example-ns/example-cluster-name-vnet-example-cluster-name-subnet resource: Request entity too large: limit is 3145728" "logger"="controllers.VirtualNetworksSubnetController" "name"="example-cluster-name-example-cluster-name-subnet" "namespace"="example-ns"
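As a rough back-of-envelope check (my own math, using the jq numbers above): the ipConfiguration ID strings alone, ignoring the rest of each entry and any JSON overhead, already add up to more than the 3145728-byte limit in that error:

```go
package main

import "fmt"

func main() {
	// Approximate figures taken from the jq output above.
	const (
		entries      = 14006   // .ipConfigurations | length
		idBytes      = 305     // length of one ipConfiguration ID
		requestLimit = 3145728 // "Request entity too large: limit is 3145728"
	)
	fmt.Printf("IDs alone: %d bytes, limit: %d bytes\n", entries*idBytes, requestLimit)
	// Output: IDs alone: 4271830 bytes, limit: 3145728 bytes
}
```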
Azure Service Operator Version: v2.8.0
Expected behavior
The VirtualNetworksSubnet to continue reconciling successfully, regardless of how large my Agent Pools scale.
To Reproduce
Create a VirtualNetworksSubnet CR for an Azure subnet with a large number of ipConfigurations and wait for the controller to attempt to sync it.
Additional context
This issue relates to another issue in the CAPZ project kubernetes-sigs/cluster-api-provider-azure#4649
Can you share what the spec for the subnet looks like, as managed by CAPZ?
I think the issue we've got here is the fact that there are 14k entries for the ipConfigurations field (which Azure allows), but at some point you cross the Kubernetes boundary for max resource size.
There is also a max resource size boundary for Azure I believe, but I think it's 4 MB, not the ~1.5 MB which AFAIK is the default on Kubernetes.
> Can you share what the spec for the subnet looks like, as managed by CAPZ?
AMCP resource:
```yaml
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedControlPlane
spec:
  virtualNetwork:
    cidrBlock: 10.0.0.0/16
    name: example-cluster-vnet
    resourceGroup: example-cluster-rg
    subnet:
      cidrBlock: 10.0.0.0/16
      name: example-cluster-subnet
      serviceEndpoints:
      - locations:
        - '*'
        service: Microsoft.Sql
      - locations:
        - '*'
        service: Microsoft.KeyVault
      - locations:
        - '*'
        service: Microsoft.Storage
      - locations:
        - '*'
        service: Microsoft.AzureCosmosDB
      - locations:
        - '*'
        service: Microsoft.ServiceBus
      - locations:
        - '*'
        service: Microsoft.EventHub
```
And the Subnet it creates:
```yaml
apiVersion: network.azure.com/v1api20201101
kind: VirtualNetworksSubnet
spec:
  addressPrefix: 10.0.0.0/16
  addressPrefixes:
  - 10.0.0.0/16
  azureName: example-cluster-subnet
  owner:
    name: example-cluster-vnet
  serviceEndpoints:
  - locations:
    - '*'
    service: Microsoft.Sql
  - locations:
    - '*'
    service: Microsoft.KeyVault
  - locations:
    - '*'
    service: Microsoft.Storage
  - locations:
    - '*'
    service: Microsoft.AzureCosmosDB
  - locations:
    - '*'
    service: Microsoft.ServiceBus
  - locations:
    - '*'
    service: Microsoft.EventHub
```
I looked at this some more and I think this comes down to a mismatch between the allowed max size of an Azure resource (which I think is somewhere in the 4 MB range) and the allowed max size of a Kubernetes resource, which is ~1.5 MB.
Since we fundamentally cannot fit this much data into etcd, there's not really much we can do here other than elide .status.ipConfigurations after some maximum length. The only thing that makes me feel any better about that is that it's probably not practically possible to actually use a list of 14,000 ipConfiguration ARM IDs for anything anyway.
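For illustration, a minimal sketch of what that eliding could look like; the type, function name, and cap below are hypothetical, not actual ASO code:

```go
package main

import "fmt"

// ipConfiguration stands in for the generated status type; the real one lives
// in ASO's network.azure.com/v1api20201101 API package.
type ipConfiguration struct {
	ID string
}

// maxStatusIPConfigurations is an illustrative cap, not an ASO constant.
const maxStatusIPConfigurations = 1000

// elideIPConfigurations truncates the list so the serialized status stays well
// under the apiserver/etcd request-size limits.
func elideIPConfigurations(in []ipConfiguration) []ipConfiguration {
	if len(in) <= maxStatusIPConfigurations {
		return in
	}
	return in[:maxStatusIPConfigurations]
}

func main() {
	configs := make([]ipConfiguration, 14006)
	fmt.Println(len(elideIPConfigurations(configs))) // prints 1000
}
```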
@nojnhuh - is CAPZ using .status.ipConfigurations for anything right now?
> @nojnhuh - is CAPZ using .status.ipConfigurations for anything right now?
It is not, so however you handle that should work for CAPZ.
Hey @matthchr, thanks so much for looking into this. Regarding the etcd limit, the problem seems to manifest in different ways depending on the size of the object in Azure. Note that in the original ticket I opened in CAPZ, the error was different and it came from etcd:
E0315 17:13:54.206966 1 controller.go:329] "msg"="Reconciler error" "error"="updating mynamespace/examplecluster-vnet-examplecluster-subnet resource status: etcdserver: request is too large" "logger"="controllers" "name"="examplecluster-vnet-examplecluster-subnet" "namespace"="examplenamespace" "reconcileID"="..."
In that case the Subnet was not as large; when the error was observed, the subnet size was around 2.9 MB.
Now the subnet object in Azure has reached around 5.6 MB and the error seems to come from the Kubernetes API server itself; this limit is hardcoded in more than one place, e.g. here.
I think in this case the object did not reach etcd.
Thanks @danilo404 - I suppose a more precise phrasing of the problem is not so much etcd but rather: Azure allows larger resources than Kubernetes does. I think once the etcd limit is crossed it won't work in k8s, though I didn't know about the hardcoded apiserver limit that ends up giving a different error if the request gets large enough.
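To make the two limits concrete, here's a rough sketch (the struct and ID sizes are synthetic, just mimicking the numbers reported above, and the etcd figure is its default, not a measured value):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Limits discussed in this thread: etcd's default max request size (~1.5 MiB)
// and the hardcoded apiserver request limit seen in the error (3145728 bytes).
const (
	etcdDefaultMaxRequestBytes = 1536 * 1024 // 1572864
	apiserverRequestLimitBytes = 3145728
)

func main() {
	// Synthetic status roughly matching the reported shape: 14006 IDs of ~305 bytes.
	status := struct {
		IPConfigurations []string `json:"ipConfigurations"`
	}{IPConfigurations: make([]string, 14006)}
	for i := range status.IPConfigurations {
		status.IPConfigurations[i] = fmt.Sprintf("%0305d", i)
	}

	b, err := json.Marshal(status)
	if err != nil {
		panic(err)
	}
	fmt.Printf("serialized: %d bytes, > etcd default: %t, > apiserver limit: %t\n",
		len(b), len(b) > etcdDefaultMaxRequestBytes, len(b) > apiserverRequestLimitBytes)
	// Roughly 4.3 MB serialized, so both comparisons print true.
}
```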
In terms of a plan to fix this, it didn't make 2.11.0 (which has already shipped). I think we can try to get a fix merged before most of us go on holiday, which would enable consumption of the fix via the experimental release, but the official release will probably need to wait until next year. There's also the added wrinkle that CAPZ uses a slightly older version of ASO, which may delay uptake in vanilla CAPZ as well.
Unfortunately I don't really see a workaround for this problem other than "keep the cluster small" in the meantime, though possibly this issue isn't actually breaking things severely if CAPZ isn't trying to update the subnet?
Can you share what the impact is to you @danilo404, and if you have any workaround to it currently?
Thanks for the update @matthchr. We don't have a workaround for this case, but the impact for now is not blocking. What happens now is that the CAPZ AzureManagedControlPlane reconcile loop tries to sync the Subnet's status (even without changes to the spec) and the CAPI/CAPZ Cluster stays in a Failed state in Kubernetes, even though the cluster itself in Azure is healthy. In any case the experimental release would be really useful, because the AMCP in Failed state causes other headaches, like Flux orchestration being unable to progress, having to silence the related alerts, etc.
OK, the experimental build should have a fix for this now @danilo404.