Azure/azure-container-networking

[BUG] Running many pods on a Windows node at the same time leads to CNI failures

ShiqianTao opened this issue · 6 comments

According to Azure/AKS#3612, the third issue should belong to Windows Container Networking.

What happened:
Running many pods on a Windows node at the same time leads to CNI failures.

  • Azure CNI fails to initialize the network plugin's key-value store, for example:
E0413 13:28:59.458937    3596 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"sample-bb44b9dff-kwwnk_default(e4dae002-ca09-4514-9c47-153c5dce79fd)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"sample-bb44b9dff-kwwnk_default(e4dae002-ca09-4514-9c47-153c5dce79fd)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"f9090b91333f019d8b8ed793bf105436188bbe879bc1ba6c8de41fd534c53cc8\\\": plugin type=\\\"azure-vnet\\\" failed (add): Failed to initialize key-value store of network plugin: timed out locking store\"" pod="default/sample-bb44b9dff-kwwnk" podUID=e4dae002-ca09-4514-9c47-153c5dce79fd
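To illustrate how a "timed out locking store" error can appear under load, here is a minimal sketch. It assumes (this is not confirmed anywhere in this thread) that each CNI ADD call must take an exclusive lock on a shared state store before it can proceed. The function name `run_concurrent_adds` and both timing constants are hypothetical and chosen only for the demo; this is not the azure-vnet plugin's code.

```python
import threading
import time

# Hypothetical values for illustration only; not the plugin's real settings.
LOCK_TIMEOUT_SECONDS = 0.1   # how long one invocation waits for the lock
HOLD_SECONDS = 0.05          # how long one invocation holds the lock


def run_concurrent_adds(num_pods, hold=HOLD_SECONDS, timeout=LOCK_TIMEOUT_SECONDS):
    """Start num_pods simulated ADD calls at once and return each outcome."""
    store_lock = threading.Lock()   # stands in for the store's exclusive lock
    results = [None] * num_pods

    def cni_add(index):
        # Give up if the lock cannot be taken within the timeout.
        if not store_lock.acquire(timeout=timeout):
            results[index] = "timed out locking store"
            return
        try:
            time.sleep(hold)        # simulated work while holding the lock
            results[index] = "ok"
        finally:
            store_lock.release()

    threads = [threading.Thread(target=cni_add, args=(i,)) for i in range(num_pods)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    outcomes = run_concurrent_adds(8)
    print(outcomes.count("ok"), "succeeded,",
          outcomes.count("timed out locking store"), "timed out")
```

Because the lock serializes the calls, the total wait grows with the number of pods started together, so some of the later calls exceed the timeout even though the node itself is not short of resources.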

What you expected to happen:
The node has sufficient resources and should run these pods without the failures above.

How to reproduce it:
Refer to Azure/AKS#3612

Orchestrator and Version (e.g. Kubernetes, Docker):
Kubernetes: 1.24.10
Containerd://1.6.14+azure

Operating System (Linux/Windows):
Windows

Kernel (e.g. uname -a for Linux or $(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion for Windows):
10.0.17763.4131 (WinBuild.160101.0800)

@ShiqianTao Yes, we are aware of this issue. It happens when you scale up or down at a high churn rate. We are currently investigating it.

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@tamilmani1989 The CNI failures are very commonly observed. This issue shouldn't be closed just yet.

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

Issue closed due to inactivity.