[BUG] Running many pods on a Windows node at the same time will lead to failures of CNI
ShiqianTao opened this issue · 6 comments
According to Azure/AKS#3612, the third issue reported there should belong to Windows Container Networking.
What happened:
Running many pods on a Windows node at the same time leads to CNI failures.
- Azure CNI fails to initialize the key-value store of the network plugin. For example:
E0413 13:28:59.458937 3596 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"sample-bb44b9dff-kwwnk_default(e4dae002-ca09-4514-9c47-153c5dce79fd)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"sample-bb44b9dff-kwwnk_default(e4dae002-ca09-4514-9c47-153c5dce79fd)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"f9090b91333f019d8b8ed793bf105436188bbe879bc1ba6c8de41fd534c53cc8\\\": plugin type=\\\"azure-vnet\\\" failed (add): Failed to initialize key-value store of network plugin: timed out locking store\"" pod="default/sample-bb44b9dff-kwwnk" podUID=e4dae002-ca09-4514-9c47-153c5dce79fd
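To gauge how many pods are affected, the kubelet log can be scanned for this error. The following is a minimal sketch, not part of Azure CNI or kubelet tooling: the function name `pods_hit_by_lock_timeout` is hypothetical, and it assumes each log line carries the `pod="namespace/name"` field in the form shown in the line above.

```python
import re
from collections import Counter

# Matches the pod reference kubelet appends to the message,
# e.g. pod="default/sample-bb44b9dff-kwwnk" (format taken from the log line above).
POD_RE = re.compile(r'pod="(?P<namespace>[^/"]+)/(?P<name>[^"]+)"')

# Error text reported by the azure-vnet plugin in the log line above.
LOCK_TIMEOUT = "timed out locking store"


def pods_hit_by_lock_timeout(log_lines):
    """Count store-lock timeout errors per pod in an iterable of kubelet log lines.

    Hypothetical helper for triage; lines without the error text or without
    a pod="namespace/name" field are ignored.
    """
    hits = Counter()
    for line in log_lines:
        if LOCK_TIMEOUT not in line:
            continue
        match = POD_RE.search(line)
        if match:
            hits[f"{match['namespace']}/{match['name']}"] += 1
    return hits
```

Passing the lines of a saved kubelet log to `pods_hit_by_lock_timeout` returns a `Counter` keyed by `namespace/name`, so `most_common()` lists the pods that hit the lock timeout most often.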
What you expected to happen:
The node has sufficient resources and should run these pods without the above failures.
How to reproduce it:
Refer to Azure/AKS#3612
Orchestrator and Version (e.g. Kubernetes, Docker):
Kubernetes: 1.24.10
Containerd://1.6.14+azure
Operating System (Linux/Windows):
Windows
Kernel (e.g. uname -a for Linux or $(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion for Windows):
10.0.17763.4131 (WinBuild.160101.0800)
Anything else we need to know?:
@ShiqianTao Yes, we are aware of this issue. It happens when you scale up and down with high churn. We are currently investigating it.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
@tamilmani1989 The CNI failures are very commonly observed. This issue shouldn't be closed just yet.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Issue closed due to inactivity.