container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
BartoszZawadzki opened this issue · 3 comments
/kind bug
1. What kops version are you running? The command kops version will display this information.
Client version: 1.24.5 (git-v1.24.5)
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Server Version: v1.24.17
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
kops validate cluster
5. What happened after the commands executed?
Validating cluster dev.k8s.sgr-cloud.sh
INSTANCE GROUPS
NAME ROLE MACHINETYPE MIN MAX SUBNETS
app-arm-eu-west-1a Node t4g.large 0 15 dev.k8s.sgr-cloud.sh-private-eu-west-1a
app-arm-eu-west-1b Node t4g.large 0 15 dev.k8s.sgr-cloud.sh-private-eu-west-1b
app-arm-eu-west-1c Node t4g.large 0 15 dev.k8s.sgr-cloud.sh-private-eu-west-1c
app-eu-west-1a Node t3a.2xlarge 0 15 dev.k8s.sgr-cloud.sh-private-eu-west-1a
app-eu-west-1b Node t3a.2xlarge 0 15 dev.k8s.sgr-cloud.sh-private-eu-west-1b
app-eu-west-1c Node t3a.2xlarge 0 15 dev.k8s.sgr-cloud.sh-private-eu-west-1c
bastions Bastion t3.small 1 1 dev.k8s.sgr-cloud.sh-public-eu-west-1a,dev.k8s.sgr-cloud.sh-public-eu-west-1b,dev.k8s.sgr-cloud.sh-public-eu-west-1c
ci-eu-west-1a Node t3a.2xlarge 0 10 dev.k8s.sgr-cloud.sh-private-eu-west-1a
gpu-eu-west-1a Node g5.2xlarge 0 15 dev.k8s.sgr-cloud.sh-private-eu-west-1a
gpu-eu-west-1b Node g5.2xlarge 0 15 dev.k8s.sgr-cloud.sh-private-eu-west-1b
gpu-eu-west-1c Node g5.2xlarge 0 15 dev.k8s.sgr-cloud.sh-private-eu-west-1c
infra-eu-west-1a Node t3a.2xlarge 1 3 dev.k8s.sgr-cloud.sh-private-eu-west-1a
infra-eu-west-1b Node t3a.2xlarge 1 3 dev.k8s.sgr-cloud.sh-private-eu-west-1b
infra-eu-west-1c Node t3a.2xlarge 1 3 dev.k8s.sgr-cloud.sh-private-eu-west-1c
master-eu-west-1a Master t3a.xlarge 1 2 dev.k8s.sgr-cloud.sh-private-eu-west-1a
master-eu-west-1b Master t3a.xlarge 1 2 dev.k8s.sgr-cloud.sh-private-eu-west-1b
master-eu-west-1c Master t3a.xlarge 1 2 dev.k8s.sgr-cloud.sh-private-eu-west-1c
NODE STATUS
NAME ROLE READY
i-0432daf1eb766a0df node False
i-0845aa28b742d61ce master False
i-0c4cd91b01124b439 master False
VALIDATION ERRORS
KIND NAME MESSAGE
Machine i-02409274b530a2a9b machine "i-02409274b530a2a9b" has not yet joined cluster
Machine i-02cb1df09a9ad18eb machine "i-02cb1df09a9ad18eb" has not yet joined cluster
Machine i-02f6d0de212e95cd8 machine "i-02f6d0de212e95cd8" has not yet joined cluster
Machine i-03a2b2086edb7c04f machine "i-03a2b2086edb7c04f" has not yet joined cluster
Machine i-0750fc13240a1869b machine "i-0750fc13240a1869b" has not yet joined cluster
Machine i-0773e1466ae9be609 machine "i-0773e1466ae9be609" has not yet joined cluster
Machine i-087147df3d0c7dfd8 machine "i-087147df3d0c7dfd8" has not yet joined cluster
Machine i-0a5ac944ae1926d6f machine "i-0a5ac944ae1926d6f" has not yet joined cluster
Machine i-0c6c938c57e1061fe machine "i-0c6c938c57e1061fe" has not yet joined cluster
Machine i-0ef86b3f204d977f3 machine "i-0ef86b3f204d977f3" has not yet joined cluster
Machine i-0fea2dbdbfd81409f machine "i-0fea2dbdbfd81409f" has not yet joined cluster
Node i-0432daf1eb766a0df node "i-0432daf1eb766a0df" of role "node" is not ready
Node i-0845aa28b742d61ce node "i-0845aa28b742d61ce" of role "master" is not ready
Node i-0c4cd91b01124b439 node "i-0c4cd91b01124b439" of role "master" is not ready
Validation Failed
Error: Validation failed: cluster not yet healthy
6. What did you expect to happen?
I expected validation to pass successfully
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else we need to know?
After describing the three nodes (two master nodes and one worker node) I noticed that they all show the same error:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 09 Apr 2024 13:27:47 +0200 Mon, 08 Apr 2024 16:16:07 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 09 Apr 2024 13:27:47 +0200 Mon, 08 Apr 2024 16:16:07 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 09 Apr 2024 13:27:47 +0200 Mon, 08 Apr 2024 16:16:07 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Tue, 09 Apr 2024 13:27:47 +0200 Mon, 08 Apr 2024 16:16:07 +0200 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
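The KubeletNotReady / "cni plugin not initialized" condition usually means the network plugin never wrote its config onto the node. A minimal sketch of a check to run on an affected node; the directory is the common kubelet/containerd default, not something confirmed from this cluster:

```shell
# An empty /etc/cni/net.d is exactly what makes the kubelet report
# "cni plugin not initialized".
check_cni_config() {
  local dir="${1:-/etc/cni/net.d}"
  if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "no CNI config in $dir - network plugin not initialized"
  else
    echo "CNI config present in $dir"
  fi
}
check_cni_config
```

If the directory is empty, the next place to look is why the CNI DaemonSet (or kops-controller, which bootstraps nodes) never ran on that node.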
I've followed https://kops.sigs.k8s.io/operations/troubleshoot/:
- Nodeup doesn't show any issues:
Apr 08 14:56:10 ip-172-20-91-12 nodeup[1380]: success
Apr 08 14:56:10 ip-172-20-91-12 systemd[1]: kops-configuration.service: Succeeded.
Apr 08 14:56:10 ip-172-20-91-12 systemd[1]: Finished Run kOps bootstrap (nodeup).
- kube-apiserver shows multiple errors (log file attached)
- Both etcd.log and etcd-events.log don't show any errors
- kubelet shows multiple errors:
"MESSAGE" : "E0409 10:41:11.892153 3762 kubelet.go:2352] \"Container runtime network not ready\" networkReady=\"NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized\"",
"MESSAGE" : "E0409 10:41:12.275970 3762 kubelet.go:1693] \"Failed creating a mirror pod for\" err=\"Internal error occurred: failed calling webhook \\\"pod-identity-webhook.amazonaws.com\\\": failed to call webh
ook: Post \\\"https://pod-identity-webhook.kube-system.svc:443/mutate?timeout=10s\\\": no endpoints available for service \\\"pod-identity-webhook\\\"\" pod=\"kube-system/etcd-manager-main-i-0845aa28b742d61ce\"",
This happened after performing kops rolling-update cluster --cloudonly; before that, the cluster was healthy.
I've also SSH'ed into one of the worker nodes that did not join the cluster and noticed that it failed at nodeup:
Apr 09 12:58:42 ip-172-20-105-158 nodeup[54814]: I0409 12:58:42.014417 54814 executor.go:155] No progress made, sleeping before retrying 1 task(s)
Apr 09 12:58:52 ip-172-20-105-158 nodeup[54814]: I0409 12:58:52.023456 54814 executor.go:111] Tasks: 77 done / 85 total; 1 can run
Apr 09 12:58:52 ip-172-20-105-158 nodeup[54814]: I0409 12:58:52.023510 54814 executor.go:186] Executing task "BootstrapClientTask/BootstrapClient": BootstrapClientTask
Apr 09 12:58:55 ip-172-20-105-158 nodeup[54814]: W0409 12:58:55.102587 54814 executor.go:139] error running task "BootstrapClientTask/BootstrapClient" (1m38s remaining to succeed): Post "https://kops-controller.internal.dev.k8s.sgr-clXXX.XX:3988/bootstrap": dial tcp 172.20.103.227:3988: connect: no route to host
From what I can see here: https://kops.sigs.k8s.io/contributing/ports/, 3988 is the kops-controller serving port.
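The "no route to host" errors can be narrowed down with a quick TCP probe from the failing node. A sketch using bash's /dev/tcp feature; the IP and ports are the ones from the logs above, substitute your own:

```shell
# "unreachable" for a host that should be serving usually points at
# security-group or routing problems rather than a crashed service.
probe() {  # usage: probe HOST PORT
  if timeout 3 bash -c ">/dev/tcp/$1/$2" 2>/dev/null; then
    echo "open"
  else
    echo "unreachable"
  fi
}
probe 172.20.103.227 3988    # kops-controller serving port
probe 172.20.103.227 10250   # kubelet API port
```

If both ports are unreachable from the node, the problem is network-level (security groups, routes), not the kops-controller Pod itself.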
This is what I get from the kops-controller Pod in the kube-system namespace:
Error from server: Get "https://172.20.103.227:10250/containerLogs/kube-system/kops-controller-p9zf5/kops-controller": dial tcp 172.20.103.227:10250: connect: no route to host
As it turned out, the problem was with the pod-identity-webhook mutatingwebhookconfigurations.admissionregistration.k8s.io: it had failurePolicy: Fail, and because we did kops rolling-update --cloudonly, other Pods didn't pass that webhook. The reason they didn't pass is that the pod-identity-webhook itself wasn't yet up and running.
I edited the webhook and set failurePolicy: Ignore, waited a bit for all the Pods to get Running, and after that reverted the webhook to failurePolicy: Fail.
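For reference, the workaround boils down to a single field on the webhook configuration. A sketch of the relevant fragment only (not the full manifest, and the webhook entry name is taken from the kubelet error above); it can be changed with kubectl edit mutatingwebhookconfigurations pod-identity-webhook:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: pod-identity-webhook
webhooks:
  - name: pod-identity-webhook.amazonaws.com
    # Temporarily Ignore so Pods can be admitted while the webhook backend
    # is down; revert to Fail once pod-identity-webhook is Running.
    failurePolicy: Ignore
```

With failurePolicy: Fail, admission requests are rejected whenever the webhook service has no endpoints, which is why a --cloudonly rolling update (where everything restarts at once) can deadlock the cluster.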