snowdrop/k8s-infra

ax51-nvme: kubectl get nodes reports x509: certificate has expired or is not yet valid

Closed this issue · 15 comments

Issue

We can no longer access the Kubernetes cluster on the Hetzner VM ax51-nvme.

pass hetzner/ax51-nvme/ansible_user
root

pass hetzner/ax51-nvme/ansible_ssh_host
195.201.87.126

ssh root@195.201.87.126
alias k=kubectl
CentOS-77-64-minimal:~$ k get nodes
Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2022-06-20T10:31:49+02:00 is after 2022-05-17T12:52:17Z

This is confirmed using the following kubeadm command

kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

W0620 10:38:07.551570   23086 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 May 17, 2022 12:53 UTC   <invalid>                               no
apiserver                  May 17, 2022 12:52 UTC   <invalid>       ca                      no
apiserver-etcd-client      May 17, 2022 12:52 UTC   <invalid>       etcd-ca                 no
apiserver-kubelet-client   May 17, 2022 12:52 UTC   <invalid>       ca                      no
controller-manager.conf    May 17, 2022 12:52 UTC   <invalid>                               no
etcd-healthcheck-client    May 17, 2022 12:51 UTC   <invalid>       etcd-ca                 no
etcd-peer                  May 17, 2022 12:51 UTC   <invalid>       etcd-ca                 no
etcd-server                May 17, 2022 12:51 UTC   <invalid>       etcd-ca                 no
front-proxy-client         May 17, 2022 12:52 UTC   <invalid>       front-proxy-ca          no
scheduler.conf             May 17, 2022 12:52 UTC   <invalid>                               no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Dec 16, 2029 14:47 UTC   7y              no
etcd-ca                 Dec 16, 2029 14:47 UTC   7y              no
front-proxy-ca          Jan 19, 2031 11:55 UTC   8y              no

Solution

I suggest manually renewing the certificates: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#manual-certificate-renewal
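For reference, on this kubeadm v1.19 cluster the manual renewal from the linked page boils down to the following (sketch only; on newer kubeadm releases the subcommand is `kubeadm certs renew`, without `alpha`):

```shell
# Renew all kubeadm-managed leaf certificates (the CAs themselves are untouched).
kubeadm alpha certs renew all

# The control-plane components only load their certificates at startup, so the
# kube-apiserver, controller-manager, scheduler and etcd static pods must be
# restarted afterwards, e.g. by moving the manifests out of
# /etc/kubernetes/manifests and back, or by rebooting the node.
```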

WDYT ?

The certificates have been renewed.

CentOS-77-64-minimal:~$ kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jun 20, 2023 09:27 UTC   364d                                    no

NOTE: We should certainly think about rotating the certificates automatically every year OR extending the expiration date of the certificates on this cluster. WDYT? @jacobdotcosta
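A first step towards automatic rotation could be a scheduled expiry check. A minimal sketch (hypothetical helper, not part of this repo) that parses the table printed by `kubeadm alpha certs check-expiration` and flags certificates that are expired or expiring within a threshold:

```python
from datetime import datetime, timedelta, timezone

def expiring_certs(check_output: str, days: int = 30):
    """Return names of certificates from 'kubeadm ... certs check-expiration'
    output that are expired or expire within `days`."""
    deadline = datetime.now(timezone.utc) + timedelta(days=days)
    flagged = []
    for line in check_output.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] == "CERTIFICATE":
            continue  # skip blank/header lines
        # The EXPIRES column looks like: May 17, 2022 12:52 UTC
        try:
            expires = datetime.strptime(
                " ".join(parts[1:5]), "%b %d, %Y %H:%M"
            ).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # not a certificate row
        if expires <= deadline:
            flagged.append(parts[0])
    return flagged

# Two rows taken from the expired cluster above:
sample = """\
admin.conf                 May 17, 2022 12:53 UTC   <invalid>                               no
apiserver                  May 17, 2022 12:52 UTC   <invalid>       ca                      no
"""
print(expiring_certs(sample))
```

Run from cron against the live command output, a non-empty result could trigger a notification well before `kubectl` starts failing.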

The certificates are renewed but now kubelet isn't starting.

failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory

Since bootstrap-kubelet.conf should only be used when /etc/kubernetes/kubelet.conf doesn't exist, and kubelet.conf does exist here, this was worked around with cp /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf

NOTE: We should certainly think about rotating the certificates automatically every year OR extending the expiration date of the certificates on this cluster. WDYT? @jacobdotcosta

Can you create a new ticket for this point so we take care of it? @jacobdotcosta

Some resources aren't accessible yet.

Although the pods are up, the snowdrop site and the team report pages aren't available.

The pods and services seem correct.

$ kubectl -n snowdrop-site get all
NAME                                         READY   STATUS    RESTARTS   AGE
pod/snowdrop-site-angular-774dd56856-bqvjh   1/1     Running   0          399d
pod/spring-boot-generator-6587865b98-rdglj   1/1     Running   0          399d

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/snowdrop-site-angular   ClusterIP   10.100.108.193   <none>        80/TCP    2y183d
service/spring-boot-generator   ClusterIP   10.106.170.102   <none>        80/TCP    2y162d

However, it's not possible to fetch the logs.

$ kubectl -n snowdrop-site logs -f snowdrop-site-angular-774dd56856-bqvjh
Error from server (InternalError): Internal error occurred: Authorization error (user=kube-apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)

Running the request at log level 10 (kubectl -v=10) shows a 500 error on the fetch.

curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.19.11 (linux/amd64) kubernetes/c6a2f08" 'https://xxx.xxx.xxx.xxx:6443/api/v1/namespaces/snowdrop-site/pods/snowdrop-site-angular-774dd56856-bqvjh/log?follow=true'
I0620 15:04:13.199083    2704 round_trippers.go:444] GET https://xxx.xxx.xxx.xxx:6443/api/v1/namespaces/snowdrop-site/pods/snowdrop-site-angular-774dd56856-bqvjh/log?follow=true 500 Internal Server Error in 4 milliseconds

Node is in state NotReady

$ kubectl get nodes
NAME                   STATUS     ROLES    AGE      VERSION
centos-77-64-minimal   NotReady   master   2y184d   v1.19.11
$ kubectl describe node centos-77-64-minimal
...
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Mon, 17 May 2021 11:36:17 +0200   Mon, 17 May 2021 11:36:17 +0200   FlannelIsUp         Flannel is running on this node
  MemoryPressure       Unknown   Mon, 20 Jun 2022 11:09:04 +0200   Mon, 20 Jun 2022 12:03:30 +0200   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Mon, 20 Jun 2022 11:09:04 +0200   Mon, 20 Jun 2022 12:03:30 +0200   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Mon, 20 Jun 2022 11:09:04 +0200   Mon, 20 Jun 2022 12:03:30 +0200   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Mon, 20 Jun 2022 11:09:04 +0200   Mon, 20 Jun 2022 12:03:30 +0200   NodeStatusUnknown   Kubelet stopped posting node status.
# journalctl -xef -u kubelet
...
Jun 20 18:37:07 CentOS-77-64-minimal kubelet[2206]: I0620 18:37:07.571210    2206 kubelet_node_status.go:71] Attempting to register node centos-77-64-minimal
Jun 20 18:37:07 CentOS-77-64-minimal kubelet[2206]: E0620 18:37:07.573224    2206 kubelet_node_status.go:93] Unable to register node "centos-77-64-minimal" with API server: nodes is forbidden: User "system:anonymous" cannot create resource "nodes" in API group "" at the cluster scope
...

So the problem is the API server rejecting the node registration: nodes is forbidden: User "system:anonymous" cannot create resource "nodes" in API group "" at the cluster scope. It seems the certificate used by the kubelet is being mapped to the anonymous user rather than to its expected node identity.

The kubelet PEM file is outdated and unusable after the certificate renewal.

# ll /var/lib/kubelet/pki
total 32
drwxr-xr-x 2 root root 4096 Jun 20  2021 ./
drwx------ 9 root root 4096 May 17  2021 ../
-rw------- 1 root root 2778 Dec 19  2019 kubelet-client-2019-12-19-15-47-45.pem
-rw------- 1 root root 1131 Dec 19  2019 kubelet-client-2019-12-19-15-48-12.pem
-rw------- 1 root root 1131 Sep  2  2020 kubelet-client-2020-09-02-09-34-13.pem
-rw------- 1 root root 1082 Jun 20  2021 kubelet-client-2021-06-20-11-14-20.pem
lrwxrwxrwx 1 root root   59 Jun 20  2021 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2021-06-20-11-14-20.pem
-rw-r--r-- 1 root root 2245 Dec 19  2019 kubelet.crt
-rw------- 1 root root 1675 Dec 19  2019 kubelet.key

A possible solution is to perform the following.

You need to provide --kubelet-client-certificate=<path_to_cert> and --kubelet-client-key=<path_to_key> to your apiserver; this way the apiserver authenticates to the kubelet with that certificate and key pair.
# Back up the current certificates before regenerating them
BK_DATE_STR=20220621
cd /etc/kubernetes/pki/
ls -la
mkdir _bk_${BK_DATE_STR}
mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} _bk_${BK_DATE_STR}/
ls -la
# Regenerate all leaf certificates; the existing CAs are reused
kubeadm init phase certs all --apiserver-advertise-address <API SERVER IP ADDRESS>
ls -la
# Back up and regenerate the kubeconfig files against the new certificates
cd /etc/kubernetes/
mkdir _bk_${BK_DATE_STR}
mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} _bk_${BK_DATE_STR}/
kubeadm init phase kubeconfig all
ls -la
# Point the local kubectl at the regenerated admin credentials
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

As a result, the node is now Ready.

$ kc get nodes
NAME                   STATUS   ROLES    AGE      VERSION
centos-77-64-minimal   Ready    master   2y184d   v1.19.11