Wrong kube-proxy bind mount propagation causes node certificates to not update and to expire
hostops opened this issue · 1 comment
/kind bug
1. What kops version are you running? The command kops version will display this information.
Last applied server version: 1.25.3
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Server Version: v1.25.5
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
# get /var/lib/kube-proxy/kubeconfig from kube-proxy container
kubectl exec -n kube-system kube-proxy-i-03fa9558373c958ff -- cat /var/lib/kube-proxy/kubeconfig | yq '.users[0].user."client-certificate-data"' -r | base64 --decode | openssl x509 -enddate -noout
# get /var/lib/kube-proxy/kubeconfig from node
ssh ubuntu@$(kubectl get node i-03fa9558373c958ff -o json | jq '.status.addresses[4].address' -r) "sudo cat /var/lib/kube-proxy/kubeconfig" | yq '.users[0].user."client-certificate-data"' -r | base64 --decode | openssl x509 -enddate -noout
5. What happened after the commands executed?
I get two different outputs for the same file.
# notAfter=Mar 26 16:39:07 2024 GMT
# notAfter=Aug 1 15:09:29 2024 GMT
6. What did you expect to happen?
I expected to get the same output since this should be the same file.
You can see this when you check the volumes and volumeMounts of the kube-proxy pod:
volumeMounts:
- mountPath: /var/lib/kube-proxy/kubeconfig
  name: kubeconfig
  readOnly: true
volumes:
- hostPath:
    path: /var/lib/kube-proxy/kubeconfig
    type: ""
  name: kubeconfig
Also, if you check the container configuration using ctr:
sudo ctr -n k8s.io container inspect <kube-proxy container id>
you can confirm this container uses the same file.
{
  "destination": "/var/lib/kube-proxy/kubeconfig",
  "type": "bind",
  "source": "/var/lib/kube-proxy/kubeconfig",
  "options": [
    "rbind",
    "rprivate",
    "ro"
  ]
}
I also believe those lines caused the issue, especially the rprivate option.
From the Docker documentation:
Bind propagation refers to whether or not mounts created within a given bind-mount or named volume can be propagated to replicas of that mount.
rprivate: The default. The same as private, meaning that no mount points anywhere within the original or replica mount points propagate in either direction.
So the default rprivate option can cause unsynchronized state if we have multiple replicas of this mount.
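One thing worth probing locally, independent of propagation mode: a bind mount of a single file pins that file's inode, and certificate rotation tools typically replace files atomically via rename, which produces a new inode. Whether this, rather than rprivate itself, is the mechanism here is an assumption; the sketch below only demonstrates the host-side inode change that a file bind mount would not follow:

```shell
# Demonstration (no mounts needed): replacing a file via rename changes its
# inode, so anything still attached to the old inode keeps the old content.
tmp=$(mktemp -d)
echo "old-cert" > "$tmp/kubeconfig"
old_inode=$(stat -c %i "$tmp/kubeconfig")

# Atomic replace, as rotation tools commonly do:
echo "new-cert" > "$tmp/kubeconfig.new"
mv "$tmp/kubeconfig.new" "$tmp/kubeconfig"
new_inode=$(stat -c %i "$tmp/kubeconfig")

echo "old inode: $old_inode, new inode: $new_inode"
```

If the inodes differ, a container that bind-mounted the original file would still be reading the old inode's contents regardless of propagation settings.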
So another condition must be met for this issue to happen: there must be multiple containers with the same mount.
This happens if the node restarted and a new kube-proxy pod was created.
I believe this is the default behaviour so Kubernetes can get logs from the previous container.
Even if only one of the two containers is running/used, this can happen.
I tested this hypothesis by running sudo ctr -n k8s.io containers ls | grep proxy | wc -l
on all of our nodes.
On all nodes where we have multiple containers (2), we can see unsynchronized certificates.
I can also confirm those are the only kube-proxy pods with restarts > 0.
And on nodes with only a single kube-proxy container, the file /var/lib/kube-proxy/kubeconfig is the same on the node and in the pod.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  api:
    loadBalancer:
      class: Classic
      type: Public
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enabled: true
  certManager:
    enabled: true
    managed: false
  channel: stable
  cloudProvider: aws
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeDNS:
    provider: CoreDNS
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.25.5
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  topology:
    dns:
      type: Public
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20221206
  machineType: c6a.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
  role: Master
  subnets:
  - eu-west-1a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20221206
  machineType: m5a.xlarge
  maxSize: 6
  minSize: 5
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  packages:
  - nfs-common
  role: Node
  subnets:
  - eu-west-1a
9. Anything else we need to know?
So I believe this bug is caused by:
- The node restarted, so an extra container was created (so Kubernetes can show previous logs via kubectl logs -p).
- Due to the default mount option rprivate, /var/lib/kube-proxy/kubeconfig became unsynchronized between the container and the node.
- The node updated its Kubernetes certificates and kubeconfig file.
- The changes did not propagate to the kube-proxy container.
- When the old certificate still used by kube-proxy expired, the node went into NotReady state.
Logs from kube-proxy:
factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Unauthorized
factory.go:134: failed to list *v1.Service: Unauthorized
Logs from kube-apiserver:
E0309 22:14:53.387051 10 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2024-03-09T22:14:53Z is after 2024-03-09T18:39:20Z, verifying certificate SN=123932699956712321412974984599202854160, SKID=, AKID= failed: x509: certificate has expired or is not yet valid: current time 2024-03-09T22:14:53Z is after 2024-03-09T18:39:20Z]"
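To catch this condition before a node goes NotReady, openssl's -checkend flag can test whether a certificate is about to expire. This is a sketch that assumes you have already extracted the client certificate to a PEM file, as in the commands from step 4:

```shell
# Sketch: return success (exit 0) if the PEM certificate in $1 expires
# within the window in $2 (seconds; default 604800 = 7 days).
# openssl x509 -checkend N exits 0 only if the certificate is still
# valid N seconds from now, so we negate it.
cert_expires_soon() {
  cert="$1"
  window="${2:-604800}"
  ! openssl x509 -checkend "$window" -noout -in "$cert" >/dev/null
}
```

Run against the certificate decoded from client-certificate-data on each node, this would have flagged the stale in-container copy well before the Unauthorized errors appeared.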
10. Possible solutions?
- Manually set mount propagation to shared.
- Remove the previous container after restart (but the user loses the kubectl logs -p option).
- Use another mount/volume option or manually synchronize the file.
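For the first option, Kubernetes exposes per-volumeMount propagation: HostToContainer maps to rslave (host-side mount changes propagate into the container), and Bidirectional maps to rshared. This is only a sketch of what the kube-proxy manifest would need; kops generates that manifest, so this is not something a user sets directly:

```yaml
volumeMounts:
- mountPath: /var/lib/kube-proxy/kubeconfig
  name: kubeconfig
  readOnly: true
  mountPropagation: HostToContainer  # rslave: host mounts propagate into the container
```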
Also, the more I read, the less sure I am that rprivate causes this unsynchronized-file behavior. Can you come up with any way we can confirm that?
Also, I found out that updating certificates is not expected behavior:
#15970 (comment)
"No, kops expects you to update nodes at least every 455 days."
So this issue probably does not make sense? Can someone confirm?