Wrong kube-proxy bind mount propagation causes node certificates to not update and to expire
hostops opened this issue · 1 comment
/kind bug
1. What kops version are you running? The command kops version will display this information.
Last applied server version: 1.25.3
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Server Version: v1.25.5
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
# get /var/lib/kube-proxy/kubeconfig from kube-proxy container
kubectl exec -n kube-system kube-proxy-i-03fa9558373c958ff -- cat /var/lib/kube-proxy/kubeconfig | yq '.users[0].user."client-certificate-data"' -r | base64 --decode | openssl x509 -enddate -noout
# get /var/lib/kube-proxy/kubeconfig from node
ssh ubuntu@$(kubectl get node i-03fa9558373c958ff -o json | jq '.status.addresses[4].address' -r) "sudo cat /var/lib/kube-proxy/kubeconfig" | yq '.users[0].user."client-certificate-data"' -r | base64 --decode | openssl x509 -enddate -noout
5. What happened after the commands executed?
I get two different outputs for the same file.
# notAfter=Mar 26 16:39:07 2024 GMT
# notAfter=Aug 1 15:09:29 2024 GMT
6. What did you expect to happen?
I expected to get the same output since this should be the same file.
You can see this when you check the volumes and volumeMounts of the kube-proxy pod:
volumeMounts:
- mountPath: /var/lib/kube-proxy/kubeconfig
  name: kubeconfig
  readOnly: true
volumes:
- hostPath:
    path: /var/lib/kube-proxy/kubeconfig
    type: ""
  name: kubeconfig
Also, if you check the container configuration using ctr:
sudo ctr -n k8s.io container inspect <kube-proxy container id>
you can confirm this container uses the same file.
{
  "destination": "/var/lib/kube-proxy/kubeconfig",
  "type": "bind",
  "source": "/var/lib/kube-proxy/kubeconfig",
  "options": [
    "rbind",
    "rprivate",
    "ro"
  ]
}
I also believe those lines caused the issue, especially the rprivate option.
From the Docker documentation:
Bind propagation refers to whether or not mounts created within a given bind-mount or named volume can be propagated to replicas of that mount.
rprivate: The default. The same as private, meaning that no mount points anywhere within the original or replica mount points propagate in either direction.
So the default rprivate option can cause unsynchronized state if we have multiple replicas of this mount.
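One thing worth probing locally, independent of propagation mode: a bind mount of a single file pins that file's inode, and certificate rotation tools typically replace files atomically via rename, which produces a new inode. Whether this, rather than rprivate itself, is the mechanism here is an assumption; the sketch below only demonstrates the host-side inode change that a file bind mount would not follow:

```shell
# Demonstration (no mounts needed): replacing a file via rename changes its
# inode, so anything still attached to the old inode keeps the old content.
tmp=$(mktemp -d)
echo "old-cert" > "$tmp/kubeconfig"
old_inode=$(stat -c %i "$tmp/kubeconfig")

# Atomic replace, as rotation tools commonly do:
echo "new-cert" > "$tmp/kubeconfig.new"
mv "$tmp/kubeconfig.new" "$tmp/kubeconfig"
new_inode=$(stat -c %i "$tmp/kubeconfig")

echo "old inode: $old_inode, new inode: $new_inode"
```

If the inodes differ, a container that bind-mounted the original file would still be reading the old inode's contents regardless of propagation settings.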
So another condition must be met for this issue to happen: there must be multiple containers with the same mount.
This happens if the node restarted and a new kube-proxy pod was created.
I believe this is the default behaviour so Kubernetes can get logs from the previous container.
Even if only one of the two containers is running/used, this can happen.
I tested this hypothesis by running sudo ctr -n k8s.io containers ls | grep proxy | wc -l
on all of our nodes.
On all nodes where we have multiple containers (2), we can see unsynchronized certificates.
I can also confirm those are the only kube-proxy pods with restarts > 0.
And on nodes with only a single kube-proxy container, the file /var/lib/kube-proxy/kubeconfig is the same on the node and in the pod.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  api:
    loadBalancer:
      class: Classic
      type: Public
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enabled: true
  certManager:
    enabled: true
    managed: false
  channel: stable
  cloudProvider: aws
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeDNS:
    provider: CoreDNS
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.25.5
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  topology:
    dns:
      type: Public
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20221206
  machineType: c6a.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
  role: Master
  subnets:
  - eu-west-1a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20221206
  machineType: m5a.xlarge
  maxSize: 6
  minSize: 5
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  packages:
  - nfs-common
  role: Node
  subnets:
  - eu-west-1a
9. Anything else we need to know?
So I believe this bug is caused by:
- The node restarted, so an extra container was created (so Kubernetes can show previous logs via kubectl logs -p).
- Due to the default mount option rprivate, /var/lib/kube-proxy/kubeconfig became unsynchronized between the container and the node.
- The node updated its Kubernetes certificates and kubeconfig file.
- The changes did not propagate to the kube-proxy container.
- When the old certificate still used by kube-proxy expired, the node went into NotReady state.
Logs from kube-proxy:
factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Unauthorized
factory.go:134: failed to list *v1.Service: Unauthorized
Logs from kube-apiserver:
E0309 22:14:53.387051 10 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2024-03-09T22:14:53Z is after 2024-03-09T18:39:20Z, verifying certificate SN=123932699956712321412974984599202854160, SKID=, AKID= failed: x509: certificate has expired or is not yet valid: current time 2024-03-09T22:14:53Z is after 2024-03-09T18:39:20Z]"
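To catch this condition before a node goes NotReady, openssl's -checkend flag can test whether a certificate is about to expire. This is a sketch that assumes you have already extracted the client certificate to a PEM file, as in the commands from step 4:

```shell
# Sketch: return success (exit 0) if the PEM certificate in $1 expires
# within the window in $2 (seconds; default 604800 = 7 days).
# openssl x509 -checkend N exits 0 only if the certificate is still
# valid N seconds from now, so we negate it.
cert_expires_soon() {
  cert="$1"
  window="${2:-604800}"
  ! openssl x509 -checkend "$window" -noout -in "$cert" >/dev/null
}
```

Run against the certificate decoded from client-certificate-data on each node, this would have flagged the stale in-container copy well before the Unauthorized errors appeared.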
10. Possible solutions?
- Manually set mount propagation to shared.
- Remove the previous container after restart (but the user loses the kubectl logs -p option).
- Use another mount/volume option or manually synchronize the file.
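For the first option, Kubernetes exposes per-volumeMount propagation: HostToContainer maps to rslave (host-side mount changes propagate into the container), and Bidirectional maps to rshared. This is only a sketch of what the kube-proxy manifest would need; kops generates that manifest, so this is not something a user sets directly:

```yaml
volumeMounts:
- mountPath: /var/lib/kube-proxy/kubeconfig
  name: kubeconfig
  readOnly: true
  mountPropagation: HostToContainer  # rslave: host mounts propagate into the container
```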
Also, the more I read, the less sure I am that rprivate causes this unsynchronized-file behavior. Can you come up with any way we can confirm that?
Also, I found out that updating certificates is not expected behavior:
#15970 (comment)
"No, kops expects you to update nodes at least every 455 days."
So this issue probably does not make sense? Can someone confirm?