Helm-controller pod is using stale tokens
albertschwarzkopf opened this issue · 17 comments
Hi,
the "Bound Service Account Token Volume" feature graduated to stable and is enabled by default in Kubernetes 1.22.
I am using helm-controller:v0.21.0 on AWS EKS 1.22 and have checked whether it is using stale tokens (see https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html and https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html#troubleshooting-boundservicetoken).
When the API server receives a request with a token that is older than one hour, it annotates the audit event with "authentication.k8s.io/stale-token". In my case I can see the following annotation, e.g.:
"annotations":{"authentication.k8s.io/stale-token":"subject: system:serviceaccount:flux-system:helm-controller, seconds after warning threshold: 56187"}
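To see why a token counts as stale, it helps to look at the `iat`/`exp` claims inside the projected service account token. A minimal sketch, assuming you have such a token as a string (the sample JWT below is constructed for illustration; a real one would be read from /var/run/secrets/kubernetes.io/serviceaccount/token inside the pod):

```python
import base64
import json
import time


def jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def b64url(obj: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Constructed sample token (header.payload.signature), for illustration only.
claims = {
    "iat": 1652176139,
    "exp": 1652179739,  # one hour after iat, like a projected token on EKS
    "sub": "system:serviceaccount:flux-system:helm-controller",
}
token = f"{b64url({'alg': 'RS256'})}.{b64url(claims)}.sig"

p = jwt_payload(token)
age = int(time.time()) - p["iat"]
print(p["sub"], "token age (s):", age)
```

If a client keeps presenting a token whose `iat` is more than an hour in the past instead of re-reading the rotated file, the API server flags the request as using a stale token.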
Version:
helm-controller:v0.21.0
Cluster Details
AWS EKS 1.22
Steps to reproduce issue
- Enable EKS Audit Logs
- Query CW Insights (select cluster log group):
fields @timestamp
| filter @message like /seconds after warning threshold/
| parse @message "subject: *, seconds after warning threshold:*\"" as subject, elapsedtime
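The same extraction can be reproduced offline on an exported audit log line. A small Python sketch (the regex mirrors the CloudWatch Insights parse pattern above; the sample line is the annotation quoted earlier):

```python
import re

# Mirrors the CW Insights pattern:
#   parse @message "subject: *, seconds after warning threshold:*\""
PATTERN = re.compile(
    r'subject: (?P<subject>[^,]+), seconds after warning threshold: ?(?P<elapsed>\d+)'
)

line = (
    '"annotations":{"authentication.k8s.io/stale-token":'
    '"subject: system:serviceaccount:flux-system:helm-controller, '
    'seconds after warning threshold: 56187"}'
)

m = PATTERN.search(line)
if m:
    print(m.group("subject"), m.group("elapsed"))
```

This makes it easy to grep a batch of downloaded audit events for the offending service accounts without re-running the Insights query.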
@albertschwarzkopf can you please confirm this happens with kustomize-controller also?
@stefanprodan thanks for the fast reply!
No, helm-controller only.
kustomize-controller is running in version 0.25.0
Also no issue with notification-controller:v0.23.5 and source-controller:v0.24.4
Does kustomize-controller run on the same node as helm-controller? Can you please post here kubectl -n flux-system get pods -owide
I see that kustomize-controller was restarted recently, wait one hour and report back please if kustomize-controller runs into the same issue. I'm trying to figure out if this is something specific to helm-controller or is a general problem with Kubernetes client-go on EKS.
Relates to fluxcd/flux2#2074
I see that kustomize-controller was restarted recently, wait one hour and report back please if kustomize-controller runs into the same issue. I'm trying to figure out if this is something specific to helm-controller or is a general problem with Kubernetes client-go on EKS.
After 72 minutes, no issue with kustomize-controller.
I've created an EKS cluster:
$ kubectl version
Server Version: v1.22.6-eks-14c7a48
I've waited one hour:
$ kubectl -n flux-system get po
NAME READY STATUS RESTARTS AGE
helm-controller-88f6889c6-pwf7f 1/1 Running 0 73m
kustomize-controller-784bd54978-bckm6 1/1 Running 0 73m
notification-controller-648bbb9db7-58c2d 1/1 Running 0 73m
source-controller-79f7866bc7-k25z5 1/1 Running 0 73m
And there is no stale-token annotation on the pod:
$ kubectl -n flux-system get po helm-controller-88f6889c6-pwf7f -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    container.seccomp.security.alpha.kubernetes.io/manager: runtime/default
    kubernetes.io/psp: eks.privileged
    prometheus.io/port: "8080"
    prometheus.io/scrape: "true"
  creationTimestamp: "2022-05-10T10:08:59Z"
  generateName: helm-controller-88f6889c6-
  labels:
    app: helm-controller
    pod-template-hash: 88f6889c6
  name: helm-controller-88f6889c6-pwf7f
  namespace: flux-system
@albertschwarzkopf can you give the first mentioned image in #480 a try, and if that does not yield results, the second?
@hiddeco thanks! I have tried both images today. Only the image ghcr.io/hiddeco/helm-controller:head-412201a worked as expected: with it I can no longer see the mentioned annotation in the audit logs, even after 1 hour.
Thanks for confirming. I'll finalize the PR in that case, and make sure it is included in the next release.
Note we even got an automated email about this from AWS!
As of April 20th 2022, we have identified the below service accounts attached to pods in one or more of your EKS clusters using stale (older than 1 hour) tokens. Service accounts are listed in the format: |namespace:serviceaccount
arn:aws:eks:eu-west-2::cluster/prod-|kube-system:multus
arn:aws:eks:eu-west-2::cluster/prod-**|flux-system:helm-controller
This also totally explains fluxcd/flux2#2074 (and the correlation between multus + helm we saw).
Got the same message from AWS. Only the helm-controller SA was flagged. All controllers have been running for the same period of time.
NAME READY STATUS RESTARTS AGE
helm-controller-5676d55dff-7lgvn 1/1 Running 0 16d
image-automation-controller-6444ccb58c-8xcls 1/1 Running 0 16d
image-reflector-controller-f64677dd5-974qs 1/1 Running 0 16d
kustomize-controller-76f9d4f99f-htp8d 1/1 Running 0 16d
notification-controller-846fff6d67-h677q 1/1 Running 0 16d
source-controller-55d799ff7d-w598g 1/1 Running 0 16d
We got the notification message from AWS as well, but just for the helm-controller, although all pods have been up and running for 85 days.
I can confirm the same problem here on EKS v1.22.6-eks-7d68063. Not sure if it's interesting or related, but after moving to EKS 1.22, client authentication changed from client.authentication.k8s.io/v1alpha1 to client.authentication.k8s.io/v1beta1.
As already mentioned in #479 (comment): we have identified the issue, staged a patch, and this will be resolved in the next release.