ManageIQ/kubeclient

Kubernetes 1.21 BoundServiceAccountToken Support

wills-feng opened this issue · 11 comments

We deployed Fluentd on EKS 1.21, and recently we received an email notification that Fluentd is using a stale token. The email claims the Ruby Kubernetes client should already account for this, but that does not seem to be true in our environment. I have confirmed that Fluentd is using the latest kubeclient (4.9.3).

The AWS email is attached below:

We have identified applications running in one or more of your Amazon EKS clusters that are not refreshing service account tokens. Applications making requests to Kubernetes API server with expired tokens will fail. You can resolve the issue by updating your application and its dependencies to use newer versions of Kubernetes client SDK that automatically refreshes the tokens.

What is the problem?

Kubernetes version 1.21 graduated BoundServiceAccountTokenVolume feature [1] to beta and enabled it by default. This feature improves security of service account tokens by requiring a one hour expiry time, over the previous default of no expiration. This means that applications that do not refetch service account tokens periodically will receive an HTTP 401 unauthorized error response on requests to Kubernetes API server with expired tokens. You can learn more about the BoundServiceAccountToken feature in EKS Kubernetes 1.21 release notes [2].

To enable a smooth migration of applications to the newer time-bound service account tokens, EKS v1.21+ extends the lifetime of service account tokens to 90 days. Applications on EKS v1.21+ clusters that make API server requests with tokens that are older than 90 days will receive an HTTP 401 unauthorized error response.

How can you resolve the issue?

To make the transition to time bound service account tokens easier, Kubernetes has updated the below official versions of client SDKs to automatically refetch tokens before the one hour expiration:

  • Go v0.15.7 and later
  • Python v12.0.0 and later
  • Java v9.0.0 and later
  • Javascript v0.10.3 and later
  • Ruby master branch
  • Haskell v0.3.0.0

We recommend that you update your application and its dependencies to use one of the above client SDK versions if you are on an older version.
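
As a rough illustration of the expiry the email describes: bound service account tokens are JWTs, so the `exp` claim in the payload can be inspected to see how long a token has left. The sketch below is only an assumption-laden helper for debugging, not part of kubeclient or any official SDK; the method names `jwt_payload` and `seconds_until_expiry` are hypothetical, and the signature is deliberately not verified.

```ruby
require 'json'

# Decode the payload (second dot-separated segment) of a JWT.
# The signature is NOT verified; this is for inspection only.
def jwt_payload(token)
  segment = token.split('.')[1]
  raise ArgumentError, 'not a JWT' if segment.nil?

  # JWTs use URL-safe Base64 without padding. Map the alphabet back to the
  # standard one; Ruby's lenient "m" decoding tolerates the missing padding.
  JSON.parse(segment.tr('-_', '+/').unpack1('m'))
end

# Seconds remaining until the token's `exp` claim (negative once expired).
def seconds_until_expiry(token, now: Time.now)
  jwt_payload(token).fetch('exp') - now.to_i
end
```

Inside a pod, one might call `seconds_until_expiry(File.read('/var/run/secrets/kubernetes.io/serviceaccount/token'))` to see whether the token currently on disk is fresher than the one the application is still sending.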

I am having the same issue. The Fluentd image we deployed has kubeclient (4.9.3) installed, as shown here. We are still getting messages in AWS CloudWatch similar to those described at https://docs.aws.amazon.com/eks/latest/userguide/service-accounts.html#identify-pods-using-stale-tokens

ashie commented

In the email, it claimed Ruby Kubernetes client should have already taken this into account, but it seems not true in our environment. I have confirmed Fluentd is using the latest kubeclient (4.9.3).

It probably doesn't mean this gem (kubeclient); it means https://github.com/kubernetes-client/ruby.
Yes, kubernetes-client/ruby seems to refresh the tokens automatically:
https://github.com/kubernetes-client/ruby/blob/33861097e2eab9954b3f07f6898e5e8199199731/kubernetes/src/kubernetes/config/incluster_config.rb#L28

TOKEN_REFRESH_PERIOD = 60 # 1 minute

But it doesn't seem well maintained: the last update was 1 year ago, and the latest gem version is 0.0.2, released on May 04, 2019.
This is why the above document doesn't give a version for Ruby's SDK:

  • Ruby master branch
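
To illustrate what a periodic refresh like `TOKEN_REFRESH_PERIOD = 60` could look like, here is a minimal sketch of a token cache that re-reads the file once the cached copy is older than the refresh period. This is not the kubernetes-client/ruby implementation; the class name `CachedToken` and the `refresh_period` and `clock` parameters are hypothetical, and only the 60-second value comes from the line quoted above.

```ruby
# Caches a token read from a file and re-reads it once the cached copy is
# at least refresh_period seconds old. Hypothetical sketch, not library code.
class CachedToken
  def initialize(path, refresh_period: 60,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @path = path
    @refresh_period = refresh_period
    @clock = clock          # injectable so the behaviour can be tested
    @token = nil
    @loaded_at = nil
  end

  # Returns the cached token, reloading it from disk when it is stale.
  def value
    now = @clock.call
    if @token.nil? || now - @loaded_at >= @refresh_period
      @token = File.read(@path).chomp
      @loaded_at = now
    end
    @token
  end
end
```

With a 60-second period, a rotated token file would be picked up at most about a minute after rotation, which is well inside the one-hour expiry mentioned in the email.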

ashie commented

So we probably need to implement this feature in kubeclient.

ashie commented

So probably we need to implement this feature to kubeclient.

It seems that it's already implemented (but not configured by default).
https://github.com/ManageIQ/kubeclient#inside-a-kubernetes-cluster

When bearer_token_file is set, the token file is re-read on every connection.

elsif @auth_options[:bearer_token_file]
  connection.request(:authorization, 'Bearer', lambda do
    File.read(@auth_options[:bearer_token_file]).chomp
  end)

We should probably set bearer_token_file to /var/run/secrets/kubernetes.io/serviceaccount/token (the default is nil).
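
The reason the lambda in the snippet above matters can be shown with a small self-contained sketch: the file is read when the lambda is called, not when it is defined, so a rotated token is picked up on the next call. The temp file and the `read_token` name are stand-ins for illustration only; no kubeclient or Faraday calls are made here.

```ruby
require 'tempfile'

# Hypothetical stand-in for the projected service account token file.
token_file = Tempfile.new('token')
File.write(token_file.path, "token-v1\n")

# Same shape as the lambda quoted above: the file is read on each call.
read_token = -> { File.read(token_file.path).chomp }

header_before = "Bearer #{read_token.call}"

# Simulate the kubelet rotating the token on disk.
File.write(token_file.path, "token-v2\n")

header_after = "Bearer #{read_token.call}"

puts header_before  # prints "Bearer token-v1"
puts header_after   # prints "Bearer token-v2"
```

In kubeclient terms this would presumably correspond to passing `bearer_token_file` in the auth options, as the README section linked above describes, rather than reading the token once and passing it as a fixed string.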

cben commented

It's implemented on the master branch; we need to bring it to a releasable state and release 5.0...

One silly task that I've been postponing is going over all the changes in master that are absent from the v4.y branch and compiling a CHANGELOG... Does anyone want to help with that?

cben commented

[Or you could also backport token reloading to v4.y, but that's not a trivial cherry-pick: the master branch switched to Faraday (and the refresh relies on Faraday features), while v4.y uses rest-client...

Given my limited time for kubeclient :-(, I'm not gonna do such a backport myself, moving forward to 5.0 seems more productive, but if somebody else wants to do it I'm happy to review.]

@cben thanks for your reply. Do you have an estimated date for 5.0 release?

@cben @ashie , I have implemented one workaround that will refresh the token on each request. (splunk#1)

Can you review it? I can also raise PR here if it looks good.

Please see the following change:

#567

We ran some tests and the issue still exists in WatchStream, so I made a fix. Please review.

Sorry, but I was wondering if there is an update or maybe even an ETA on this enhancement? Thanks.

cben commented

Released now in 4.10.0.
Thanks again to Harshit for the implementation + tests + diagnosing unrelated minitest issues that came up!