strimzi/strimzi-kafka-oauth

Token refresh doesn't seem to work properly

Serk72 opened this issue · 3 comments

Inital login will work as expected

021-09-08 21:18:37,517 DEBUG r-thread-2 s.o.OAuthBearerLoginModule Login succeeded; invoke commit() to commit it; current committed token count=0
2021-09-08 21:18:37,517 TRACE r-thread-2 s.o.OAuthBearerLoginModule Committing my token; current committed token count = 0
2021-09-08 21:18:37,517 DEBUG r-thread-2 s.o.OAuthBearerLoginModule Done committing my token; committed token count is now 1
2021-09-08 21:18:37,517 INFO  r-thread-2 gCredentialRefreshingLogin Successfully logged in.
2021-09-08 21:18:37,518 DEBUG r-thread-2 OAuthBearerRefreshingLogin Found expiring credential with principal 'service-account-kafka'.
2021-09-08 21:18:37,518 DEBUG r-thread-2 gCredentialRefreshingLogin [Principal=:service-account-kafka]: It is an expiring credential
2021-09-08 21:18:37,519 INFO  ount-kafka gCredentialRefreshingLogin [Principal=:service-account-kafka]: Expiring credential re-login thread started.
2021-09-08 21:18:37,520 INFO  ount-kafka gCredentialRefreshingLogin [Principal=service-account-kafka]: Expiring credential valid from Wed Sep 08 21:18:37 UTC 2021 to Wed Sep 08 21:23:37 UTC 2021
2021-09-08 21:18:37,528 WARN  ount-kafka gCredentialRefreshingLogin [Principal=:service-account-kafka]: Expiring credential expires at Wed Sep 08 21:23:37 UTC 2021, so buffer times of 60 and 300 seconds at the front and back, respectively, cannot be accommodated.  We will refresh at Wed Sep 08 21:22:47 UTC 2021.
2021-09-08 21:18:37,529 INFO  ount-kafka gCredentialRefreshingLogin [Principal=:service-account-kafka]: Expiring credential re-login sleeping until: Wed Sep 08 21:22:47 UTC 2021

Which shows the token as expiring at Wed Sep 08 21:23:37 UTC 2021
The refresh will occur as shown here:

2021-09-08 21:22:47,165 DEBUG ount-kafka s.o.OAuthBearerLoginModule Login succeeded; invoke commit() to commit it; current committed token count=1
2021-09-08 21:22:47,165 TRACE ount-kafka s.o.OAuthBearerLoginModule Committing my token; current committed token count = 1
2021-09-08 21:22:47,165 DEBUG ount-kafka s.o.OAuthBearerLoginModule Done committing my token; committed token count is now 2
2021-09-08 21:22:47,166 TRACE ount-kafka s.o.OAuthBearerLoginModule Logging out my token; current committed token count = 2
2021-09-08 21:22:47,166 DEBUG ount-kafka s.o.OAuthBearerLoginModule Done logging out my token; committed token count is now 1
2021-09-08 21:22:47,166 TRACE ount-kafka s.o.OAuthBearerLoginModule Logging out my extensions
2021-09-08 21:22:47,166 DEBUG ount-kafka s.o.OAuthBearerLoginModule Done logging out my extensions
2021-09-08 21:22:47,166 DEBUG ount-kafka OAuthBearerRefreshingLogin Found expiring credential with principal 'service-account-kafka'.
2021-09-08 21:22:47,166 DEBUG ount-kafka gCredentialRefreshingLogin [Principal=:service-account-kafka]: It is an expiring credential after re-login as expected
2021-09-08 21:22:47,166 INFO  ount-kafka gCredentialRefreshingLogin [Principal=service-account-kafka]: Expiring credential valid from Wed Sep 08 21:22:47 UTC 2021 to Wed Sep 08 21:27:47 UTC 2021
2021-09-08 21:22:47,166 WARN  ount-kafka gCredentialRefreshingLogin [Principal=:service-account-kafka]: Expiring credential expires at Wed Sep 08 21:27:47 UTC 2021, so buffer times of 60 and 300 seconds at the front and back, respectively, cannot be accommodated.  We will refresh at Wed Sep 08 21:26:48 UTC 2021.
2021-09-08 21:22:47,167 INFO  ount-kafka gCredentialRefreshingLogin [Principal=:service-account-kafka]: Expiring credential re-login sleeping until: Wed Sep 08 21:26:48 UTC 2021

However the kafka client used will still stop working at the orignial expire time of Wed Sep 08 21:23:37 UTC 2021

This is with a akhq client: https://github.com/tchiotludo/akhq

using the following config with server urls and client ids removed:

bootstrap.servers: "server:9094"
          sasl.jaas.config: org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required auth.valid.issuer.uri="url" oauth.jwks.endpoint.uri="https://url/protocol/openid-connect/certs" oauth.username.claim="preferred_username" oauth.client.id="kafka" oauth.client.secret="secretz" oauth.token.endpoint.uri="https://url/protocol/openid-connect/token";
          sasl.login.callback.handler.class: io.strimzi.kafka.oauth.client.JaasClientOauthLoginCallbackHandler
          security.protocol: SASL_SSL
          sasl.mechanism: OAUTHBEARER
          ssl.truststore.location: /app/kafka-certs/truststore.jks
          ssl.truststore.password: ...
          ssl.keystore.location: /app/kafka-certs/keystore.jks
          ssl.keystore.password: ...
          ssl.key.password: ...

as far as i can tell this happens because this line should be != instead of ==
https://github.com/a0x8o/kafka/blob/master/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/OAuthBearerLoginModule.java#L364
or that something is very wrong with the default time ranges since this part of the log

2021-09-08 21:22:47,165 DEBUG ount-kafka s.o.OAuthBearerLoginModule Done committing my token; committed token count is now 2
2021-09-08 21:22:47,166 TRACE ount-kafka s.o.OAuthBearerLoginModule Logging out my token; current committed token count = 2
2021-09-08 21:22:47,166 DEBUG ount-kafka s.o.OAuthBearerLoginModule Done logging out my token; committed token count is now 1

seems to imply given the code in OAuthBearerLoginModule that the token that was just added is the one that gets removed.

@Serk72

Maybe the most obvious thing - is reauthentication enabled on your broker?

See https://github.com/strimzi/strimzi-kafka-oauth#enabling-the-re-authentication

The logs you've pasted show the client refreshing the token, but not the client performing re-authentication.
You would get more info on that in the broker log by enabling debug logging for kafka loggers.

Kafka Clients library has settings that control how much time before expiry should the client fetch a new token for authentication. The layer performing the client authentication to the authorization server is JAAS based (refreshing the token by client is called re-login) and is separate from the layer performing client - broker authentication or re-authentication (when client sends the access token to the broker for authentication). These two layers don't talk to each other very well - re-login runs on its determined schedule based on the token expiry time. As a result it can happen, if token expiry is short and the parameters controlling re-login are not adjusted that re-authentication grabs an old not-yet-refreshed token to re-authenticate. In which case, when the token expires, you may see the session terminate.

I can't comment on the OAuthBearerLoginModule.java , I never needed to delve this deep into it, and re-authentication tests I've been performing in the past seemed to work fine.

Our Access Key lifespan was set to only 5 min.
Increasing it to 1 hour seems it seems to stay working even after a few token refreshes would have to have happened.
So that works as a quick fix at least
I will try and debug what is happening in the Kafka Client for short token lifespans when I have time. Since I don't like at least not knowing why this can happen for short token lifetimes, and if its a race condition it might still be possible though less likely for long token lifetimes.