[Bug]: Deadlock in StreamingCredentialsProvider: calling listeners under RWMutex while go-redis unsubscribes on re-auth failure
Closed this issue · 3 comments
Describe the bug
First of all, thank you for maintaining this library it is been really helpful.
I noticed a case where Redis stopped responding as expected and some client requests started timing out. After some investigation, I found that the issue comes from a deadlock in StreamingCredentialsProvider.
The problem happens because listener callbacks (OnNext / OnError) are invoked while holding RLock. If re-auth fails, go-redis may close the connection and trigger the provider's unsubscribe, which then tries to acquire Lock on the same RWMutex. Since RWMutex doesn't allow upgrading a read lock to a write lock, this leads to a deadlock.
As a result, token refresh can hang indefinitely, blocking re-auth updates and potentially the client.
To Reproduce
- Provider receives a new token and calls
onTokenNext. onTokenNextacquiresRLockand invokeslistener.OnNext(t).ReAuthCredentialsListener.OnNextcalls re-auth; on error it triggersonAuthenticationErr.onAuthenticationErrcloses the connection (e.g. bad conn).Conn.Close()triggers the provider'sunsubscribe.unsubscribetries to acquireLock, whileRLockis still held.- Deadlock occurs.
Expected behavior
Token refresh should fail gracefully and allow retry/recovery without hanging the provider.
Code Example
func (e *entraidCredentialsProvider) onTokenNext(t *token.Token) {
e.rwLock.RLock()
listeners := append([]auth.CredentialsListener(nil), e.listeners...)
e.rwLock.RUnlock()
for _, l := range listeners {
l.OnNext(t)
}
}
func (e *entraidCredentialsProvider) onTokenError(err error) {
e.rwLock.RLock()
listeners := append([]auth.CredentialsListener(nil), e.listeners...)
e.rwLock.RUnlock()
for _, l := range listeners {
l.OnError(err)
}
}
// This avoids calling external listeners while holding internal locks.Go Version
1.23.4
Package version
v1.0.4
go-redis Version
v9.9.0
Redis Server Version
6.0.14
Additional environment details
- The issue is not OS-specific (purely about lock semantics).
- Observed with the current implementation of
credentials_provider.goin this repo.
Additional context
Happy to contribute a PR with the proposed changes and tests if this approach looks good.
Hello @alex-poliushkin, thank you for reporting this. I will take a look shortly.
@alex-poliushkin thank you for reporting once again, the issue should be fixed in the latest patch release.
Great, thanks for fixing it!