If the Redis Sentinel master node is killed but not removed from the sentinels list, Redis connections will be broken
Problem
If the Redis Sentinel master node goes offline in a way that it is not properly removed, the following happens:
- a failover to a new master occurs, which is correct
- the previous Redis master is demoted to a slave

All of this happens on the Redis side, and everything is fine there. The problem comes from the fact that the broken Redis slave is not removed from the sentinels list in that case. Instead, it gets the flags `s_down,slave`.

The library ignores this flag, so it fails to promote the new master: it keeps trying to connect to the broken slave and errors out.
At least that is what happened in our production Kubernetes cluster when one Kubernetes node/machine died completely. The library started getting `no route to host` errors, which makes total sense. The only way to recover was to manually remove the slave from each sentinel's list separately, using the `SENTINEL RESET` command.
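The manual recovery above has to target every sentinel individually, because `SENTINEL RESET` is applied locally and is not propagated between sentinels. A rough sketch of that step (the helper name, the address list, and the use of redis-py for sending are all assumptions, not part of this library):

```python
def sentinel_reset_commands(sentinels, master_name):
    """Build one SENTINEL RESET command per sentinel node.

    `sentinels` is a list of (host, port) pairs; the command must be sent
    to each node separately, since RESET does not gossip between sentinels.
    """
    return [(host, port, ["SENTINEL", "RESET", master_name])
            for host, port in sentinels]

# Sending them could look like this with redis-py (not executed here):
#   from redis import Redis
#   for host, port, cmd in sentinel_reset_commands(sentinels, "mymaster"):
#       Redis(host=host, port=port).execute_command(*cmd)
```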
The service also cannot be restarted in that state, because new instances fail to come up for the same reason, even though one healthy master and one healthy slave existed and the app could otherwise have functioned normally.
Error happens here:
Line 308 in 402e619
A possible fix is to exclude all slaves that have the `s_down` status. As far as we tested, it fixed the issue.
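A minimal sketch of that filtering, assuming slaves are represented as dicts with a comma-separated `flags` field as returned by `SENTINEL slaves <master>` (the function name and data shape are hypothetical; the actual PR may differ):

```python
def usable_slaves(slaves):
    """Drop slaves that Sentinel has flagged as subjectively down (s_down).

    Each slave is a dict whose 'flags' value is a comma-separated string,
    e.g. 'slave' for a healthy node or 's_down,slave' for a broken one.
    """
    return [s for s in slaves
            if "s_down" not in s.get("flags", "").split(",")]
```

With the dead node flagged `s_down,slave`, it is skipped and the client only attempts connections to the remaining healthy nodes.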
Proposed fix:
#343
Thank you @mrkmrtns, I've merged the PR and will tag a new release shortly :)