[BUG] socket_timeout parameter has no effect if the link is broken between redis instance and the client. Outage towards a slot range for several minutes.
Opened this issue · 0 comments
Describe the bug
socket_timeout parameter has no effect if the link is broken between redis instance and the client. Outage towards a slot range for several minutes.
To Reproduce
- Async client is used with cluster mode. Redis cluster is used with 3 masters and 3 slaves.
- generate continuous traffic
- during the traffic, select a master redis instance, apply the following iptables rule in the container/vm of that server:
iptables -A OUTPUT -p tcp --sport redis_port -s redis_ip -j DROP - kill that redis server (kill -9 redis_server_pid) [-> new master election will happen for that slot range]
Expected behavior
After socket_timeout, redis-plus-plus discover the new elected master / broken connection, traffic is redirected to that master
Unexpected result: old connection is used, continuous TCP packet retransmissions, no response to users for the given slot range for several minutes
Environment:
OS: Rocky Linux 8.2-20.el8.0.1
Compiler: gcc version 8.5.0
hiredis version: hiredis 1.2.0
redis-plus-plus version: 1.3.12
Additional context
Correction proposal:
- introduce a new property per connection: last_response_received
- failure detection: whenever a request is sent in a connection, check if (t_now - last_response_received) is under socket_timeout + TOLERATION (some milliseconds).
- when the failure is detected, be careful with CLUSTER SLOTS, do not target the problematic master instance, select another (I see issues like that)
- check if mastership was changed, and reconnect to the new master if needed. Abort the ongoing requests towards the redis-plus-plus user, redis-plus-plus user should retransmit the request, maybe with some delay.
As a workaround, we started guard timer and reset the AsyncRedisClient at timeouts to force redis-plus-plus to discover the mastership changes / broken links. But this solution also caused some issues, see:
#577
#578