SentinelConnector set KnownNodes empty during redis failover, which causes app could not access redis any more
Opened this issue · 8 comments
Bug Report
Current Behavior
In redis sentinel mode, lettuce may refresh topology nodes to empty, if the connection to redis sentinel closed just after lettuce received TopologyRefreshMessage. And that will cause lettuce cannot recover anymore until received next TopologyRefreshMessage.
We have two redis nodes, and redis is running in sentinel mode.
Assume the redis nodes' names are redis1 and redis2, and redis1 is master at the beginning, we inject errors as below:
- ifconfig down network at redis1, and wait the redis failover to redis2 finished
- ifconfig up network at redis1, and wait the redis failover to redis1 again
- ifconfig down network at redis2
At step3, redis sentinel send +sdown sentinel ...
to lettuce client, and trigger lettuce executing method SentinelConnector::getTopologyRefreshRunnable
. While at the same time, the connection between lettuce client and redis sentinel is closed, and then error occured.
lettuce logs below
[WARN] io.lettuce.core.RedisClient:605 Cannot connect Redis Sentinel at redis://*:*: java.util.concurrent.CompletionException: io.netty.channel.StacklessClosedChannelException
[WARN] io.lettuce.core.masterreplica.SentinelConnector:117 Topology refresh returned no nodes from redis-sentinel://**@*:*,*:*?sentinelMasterId=mymaster
And since that, the app could not handle any read and write operation because the knownNodes is empty.
org.springframework.data.redis.redisSystemException: Redis exception; nested exception is io.lettuce.core.RedisException: Cannot determine a node to read (Known nodes: []) while setting io.lettuce.core.ReadFromImpl$ReadFromUpstreamPreferred
org.springframework.data.redis.redisSystemException: Redis exception; nested exception is io.lettuce.core.RedisException: Master is currently unknown: []
By reviewing the git log, we've found this problem existed since commit Do not fail Master/Slave topology refresh if a node is not available. And it still exists at branch master;
When Requests#getRequest
return null
or future.isDone()
returns false, it'll clear MasterReplicaConnectionProvider#knownNodes
and the lettuce client could not get recovered untill receiving next TopologyRefreshMessage, or the lettuce client process was restarted.
src/main/java/io/lettuce/core/masterreplica/Requests.java
protected void onEmit(Emission<List<RedisNodeDescription>> emission) {
List<RedisNodeDescription> result = new ArrayList<>();
Map<RedisNodeDescription, Long> latencies = new HashMap<>();
for (RedisNodeDescription node : nodes) {
TimedAsyncCommand<String, String, String> future = getRequest(node.getUri());
if (future == null || !future.isDone()) {
// if this expression is true for all nodes, it'll clear MasterReplicaConnectionProvider#knownNodes
continue;
}
RedisNodeDescription redisNodeDescription = findNodeByUri(nodes, node.getUri());
latencies.put(redisNodeDescription, future.duration());
result.add(redisNodeDescription);
}
SortAction sortAction = SortAction.getSortAction();
sortAction.sort(result, new LatencyComparator(latencies));
emission.success(result);
}
Environment
- Lettuce version(s): 6.1.10.RELEASE
- Redis version: 6.2.7
- Spring Boot version: 2.7.18
- JRE version: 11.0.8
Possible Solution
We've checked the lettuce code, the code in SentinelConnector::getTopologyRefreshRunnable
may be improved, to avoid setting empty list to knownNodes.
src/main/java/io/lettuce/core/masterreplica/SentinelConnector.java
private Runnable getTopologyRefreshRunnable(MasterReplicaTopologyRefresh refresh,
MasterReplicaConnectionProvider<K, V> connectionProvider) {
return () -> {
try {
LOG.debug("Refreshing topology");
refresh.getNodes(redisURI).subscribe(nodes -> {
EventRecorder.getInstance().record(new MasterReplicaTopologyChangedEvent(redisURI, nodes));
if (nodes.isEmpty()) {
LOG.warn("Topology refresh returned no nodes from {}", redisURI);
return; // Add return here, to avoid setKnownNodes with empty nodes.
}
LOG.debug("New topology: {}", nodes);
connectionProvider.setKnownNodes(nodes);
}, t -> LOG.error("Error during background refresh", t));
} catch (Exception e) {
LOG.error("Error during background refresh", e);
}
};
}
Here is the modification to reproduce this issue.
Stack trace
[INFO] Running io.lettuce.core.masterreplica.MyTest
2024-10-18 17:11:17 [ERROR] [main] TEST: getNodes called count 1 (MasterReplicaTopologyRefresh:60)
2024-10-18 17:11:18 [ERROR] [lettuce-epollEventLoop-4-1] TEST: refresh getNodes initialNodes [RedisMasterReplicaNode [redisURI=redis://localhost:6482?timeout=10s, role=UPSTREAM], RedisMasterReplicaNode [redisURI=redis://localhost:6483?timeout=10s, role=REPLICA]] (MasterReplicaTopologyRefresh:65)
2024-10-18 17:11:19 [ERROR] [lettuce-epollEventLoop-4-2] TEST: nodeList after ping [RedisMasterReplicaNode [redisURI=redis://localhost:6483?timeout=10s, role=REPLICA], RedisMasterReplicaNode [redisURI=redis://localhost:6482?timeout=10s, role=UPSTREAM]] (MasterReplicaTopologyRefresh:85)
2024-10-18 17:11:19 [ERROR] [main] TEST: +sdown 1 (MyTest:47)
2024-10-18 17:11:24 [ERROR] [lettuce-eventExecutorLoop-1-4] TEST: getNodes called count 2 (MasterReplicaTopologyRefresh:60)
2024-10-18 17:11:24 [ERROR] [lettuce-epollEventLoop-4-6] TEST: refresh getNodes initialNodes [RedisMasterReplicaNode [redisURI=redis://localhost:6482?timeout=10s, role=UPSTREAM], RedisMasterReplicaNode [redisURI=redis://localhost:6483?timeout=10s, role=REPLICA]] (MasterReplicaTopologyRefresh:65)
2024-10-18 17:11:24 [ERROR] [lettuce-epollEventLoop-4-8] TEST: inject requestPing failing (MasterReplicaTopologyRefresh:75)
2024-10-18 17:11:24 [ERROR] [lettuce-epollEventLoop-4-8] TEST: Requests node redis://localhost:6482?timeout=10s set future to null (Requests:63)
2024-10-18 17:11:24 [ERROR] [lettuce-epollEventLoop-4-8] TEST: Requests node redis://localhost:6483?timeout=10s set future to null (Requests:63)
2024-10-18 17:11:24 [ERROR] [lettuce-epollEventLoop-4-8] TEST: nodeList after ping [] (MasterReplicaTopologyRefresh:85)
2024-10-18 17:11:24 [WARN ] [lettuce-epollEventLoop-4-8] Topology refresh returned no nodes from redis-sentinel://127.0.0.1,127.0.0.1:26380?sentinelMasterId=mymaster&timeout=10s (SentinelConnector:126)
2024-10-18 17:11:29 [ERROR] [main] TEST: +sdown 2 (MyTest:50)
2024-10-18 17:11:30 [ERROR] [main] TEST: +sdown 3 (MyTest:53)
2024-10-18 17:11:34 [ERROR] [lettuce-eventExecutorLoop-1-1] TEST: getNodes called count 3 (MasterReplicaTopologyRefresh:60)
2024-10-18 17:11:34 [ERROR] [lettuce-epollEventLoop-4-9] TEST: refresh getNodes initialNodes [RedisMasterReplicaNode [redisURI=redis://localhost:6482?timeout=10s, role=UPSTREAM], RedisMasterReplicaNode [redisURI=redis://localhost:6483?timeout=10s, role=REPLICA]] (MasterReplicaTopologyRefresh:65)
2024-10-18 17:11:34 [ERROR] [lettuce-epollEventLoop-4-10] TEST: inject requestPing failing (MasterReplicaTopologyRefresh:75)
2024-10-18 17:11:34 [ERROR] [lettuce-epollEventLoop-4-10] TEST: Requests node redis://localhost:6482?timeout=10s set future to null (Requests:63)
2024-10-18 17:11:34 [ERROR] [lettuce-epollEventLoop-4-10] TEST: Requests node redis://localhost:6483?timeout=10s set future to null (Requests:63)
2024-10-18 17:11:34 [ERROR] [lettuce-epollEventLoop-4-10] TEST: nodeList after ping [] (MasterReplicaTopologyRefresh:85)
2024-10-18 17:11:34 [WARN ] [lettuce-epollEventLoop-4-10] Topology refresh returned no nodes from redis-sentinel://127.0.0.1,127.0.0.1:26380?sentinelMasterId=mymaster&timeout=10s (SentinelConnector:126)
2024-10-18 17:11:40 [ERROR] [main] TEST: known nodes [] (MyTest:58)
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.33 s <<< FAILURE! -- in io.lettuce.core.masterreplica.MyTest
[ERROR] io.lettuce.core.masterreplica.MyTest.refreshEmptyNodes -- Time elapsed: 24.42 s <<< ERROR!
io.lettuce.core.RedisException: Master is currently unknown: []
at io.lettuce.core.masterreplica.MasterReplicaConnectionProvider.getMaster(MasterReplicaConnectionProvider.java:305)
at io.lettuce.core.masterreplica.MyTest.refreshEmptyNodes(MyTest.java:60)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
Hey @jiajunsu ,
Thanks for the verbose analysis. Before we jump into the way the Lettuce driver works let's first analyse the scenario and environment you are working with and why sentinel returns an empty list of nodes in the first place.
How many sentinel processes are you using to test this scenario? Can you describe your complete environment?
- With 2 sentinel, 1 upstream and 1 replica nodes running.
- The empty list of nodes is not replied by sentinel, I've tested that possibility, and found that empty-list could not be responsed from sentinel(even if sentinel response empty list, lettuce code could not reach the WARN log
Topology refresh returned no nodes
).
code in src/main/java/io/lettuce/core/masterreplica/MasterReplicaTopologyRefresh.java
public Mono<List<RedisNodeDescription>> getNodes(RedisURI seed) {
CompletableFuture<List<RedisNodeDescription>> future = topologyProvider.getNodesAsync();
Mono<List<RedisNodeDescription>> initialNodes = Mono.fromFuture(future).doOnNext(nodes -> {
applyAuthenticationCredentials(nodes, seed);
});
return initialNodes.map(this::getConnections) // if initialNodes is empty, map `getConnections` will be skipped
.flatMap(asyncConnections -> asyncConnections.asMono(seed.getTimeout(), eventExecutors))
...
}
And the empty list was made from connections.requestPing()
. The detail is:
- sentinel send
+sdown
to lettuce client. - lettuce try to get nodes list from sentinel and success.
- lettuce try to
PING
the nodes, and at the same time, the redis nodes are unreachable from network. - at last we get an empty list in
SentinelConnector#getTopologyRefreshRunnable
This could be reproduced by running MyTest
in my forked repo and commit. Just make the test environment by make prepare
and make start
, and the testcase use the sentinel in lettuce unittest environment running with port 26379
and 26380
, while the upstream-replica ports are 6482
and 6483
.
Oh, okay, let me address these one at the time:
With 2 sentinel, 1 upstream and 1 replica nodes running.
This is not a recommended setup for Redis Sentinel as you have to have at least 3 sentinel nodes to achieve master promotion and avoid split brain scenarios. See Example 1 and the explanation why this is a broken setup.
The empty list of nodes is not replied by sentinel, I've tested that possibility, and found that empty-list could not be responsed from sentinel(even if sentinel response empty list, lettuce code could not reach the WARN log Topology refresh returned no nodes).
My bad, confused topology refresh with master/replica refresh. There are two types of topology refresh (check the SentinelConnector:connectAsync() method):
- MasterReplicaTopologyRefresh - handles the connection to the current master node and monitors if a replica node is promoted to a master node
- SentinelTopologyRefresh - handles the overall nodes in the current Sentinel topology (all masters and replicas)
And the empty list was made from connections.requestPing(). The detail is:
- sentinel send +sdown to lettuce client.
- lettuce try to get nodes list from sentinel and success.
- lettuce try to PING the nodes, and at the same time, the redis nodes are unreachable from network.
- at last we get an empty list in SentinelConnector#getTopologyRefreshRunnable
I think you are simulating an invalid state of the system, which the driver could not (reasonably) handle. The fix in #3019 is also not valid, because we will end up with a list of nodes that we know we cant connect to. At the moment the refresh is happening, there is basically no valid node to connect to, as the masters returned from the NODES
command are not returning a reply to our PING
. The only thing the driver could do is wait until the next refresh attempt. What else could it do? If it attempts to send commands to the last known nodes they are sure to fail, as they are not reachable.
At this point I think this is an invalid scenario to consider.
This is not a recommended setup for Redis Sentinel as you have to have at least 3 sentinel nodes to achieve master promotion and avoid split brain scenarios. See Example 1 and the explanation why this is a broken setup.
Sorry for the misleading response. In prod, we do have 3 sentiniel nodes. The 2 running sentinel is in my test env for reproducing this issue.
At the moment the refresh is happening, there is basically no valid node to connect to, as the masters returned from the NODES command are not returning a reply to our PING. The only thing the driver could do is wait until the next refresh attempt. What else could it do? If it attempts to send commands to the last known nodes they are sure to fail, as they are not reachable.
IMHO, a better way may could be:
- If some nodes(not all nodes) do not reply to
PING
, we could remove them fromKnownNodes
. Possibly those nodes are fault. - If all nodes do not reply to
PING
, we could just do nothing rather than clear theKnownNodes
list. Possibly it's not the redis nodes' problem, may the lettuce process node's network being broken. After we clearKnownNodes
, if the sentinel don't send any psub message, the lettuce client could not get recovered ever.
I just compared the lettuce features between redis-cluster and redis-sentinel, maybe we could support enablePeriodicRefresh
feature in sentinel mode, just like what the cluster mode do. That could handle this scenario in a better way.
Sorry for the misleading response. In prod, we do have 3 sentiniel nodes. The 2 running sentinel is in my test env for reproducing this issue.
I understand. I do - however - feel this would not happen if you use 3 sentinel nodes.
Have you tried reproducing the problem this way?
IMHO, a better way may could be:
- If some nodes(not all nodes) do not reply to
PING
, we could remove them fromKnownNodes
. Possibly those nodes are fault.- If all nodes do not reply to
PING
, we could just do nothing rather than clear theKnownNodes
list. Possibly it's not the redis nodes' problem, may the lettuce process node's network being broken. After we clearKnownNodes
, if the sentinel don't send any psub message, the lettuce client could not get recovered ever.
- We are in agreement about step 1, this is also how the logic should be working right now
- Again - the question here is WHY is there a situation where no nodes are returned, and I believe this is an invalid state of the system. I disagree we should leave the existing
knownNodes
as they are. If we do that it would mean that we assume that theknownNodes
would eventually come up - and we have no way of knowing that.
Here is the modification to reproduce this issue.
This test scenario simply simulates what would happen if a PING
command fails. I am not sure why the Sentinel would return a node that is unreachable. Could you verify that somehow? Could it be that the master is never promoted?
I just compared the lettuce features between redis-cluster and redis-sentinel, maybe we could support
enablePeriodicRefresh
feature in sentinel mode, just like what the cluster mode do. That could handle this scenario in a better way.
Assuming we add this periodic refresh - which node should we ask for what type of information?
Currently we depend on the Sentinel to push data about all master nodes, and all master nodes to give us master/replica status. If we are to believe your research the Sentinel pushes a topology refresh, giving us a master node and when we attempt to connect to it we get a list of nodes that we can't PING.
I do - however - feel this would not happen if you use 3 sentinel nodes.
Have you tried reproducing the problem this way?
The original case was happend in our production environment, with 3 sentinel nodes. And I tried to reproduce it by dropping down the replica and 2 sentinel nodes, it didn't happen in about 100 times. That means the trigger condition is too extream to reproduce in real environment. So I have to inject exceptions to lettuce code and test it in unittest env.
WHY is there a situation where no nodes are returned, and I believe this is an invalid state of the system. I disagree we should leave the existing knownNodes as they are. If we do that it would mean that we assume that the knownNodes would eventually come up - and we have no way of knowing that.
Because I failed to reproduce it in real env, and without debug logs, it's hard to get the detail reason why PING
command fails. After reading and testing the lettuce code, it seems to be just one possible cause that could make that WARN log Topology refresh returned no nodes
. Considering the network latency between lettuce client and redis sentinel/master, a likely scenario maybe the network between lettuce and sentinel/master was down just after the psub message being sent from sentinel. But it's just a guess.
I also checked the redis sentinel logs, and it records the psub messages was sent at that time.
redis sentinel logs
This log is from sentinel-node3, at the same available zone with the master node
22:10:51.299 # +sdown sentinel sentinel-node1 @ mymaster
22:10:51.371 # +sdown slave slave-node @ mymaster
22:10:51.587 # +sdown sentinel sentinel-node2 @ mymaster
I agree that leaving knownNodes
with unreachable data is not a good idea. Clearing knownNodes
and waiting for next psub message to refresh the topology data is not a good choice either. So maybe supporting periodicRefresh
is a better choice.
Assuming we add this periodic refresh - which node should we ask for what type of information?
We could make it the same way as the callback function for psub message, SentinelConnector#getTopologyRefreshRunnable
, getting master and slave nodes from sentinel. Differ from just receiving psub message from sentiniel passively, we could refresh
knownNodes
periodically. That means, if that extream scenario happened, the lettuce client could get to work just after the network recovered, without doing anything by SRE(like restarting the client process, that is what we have to do in this case).
This is a fair point. We could improve the resiliency of the driver if we schedule some action in this particular scenario.
I think we could model it in a similar manner to the ClusterTopologyRefreshScheduler
.
This is no trivial task however, I will have to think on it's priority given our backlog and given the fact it is a bit of an exotic issue. Meanwhile contributions are welcome in that direction.