clue/reactphp-redis

Emit timeout error if socket connection dies silently

Opened this issue · 3 comments

clue commented

This project already supports detecting closed connections when receiving a close event. On top of this, to account for situations where the socket connection may die silently (e.g. due to power outage or network failure), we should also consider a connection dead if we don't receive any response to an outstanding request within a timeout period (default could be 600s?).

It's important to note that Redis employs request/response semantics, and the server is expected to send response messages in a timely manner. That being said, requests such as BLPOP with larger timeout values can take significantly longer.
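To make the proposal concrete, here is a minimal sketch of the bookkeeping such a timeout would need. `PendingRequestWatchdog` and its methods are hypothetical names used for illustration only and are not part of this library; the 600s default is simply the value floated above. A real implementation would most likely use an event loop timer instead of passing timestamps by hand.

```php
<?php

// Hypothetical helper, NOT part of clue/reactphp-redis: remembers when each
// outstanding request was sent and reports the connection as dead once any
// of them has waited longer than the timeout.
final class PendingRequestWatchdog
{
    /** @var array<int, float> request ID => time the request was sent */
    private array $sentAt = [];

    private float $timeout;

    public function __construct(float $timeout = 600.0)
    {
        $this->timeout = $timeout;
    }

    public function requestSent(int $id, float $now): void
    {
        $this->sentAt[$id] = $now;
    }

    public function responseReceived(int $id): void
    {
        // A response arrived in time, so this request no longer counts.
        unset($this->sentAt[$id]);
    }

    public function isDead(float $now): bool
    {
        foreach ($this->sentAt as $sent) {
            if ($now - $sent > $this->timeout) {
                return true;
            }
        }

        return false;
    }
}

$watchdog = new PendingRequestWatchdog(600.0);
$watchdog->requestSent(1, 0.0);
$watchdog->requestSent(2, 10.0);
$watchdog->responseReceived(1);

var_dump($watchdog->isDead(500.0)); // bool(false): request 2 has waited 490s
var_dump($watchdog->isDead(700.0)); // bool(true): request 2 has waited 690s
```

The sketch uses one fixed timeout for every request. Blocking commands such as BLPOP with a large timeout would need a longer allowance, which is not modelled here.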

Out of scope: Redis also allows sending regular heartbeat/ping messages to keep the connection alive if there is no activity for a certain time, but we employ an idle connection time for this case anyway (see #130 / #118).

Refs clue/reactphp-eventsource#37, #132 and others

We welcome contributions, reach out if you want to support this project or become a sponsor ❤️

Hi @clue ! 👋

I have an issue with Freddie where a short Redis outage (connection down then up a few ms afterwards) would not be detected on a hanging SUBSCRIBE connection. As a result, the LazyClient still thinks it's connected, although it no longer is - as a consequence, all subsequent published messages won't be dropped in the pipe.
PING doesn't help here as the new connection will PONG back as if nothing happened.

A convenient solution would be to throw an exception, or to have $self emit close, end or error within this block, so that the subscriber knows the Redis connection was closed:

$redis->on('close', function () use (&$pending, $self, &$subscribed, &$psubscribed, &$idleTimer, $loop) {
    $pending = null;

    // forward unsubscribe/punsubscribe events when underlying connection closes
    $n = count($subscribed);
    foreach ($subscribed as $channel => $_) {
        $self->emit('unsubscribe', array($channel, --$n));
    }
    $n = count($psubscribed);
    foreach ($psubscribed as $pattern => $_) {
        $self->emit('punsubscribe', array($pattern, --$n));
    }
    $subscribed = array();
    $psubscribed = array();

    // stop the idle timer, if one is running
    if ($idleTimer !== null) {
        $loop->cancelTimer($idleTimer);
        $idleTimer = null;
    }
});

What do you think?

Thanks!
Ben

cc @misaert

Update: from what I understand, the Redis connection may be closed at any time, which puts the client into an "idle" state from which it can be awakened later. That's just not supposed to happen once a SUBSCRIBE command has been issued, so the solution above would not be a convenient way to detect that the connection silently died.

In the end I don't know how to implement this 🤷
Any ideas?

PS: At the moment, the only (crappy) solution I came up with to detect silent disruptions during SUBSCRIBE is to access the underlying Redis Client via Reflection and hook into its close event.

@bpolaszek Interesting case you have there 👍

> I have an issue with Freddie where a short Redis outage (connection down then up a few ms afterwards) would not be detected on a hanging SUBSCRIBE connection. As a result, the LazyClient still thinks it's connected, although it no longer is - as a consequence, all subsequent published messages won't be dropped in the pipe.

Well, this actually makes sense: if the connection dies silently, there's nobody there to inform the client, so it will remain in the SUBSCRIBE state. That said, it's not entirely true that nobody informs the client: when the underlying connection is lost, the unsubscribe and punsubscribe events are emitted automatically, so we can listen for these events and take action (see the PubSub chapter). For example, the subscribe example shows a re-subscribe implementation that runs on receiving an unsubscribe event. @clue also explained this in #120.

So if your Redis instance silently dies, it's obvious that the previously used connection is broken, but we can start creating a new one once Redis is back.
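As a rough illustration of that re-subscribe idea, the sketch below uses a hypothetical stand-in class, `TinyEmitter`, rather than the real client, so it runs without a Redis server. Only the `unsubscribe` event name and its `(channel, remaining count)` arguments are taken from the snippet quoted earlier in this thread; everything else is an assumption made for the demonstration.

```php
<?php

// Hypothetical stand-in, NOT the library's client: it mirrors only the
// on()/emit() shape seen in the close handler quoted earlier in this thread.
final class TinyEmitter
{
    /** @var array<string, callable[]> event name => listeners */
    private array $listeners = [];

    /** @var string[] log of subscribe() calls, kept for the demonstration */
    public array $subscribeCalls = [];

    public function on(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    public function emit(string $event, array $args = []): void
    {
        foreach ($this->listeners[$event] ?? [] as $listener) {
            $listener(...$args);
        }
    }

    public function subscribe(string $channel): void
    {
        $this->subscribeCalls[] = $channel;
    }
}

$redis = new TinyEmitter();
$redis->subscribe('news');

// Re-subscribe whenever a channel is dropped, for example because the
// underlying connection was closed.
$redis->on('unsubscribe', function (string $channel, int $remaining) use ($redis) {
    $redis->subscribe($channel);
});

// Simulate what the close handler emits for each subscribed channel.
$redis->emit('unsubscribe', ['news', 0]);

var_dump($redis->subscribeCalls); // 'news' appears twice: initial + re-subscribe
```

Real code would also need to tell a deliberate unsubscribe apart from a lost connection, and would probably wait a moment before re-subscribing rather than retrying immediately.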

> PING doesn't help here as the new connection will PONG back as if nothing happened.

This is actually interesting behavior: when you send your PING, you're not reusing the old connection (because it is broken), you're actually creating a new one. The documentation for the LazyClient reads:

"[...] Internally, it lazily creates the underlying database connection only on demand once the first request is invoked on this instance"

This means you're automatically creating a new connection when sending out the PING and this is why you're receiving a PONG.
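The effect can be reproduced with a toy model. `LazyClientSketch` and `FakeConnection` below are hypothetical stand-ins, not the library's classes; they model only the "connect on demand" behaviour from the quoted documentation, and they assume the client has already noticed that the old connection is gone.

```php
<?php

// Hypothetical stand-ins, NOT the library's classes.
final class FakeConnection
{
    public bool $alive = true;

    /** @var string[] channels subscribed on this particular connection */
    public array $channels = [];
}

final class LazyClientSketch
{
    private ?FakeConnection $connection = null;

    public int $connectionsCreated = 0;

    private function connection(): FakeConnection
    {
        // Connect only on demand: when there is no connection yet,
        // or when the previous one is known to be gone.
        if ($this->connection === null || !$this->connection->alive) {
            $this->connection = new FakeConnection();
            $this->connectionsCreated++;
        }

        return $this->connection;
    }

    public function subscribe(string $channel): void
    {
        $this->connection()->channels[] = $channel;
    }

    public function ping(): string
    {
        $this->connection(); // may silently create a fresh connection

        return 'PONG';
    }

    /** @return string[] */
    public function subscribedChannels(): array
    {
        return $this->connection === null ? [] : $this->connection->channels;
    }

    public function simulateOutage(): void
    {
        if ($this->connection !== null) {
            $this->connection->alive = false;
        }
    }
}

$client = new LazyClientSketch();
$client->subscribe('news');
$client->simulateOutage();

echo $client->ping(), PHP_EOL;           // PONG, answered by a new connection
var_dump($client->connectionsCreated);   // int(2)
var_dump($client->subscribedChannels()); // empty: the new connection has no subscriptions
```

In this model the PING succeeds even though the subscription is gone, which matches the behaviour described in the report.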

I think this should answer your question. I'm not quite sure if this is really related to the timeout error topic above, so I'll mark our conversation as "off topic" for now. If you're still encountering issues, we can also open a separate ticket for your case and have a closer look.