StackExchange/StackExchange.Redis

Transaction task never completes after connection blip

jlao opened this issue · 4 comments

jlao commented

Hello, I left a comment on #2630, but it was already closed, so I'm creating a new issue. We recently saw this happen again. From my previous post, which still holds:

We saw another instance where a Redis transaction hangs forever after a connection blip. The version we are using is 2.7.17.27058.

In the logs, I see this sequence of events:

07:20:22.719 - <log line right before we do the transaction>
07:20:22.990 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 4) on 10.10.70.233:6001/Interactive, Flushed/Faulted, last: GET, origin: ReadFromPipe, outstanding: 2, last-read: 4s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:23.012 - ConnectionRestored with args: 10.10.70.233:6001
07:20:26.558 - ConfigurationChanged with endpoint: 10.10.70.233:6001
07:20:30.259 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 39) on 10.10.70.232:6001/Interactive, Idle/Faulted, last: HGET, origin: ReadFromPipe, outstanding: 1, last-read: 4s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.233:6003/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.232:6002/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.231:6003/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.233:6002/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.269 - ConnectionRestored with args: 10.10.70.233:6003
07:20:30.269 - ConnectionRestored with args: 10.10.70.232:6002
07:20:30.269 - ConnectionRestored with args: 10.10.70.231:6003
07:20:30.270 - ConnectionRestored with args: 10.10.70.232:6001
07:20:30.270 - ConnectionRestored with args: 10.10.70.233:6002
07:20:30.275 - StackExchange.Redis.RedisConnectionException: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 39) on 10.10.70.232:6001/Interactive, Idle/Faulted, last: HGET, origin: ReadFromPipe, outstanding: 1, last-read: 4s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
	 ---> Pipelines.Sockets.Unofficial.ConnectionResetException: An existing connection was forcibly closed by the remote host.
	 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
	   at Pipelines.Sockets.Unofficial.Internal.Throw.Socket(Int32 errorCode) in /_/src/Pipelines.Sockets.Unofficial/Internal/Throw.cs:line 59
	   at Pipelines.Sockets.Unofficial.SocketAwaitableEventArgs.GetResult() in /_/src/Pipelines.Sockets.Unofficial/SocketAwaitableEventArgs.cs:line 74
07:20:30.292 - ConfigurationChanged with endpoint: 10.10.70.232:6001

This is what the transaction code looks like:

ITransaction transaction = Database.CreateTransaction();
transaction.AddCondition(
	Condition.StringEqual(key1, 124));

Task<bool> task1 = transaction.KeyExpireAsync(
	key1,
	TimeSpan.FromMinutes(30));

Task<bool> task2 = transaction.KeyExpireAsync(
	key2,
	TimeSpan.FromMinutes(30));

Task<bool> task3 = transaction.KeyExpireAsync(
	key3,
	TimeSpan.FromMinutes(30));

Task<bool> task4 = transaction.KeyExpireAsync(
	key4,
	TimeSpan.FromMinutes(30));
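
// Note: the queued tasks above will not complete until ExecuteAsync
// is called; awaiting any of them before ExecuteAsync would deadlock.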

if (!await transaction.ExecuteAsync())
{
	throw new Exception();
}

bool exp1 = await task1;
bool exp2 = await task2;
bool exp3 = await task3;
bool exp4 = await task4;

_logger.Verbose("{0} | {1} | {2} | {3}", exp1, exp2, exp3, exp4);
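
As an aside, a bounded variant of this code, where a hung transaction surfaces as a TimeoutException instead of blocking the caller forever, would look roughly like the sketch below (assuming .NET 6+ for Task.WaitAsync; the 10-second timeout is arbitrary):

TimeSpan timeout = TimeSpan.FromSeconds(10);

// Bound ExecuteAsync so an orphaned transaction task throws
// TimeoutException rather than never completing.
if (!await transaction.ExecuteAsync().WaitAsync(timeout))
{
	throw new Exception();
}

// The queued tasks should already be complete once ExecuteAsync
// returns true, but bound them as well for safety.
bool exp1 = await task1.WaitAsync(timeout);
bool exp2 = await task2.WaitAsync(timeout);
bool exp3 = await task3.WaitAsync(timeout);
bool exp4 = await task4.WaitAsync(timeout);

Note that WaitAsync only abandons the wait; it does not cancel or clean up the underlying transaction.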

I took a process dump, and when I open it I see:

  • A task is stuck on transaction.ExecuteAsync().
    • This task's status is WaitingForActivation.
  • task1, task2, task3, and task4 are all Cancelled.
  • transaction appears to be null, suggesting it was garbage collected once it went out of scope.

Some notes:

  • We do NOT recreate the ConnectionMultiplexer; it is created once and reused (see the sketch below).
  • I can see in the process dump that we still have a reference to the ConnectionMultiplexer that we created originally.
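
For context, the multiplexer is held in the usual shared-instance pattern, along these lines (a sketch, not our exact initialization code; connectionString is a placeholder):

// Created once at startup and reused for the lifetime of the process.
private static readonly Lazy<ConnectionMultiplexer> Connection =
	new Lazy<ConnectionMultiplexer>(
		() => ConnectionMultiplexer.Connect(connectionString));

private static IDatabase Database => Connection.Value.GetDatabase();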

I was able to gather some new information from the latest instance of this issue. I took a process dump and noticed 36 async logical stack traces in PhysicalConnection.ReadFromPipe. I looked at each one, and in every case PhysicalConnection.TransactionActive was false. My best guess is that the transaction task was orphaned somehow.
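
For anyone inspecting a similar dump, the async logical stacks can be listed with SOS along these lines (dump.dmp is a placeholder, and exact output varies by SOS version):

dotnet-dump analyze dump.dmp
> dumpasync
> dumpheap -type StackExchange.Redis

dumpasync prints the logical async call chains, and dumpheap -type finds the surviving StackExchange.Redis objects on the heap.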

Any thoughts on this?

Similar issue encountered with calls to StringGetWithExpiryAsync: some of the calls just hang after the server experiences a short spike of timeouts.