StackExchange/StackExchange.Redis

Error restarting Redis server nodes and causing high numbers of clients, freeze: (ProtocolFailure on [0]:EXEC (TransactionProcessor), failure before write: WriteFailure)

Dunge opened this issue · 0 comments

I have a set of 1 Redis primary, 2 replicas and 3 sentinels. I have about 11 services connecting to the sentinels using the serviceName= connection string.

I did one action: I restart rolled out my redis instances (a few minutes in between each).

Most services continued to work correctly. 2 printed the errors printed below. But the biggest problem is that one of them seemed to has frozen a few hours later? I can't really explain why, but it triggered something accumulating over a long term? Restarting the redis instances is pretty much the only thing I can pinpoint that caused this service to freeze. Restarting the frozen service fixed it.

I updated StackExchange.Redis from 2.6.111 to 2.8.0 recently. I also updated Redis itself from 7.0.11 to 7.2.5. I don't know if it's related, but I didn't have this behavior before. But then I don't restart my redis services often either.

Now I have prometheuse metrics and the grafana redis dashboard installed. And one weird thing I noticed after this error is the number of "connected clients" started to rise dramatically. While it's usually hanging around 40, It reached over 6000! Network IO also seemed to follow the same pattern, althought the number is very low. I then restarted the services that did not seem to be affected, and that number dropped back to 40.

image
image

2024-07-16 20:07:43.127	System.AggregateException: One or more errors occurred. (SocketFailure (ReadSocketError/ConnectionReset, last-recv: 5) on 10.100.41.41:6379/Interactive, Idle/Faulted, last: ZRANGE, origin: ReadFromPipe, outstanding: 238, last-read: 2s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.8.0.27420) (ProtocolFailure on [0]:EXEC (TransactionProcessor), failure before write: WriteFailure) (ProtocolFailure on [0]:EXEC (TransactionProcessor), failure before write: WriteFailure) (ProtocolFailure on [0]:EXEC (TransactionProcessor), failure before write: WriteFailure)
2024-07-16 20:07:43.128	 ---> StackExchange.Redis.RedisConnectionException: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 5) on 10.100.41.41:6379/Interactive, Idle/Faulted, last: ZRANGE, origin: ReadFromPipe, outstanding: 238, last-read: 2s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.8.0.27420
2024-07-16 20:07:43.128	 ---> Pipelines.Sockets.Unofficial.ConnectionResetException: Connection reset by peer
2024-07-16 20:07:43.128	 ---> System.Net.Sockets.SocketException (104): Connection reset by peer
2024-07-16 20:07:43.128	   at Pipelines.Sockets.Unofficial.Internal.Throw.Socket(Int32 errorCode) in /_/src/Pipelines.Sockets.Unofficial/Internal/Throw.cs:line 59
2024-07-16 20:07:43.129	   at Pipelines.Sockets.Unofficial.SocketAwaitableEventArgs.GetResult() in /_/src/Pipelines.Sockets.Unofficial/SocketAwaitableEventArgs.cs:line 74
2024-07-16 20:07:43.129	   at Pipelines.Sockets.Unofficial.SocketConnection.DoReceiveAsync() in /_/src/Pipelines.Sockets.Unofficial/SocketConnection.Receive.cs:line 64
2024-07-16 20:07:43.129	   --- End of inner exception stack trace ---
2024-07-16 20:07:43.129	   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
2024-07-16 20:07:43.130	   at System.IO.Pipelines.Pipe.GetReadAsyncResult()
2024-07-16 20:07:43.130	   at StackExchange.Redis.PhysicalConnection.ReadFromPipe() in /_/src/StackExchange.Redis/PhysicalConnection.cs:line 1877
2024-07-16 20:07:43.130	   --- End of inner exception stack trace ---
2024-07-16 20:07:43.131	   at StackExchange.Redis.ServerEndPoint.<>c__DisplayClass43_0.<<OnConnectedAsync>g__IfConnectedAsync|0>d.MoveNext() in /_/src/StackExchange.Redis/ServerEndPoint.cs:line 130
2024-07-16 20:07:43.131	--- End of stack trace from previous location ---
2024-07-16 20:07:43.131	   at StackExchange.Redis.TaskExtensions.TimeoutAfter(Task task, Int32 timeoutMs) in /_/src/StackExchange.Redis/TaskExtensions.cs:line 50
2024-07-16 20:07:43.132	   at StackExchange.Redis.ConnectionMultiplexer.WaitAllIgnoreErrorsAsync(String name, Task[] tasks, Int32 timeoutMilliseconds, ILogger log, String caller, Int32 callerLineNumber) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 523
2024-07-16 20:07:43.132	   --- End of inner exception stack trace ---
2024-07-16 20:07:43.141	 ---> (Inner Exception #1) StackExchange.Redis.RedisConnectionException: ProtocolFailure on [0]:EXEC (TransactionProcessor), failure before write: WriteFailure
2024-07-16 20:07:43.141	--- End of stack trace from previous location ---
2024-07-16 20:07:43.141	
2024-07-16 20:07:43.142	 ---> (Inner Exception #2) StackExchange.Redis.RedisConnectionException: ProtocolFailure on [0]:EXEC (TransactionProcessor), failure before write: WriteFailure
2024-07-16 20:07:43.143	--- End of stack trace from previous location ---
2024-07-16 20:07:43.143	
2024-07-16 20:07:43.144	 ---> (Inner Exception #3) StackExchange.Redis.RedisConnectionException: ProtocolFailure on [0]:EXEC (TransactionProcessor), failure before write: WriteFailure
2024-07-16 20:07:43.144	--- End of stack trace from previous location ---

I omitted the stack traces inside my code calling redis operations.

Then I received the more "normal" timeout errors for a few minutes

2024-07-16 20:08:26.555	 ---> StackExchange.Redis.RedisConnectionException: The message timed out in the backlog attempting to send because no connection became available (35000ms) - Last Connection Exception: It was not possible to connect to the redis server(s). ConnectTimeout, command=EXEC, timeout: 35000, inst: 0, qu: 0, qs: 0, aw: False, bw: CheckingForTimeout, rs: NotStarted, ws: Idle, in: 0, last-in: 0, cur-in: 0, sync-ops: 1, async-ops: 26191435, serverEndpoint: 10.100.220.201:6379, conn-sec: n/a, aoc: 0, mc: 1/1/0, mgr: 10 of 10 available, clientName: datalogger-6cdb6db955-7swzm(SE.Redis-v2.8.0.27420), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=144,Free=32623,Min=4,Max=32767), POOL: (Threads=144,QueuedItems=225,CompletedItems=154316839,Timers=363), v: 2.8.0.27420 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
2024-07-16 20:08:26.555	 ---> StackExchange.Redis.RedisConnectionException: It was not possible to connect to the redis server(s). ConnectTimeout

And then it stopped printing errors, and everything seemed to work, until it froze a few hours later.