Timeout exception connecting to Azure Cache for Redis from App Service
llopezalonso opened this issue · 5 comments
We are experiencing timeouts with Redis when 10k requests are sent by an App Service to Redis.
Azure resources:
- Azure Cache for Redis is Premium P2
- AppService P2V3 (2 instances)
AppService Code:
- NET 8
- Using package StackExchange.Redis v2.7.33
- ThreadPool.SetMinThreads(256, 256);
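For context, a minimal sketch of the setup described above, assuming StackExchange.Redis with a single shared multiplexer (the class name is hypothetical; the host is the anonymized endpoint from the error log):

```csharp
using System;
using StackExchange.Redis;

// Raise the worker/IOCP thread floor before any pool work starts,
// as described in the AppService code above.
ThreadPool.SetMinThreads(256, 256);

// One shared ConnectionMultiplexer for the whole app (hypothetical name).
public static class RedisConnection
{
    private static readonly Lazy<ConnectionMultiplexer> _lazy = new(() =>
        ConnectionMultiplexer.Connect(
            "XXXX.redis.cache.windows.net:6380,ssl=true,abortConnect=false"));

    public static ConnectionMultiplexer Instance => _lazy.Value;
}
```

The `Lazy<T>` wrapper ensures the multiplexer is created once and reused across requests, which is the usage pattern StackExchange.Redis expects.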
Error shown in AppInsights:
Timeout awaiting response (outbound=67648KiB, inbound=328KiB, 5250ms elapsed, timeout is 5000ms), command=HGET, next: HGET MICLAVEBBDD, inst: 0, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 65536, last-in: 14161, cur-in: 8719, sync-ops: 0, async-ops: 10008, serverEndpoint: XXXX.redis.cache.windows.net:6380, conn-sec: 385.31, aoc: 0, mc: 1/1/0, mgr: 9 of 10 available, clientName: wnRURURU0000D9(SE.Redis-v2.8.0.27420), IOCP: (Busy=0,Free=1000,Min=256,Max=1000), WORKER: (Busy=21,Free=32746,Min=256,Max=32767), POOL: (Threads=80,QueuedItems=835,CompletedItems=20361,Timers=17), v: 2.8.0.27420 (Please take a look at this article for some common client-side issues that can cause timeouts:
https://stackexchange.github.io/StackExchange.Redis/Timeouts)
Any thoughts that could help us?
Thanks a ton!
We have the same challenges on the same stack: Azure App Service with .NET 6 (and now 8) and Redis. After almost 7 years, we have learned the following:
- CPU pressure is real. When our average CPU goes above 80%, we start noticing Redis issues.
- Set up private endpoints to bypass SNAT throttling. Connect to your Redis, SQL, Cosmos DB, etc. with a private endpoint. Without one, your app communicates over the shared public network, so you are limited. Check for SNAT port exhaustion in the "Diagnose and solve problems" tab.
- We applied the "best practices" from here. We even use that "simple" retry handling, because the pre-v8 Polly library added a lot of CPU and memory overhead when we benchmarked it with BenchmarkDotNet.
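The thread doesn't include the actual retry code, but a hand-rolled retry of the kind described might look like this (a sketch, assuming StackExchange.Redis; the attempt count and backoff are hypothetical tuning knobs):

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

// Simple retry with linear backoff, used instead of Polly to avoid
// its pre-v8 CPU/memory overhead (parameters are hypothetical).
static async Task<RedisValue> GetWithRetryAsync(
    IDatabase db, RedisKey key, int maxAttempts = 3)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await db.StringGetAsync(key);
        }
        catch (Exception ex) when (
            ex is RedisTimeoutException or RedisConnectionException)
        {
            if (attempt >= maxAttempts) throw;
            await Task.Delay(TimeSpan.FromMilliseconds(100 * attempt));
        }
    }
}
```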
Some low-hanging fruit:
- Big keys - instead of requesting tens of thousands of keys in one MGET, we started doing "paged" MGETs.
- Make sure your Redis instance isn't too busy evicting keys.
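The "paged" MGET idea above can be sketched like this (a hypothetical helper, assuming StackExchange.Redis; the page size is a tuning knob, not a recommendation):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using StackExchange.Redis;

// Instead of one MGET over tens of thousands of keys, fetch in
// fixed-size pages so no single command monopolizes the socket.
static async Task<List<RedisValue>> PagedGetAsync(
    IDatabase db, IReadOnlyList<RedisKey> keys, int pageSize = 500)
{
    var results = new List<RedisValue>(keys.Count);
    for (int i = 0; i < keys.Count; i += pageSize)
    {
        var page = keys.Skip(i).Take(pageSize).ToArray();
        results.AddRange(await db.StringGetAsync(page)); // one MGET per page
    }
    return results;
}
```

Smaller pages keep each reply payload modest, which matters because a single huge reply can stall every other command queued behind it on the same connection.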
Thanks a ton for your response and your insights.
Regarding your comments:
- We do not see CPU pressure in the App Service (with 10k requests, our App Service sits at about 40% CPU)
- We are currently using private endpoints to access Redis (public access is not enabled)
- We do not handle retries; our code to connect to Redis is quite simple (we have followed this post)
Regarding the other comments:
- We are sending 10k requests through Azure Load Testing; the App Service code retrieves a single result through HashGetAsync, so I can't apply pagination :(
- EvictedKeys metric is always 0
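For reference, the hot path under the load test is a single hash-field read of this shape (a sketch; the key comes from the error log above, while the field name and connection string are placeholders):

```csharp
using StackExchange.Redis;

// One HGET per incoming request, as reported in the timeout message.
var mux = await ConnectionMultiplexer.ConnectAsync(
    "XXXX.redis.cache.windows.net:6380,ssl=true,abortConnect=false");
IDatabase db = mux.GetDatabase();
RedisValue value = await db.HashGetAsync("MICLAVEBBDD", "someField");
```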
Again, thanks a ton for your comments and ideas; we are stuck on this problem :(
What kind of sends are involved here? On the outbound we see 67648KiB, which is quite a bit queued on the outbound side of the socket - is this storing very large keys?
Thanks for your response @NickCraver
No, we are not storing large keys (approx. 13k values in the database, and the largest is 15KB)
@llopezalonso The outbound was 67648KiB, so even assuming 15KiB upper bound that's around 4,700 outbound keys being stored at once as a spike in traffic - does that sound like the intended behavior or fail a gut check?