Server disconnects clients after ~1.7 minutes when deployed on AWS behind a load balancer and Cloudflare
andreaskyritsis opened this issue · 3 comments
Hello @tpeczek,
I see a discrepancy in behavior between local and production deployment.
When deployed locally everything works as expected and client connections are kept open indefinitely (as long as the client does not close the EventSource).
When deployed to production on AWS, behind a Kubernetes load balancer and Cloudflare, the connections are closed after about 1.7 minutes.
The behavior is the same regardless of the keepalive mode (Always, Never or BehindAncm). I have also tried connecting over HTTP/3, and the behavior is still the same, except that instead of ERR_HTTP2_PROTOCOL_ERROR 200 I get net::ERR_QUIC_PROTOCOL_ERROR 200.
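Roughly, the relevant configuration looks like this (a sketch rather than the exact production code; the endpoint path and interval values are illustrative):

```csharp
using Lib.AspNetCore.ServerSentEvents;

var builder = WebApplication.CreateBuilder(args);

// Keep-alives are configured through the library's options;
// I have tried Always, Never and BehindAncm here with the same result.
builder.Services.AddServerSentEvents(options =>
{
    options.KeepaliveMode = ServerSentEventsKeepaliveMode.Always;
    options.KeepaliveInterval = 30; // seconds
});

var app = builder.Build();

// Clients connect their EventSource to this endpoint.
app.MapServerSentEvents("/sse");

app.Run();
```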
Any idea why this is happening?
Hi @andreaskyritsis,
Sadly, if there is a single non-transparent network component that doesn't handle long-lived HTTP responses out of the box, such problems can occur.
In the past I was involved in projects using SSE behind Cloudflare, and a special buffering configuration was always required. I don't know how it looks right now, as I haven't worked on such a project lately.
If I were to give advice on how to approach this issue, I would suggest removing as many variables as possible from the equation. If you have indeed exposed your service directly on the load balancer and there is no ingress in play, then I would start by testing directly against that (without Cloudflare).
If there are additional network components around the Kubernetes cluster in AWS (firewalls, gateways, etc.), I would try to bypass those as well. Ideally you will reach a situation where you are adding a single component to the stack at a time, which will allow you to identify the problematic one.
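For example, a plain curl pointed directly at the load balancer (the host and path below are placeholders) should quickly show whether the stream survives past the ~1.7 minute mark without Cloudflare in front:

```sh
# -N disables curl's buffering so events are printed as they arrive
curl -N -H "Accept: text/event-stream" "http://<load-balancer-host>/<sse-endpoint>"
```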
Hi @tpeczek, I managed to overcome the issue by implementing keep-alive at the EventSource level: by sending an empty message (or comment) every minute, the connections are now kept open for hours. On top of that, I can see that the RAM usage has become stable (though a bit high for the amount of clients/traffic). Before, I saw constantly increasing RAM usage, which resulted in periodic pod restarts. Since the change, no restart has occurred.
Hi @andreaskyritsis,
I'm happy to hear that you've solved your issue.
At the same time, I'm puzzled. ServerSentEventsKeepaliveMode.Always should do exactly what you are describing (by sending comments). I've done some tests and it seems to be working as designed, so my best guess is that some intermediary is stripping the comments while passing the rest of the stream through. I base that guess on the fact that I'm not exposing an API to send comments, so whatever you are sending manually is probably an event (if you could share a snippet of code, I could verify that). If that is the case, it would give priority to #46, so that keep-alives can be configured to be events.
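For reference, on the wire the difference is only whether the line starts with a colon; per the SSE format, a comment line is ignored by EventSource (no event is raised), while a data line followed by a blank line is delivered to the client as an event:

```
: keep-alive

data: keep-alive

```

The first form is what the Always keep-alive emits; the second is a real event, which is presumably what your manual heartbeat sends and what the intermediaries let through.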
Regarding the memory growth before you resolved the issue, there might be two causes: either there is some kind of memory leak when a connection terminates abnormally, or the connections weren't in fact terminated from the service's perspective because some intermediary was still holding them open (so their number would constantly grow). I will try to test for the first case.
Regarding the overall high memory usage, if you have any insights/metrics/data that could help me investigate, I can take a look and check whether it's something in the lib.