thsmi/sieve

WebApp websocket connection gets closed after 60 seconds on default nginx configuration

Smith4545 opened this issue · 2 comments

Prerequisites

  • Tried the most recent nightly build
  • Checked if your issue is already reported.
  • Answered all the questions in this template (Or provide a working crystal ball).

What happened?

This problem is explicitly related to the usage of nginx in conjunction with the WebApp!
When editing any Sieve script via the WebApp, after not doing anything for a while, the user can't save his changes anymore.
This obviously shouldn't happen. No matter how long the user doesn't do anything, he should always be able to save his stuff.

The reason for this behaviour is that nginx, by default, closes a connection to a proxied server after 60 seconds if the proxied server doesn't transmit any data in that timeframe.

The probable solutions to this problem are:

  • (the temporary way) include proxy_read_timeout 7200s; (or any timeout longer than the user needs to work on his scripts) in the location block shown in the WebApp's README.md.
  • send data to the browser every once in a while.
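
For illustration, the first option could look roughly like the fragment below. This is only a sketch: the location path and the upstream address are placeholders and not taken from the WebApp's README.md, so adapt them to the location block you already have. Only proxy_read_timeout is the actual change; the other directives are the usual ones for proxying a websocket through nginx.

```nginx
# Placeholder location and upstream, replace with the values from your setup.
location /websocket {
    proxy_pass http://127.0.0.1:8765;

    # Standard directives for proxying a websocket upgrade.
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    # Default is 60s. Raise it so an idle editor session is not cut off.
    proxy_read_timeout 7200s;
}
```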

What did you expect to happen?

No matter how long the user doesn't do anything, he should always be able to save his stuff.

Logs and Traces

2023-03-09 16:53:57 WARNING [handle_message] webserver.py : index out of range
2023-03-09 16:53:57 WARNING [handle_message] webserver.py : Traceback (most recent call last):
  File "/opt/thsmi/sieve/sieve-0.6.1-web/script/webserver.py", line 65, in handle_message
    handler.handle_request(context, request)
  File "/opt/thsmi/sieve/sieve-0.6.1-web/script/handler/websocket.py", line 48, in handle_request
    MessagePump().run(websocket, sievesocket)
  File "/opt/thsmi/sieve/sieve-0.6.1-web/script/messagepump.py", line 26, in run
    data = server.recv()
  File "/opt/thsmi/sieve/sieve-0.6.1-web/script/websocket.py", line 119, in recv
    opcode = data[0] & 0b00001111
IndexError: index out of range
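
The last frame of the traceback suggests that recv got an empty buffer, which is what a socket read returns once the peer (here presumably nginx) has closed the connection; indexing an empty bytes object then raises the IndexError. A minimal sketch of how such a case could be turned into an explicit error is shown below. The names ConnectionClosed and read_opcode are made up for this example and are not the WebApp's actual code.

```python
class ConnectionClosed(Exception):
    """Hypothetical error: the peer closed the socket before a frame arrived."""


def read_opcode(data: bytes) -> int:
    """Return the 4-bit opcode from the first byte of a websocket frame."""
    if not data:
        # An empty read means the connection was closed by the other side.
        raise ConnectionClosed("socket closed by peer")
    # Same mask as in the traceback: the low nibble holds the opcode.
    return data[0] & 0b00001111
```

With something like this, the caller could log a clean "connection closed" message instead of a warning with a stack trace.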

Which Version

  • WebApp on dfeeac10cb5cf65b08b31360229053bcdae50174
  • Server: Debian 11 with Dovecot Pigeonhole

thsmi commented

Ok this is an interesting one.

The webapp tunnels sieve messages via HTTP websockets.

Regular sieve (which is not tunneled but runs directly via TCP) requires a server to keep a connection open for at least 30 minutes.
Because there were issues with this in the past, the actual keep-alive interval is way lower. It is currently at 5 minutes, which looks like a good compromise, because keep-alive messages are rather heavyweight and sieve implementations tend to get upset if you fall below a certain threshold.

So to be safe with the nginx default setting, the keep-alive interval would need to be reduced to less than 30 seconds. That would be technically possible, but it scales horribly and would generate lots of unnecessary load on the sieve server.

As far as I understood the nginx documentation, a websocket ping does not extend the read timeout. So using a lightweight websocket-based keep-alive does not help.

So this basically breaks down into three tasks.

  1. Update the Readme and add a note that you need to increase the timeout on nginx.
  2. Currently the keep-alive interval is hardcoded to 5 minutes. But this should be a parameter controlled by the admin, so that you can reduce it to 30 seconds or less if you are sure your infrastructure is ok with it.
  3. The WebApp currently lacks reconnect logic. It should automatically reconnect whenever the connection is lost.
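
Tasks 2 and 3 could be sketched roughly as below. This is only an illustration under assumptions: the environment variable SIEVE_KEEPALIVE_SECONDS and both function names are hypothetical and do not exist in the WebApp, and the reconnect part is written in Python only to show the backoff idea; the real reconnect would live in the browser client.

```python
import os

DEFAULT_KEEPALIVE_SECONDS = 300  # the currently hardcoded 5 minutes


def keepalive_interval(env=os.environ) -> int:
    """Read the keep-alive interval from a hypothetical environment variable."""
    raw = env.get("SIEVE_KEEPALIVE_SECONDS")
    if raw is None:
        return DEFAULT_KEEPALIVE_SECONDS
    value = int(raw)
    if value <= 0:
        raise ValueError("keep-alive interval must be positive")
    return value


def reconnect_delays(attempts: int, base: float = 1.0, cap: float = 30.0):
    """Yield capped exponential backoff delays (seconds) for reconnect attempts."""
    for attempt in range(attempts):
        yield min(cap, base * 2 ** attempt)
```

An admin who trusts the infrastructure could then set the variable to, say, 25 to stay below the nginx default, while everyone else keeps the 5 minute default and raises the nginx timeout instead. The capped backoff keeps a client from hammering the server while it is unreachable.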

I had to read that three times, but what I got out of that for the moment is:

  • the webapp sends a keep-alive through the proxied websocket to the sieve server, which is hardcoded to happen every 5 minutes at the moment
  • proxy_read_timeout can therefore be reduced to something between 300 and 600 seconds (which is way better than my initial 7200s)

I'm testing with 600s and it seems to work for now.
Thank you for having a look into this!