lunatic-solutions/submillisecond

Handling panics & timeouts

bkolobara opened this issue · 3 comments

Currently, the process handling the request (middleware & handler) is the one holding the TcpStream, so if it fails (panics), the browser never gets a response.

We should introduce a "supervisor" process. This process wouldn't be of the Supervisor type, since it wouldn't have generic behaviour.

  • I would spawn it as an AbstractProcess (see the sketch after this list).
  • It should set host::api::process::die_when_link_dies(1) in the init method and spawn a linked sub-process.
  • Both the sub-process (handler) and the "supervisor" should hold onto the TcpStream.
  • The sub-process should use the stream to parse the incoming request data.
  • Once the Response data is available, it should send it as a message to the supervisor.
  • The supervisor then should write the Response to the TcpStream.
  • If at any time the sub-process should fail (link breaks and handle_link_trapped gets called), the supervisor should return a 500 Internal Server Error.
  • We could also use the newly added send_after function, so that the supervisor can send a timeout message to itself. If the sub-process doesn't finish in, let's say, 60 seconds, the supervisor should write a 408 Request Timeout back to the browser and panic, so it kills the linked sub-process too.
  • We could also use this mechanism to put memory and compute limits on the sub-process by spawning it with a specific configuration (ProcessConfig).
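A minimal sketch of what that supervisor could look like, assuming the AbstractProcess API from lunatic around the time of this issue. The RequestSupervisor name and the ResponseReady message type are illustrative assumptions, not settled design, and the exact trait method signatures may differ between lunatic versions:

```rust
use std::io::Write;

use lunatic::{
    host,
    net::TcpStream,
    process::{AbstractProcess, ProcessMessage, ProcessRef},
    Tag,
};
use serde::{Deserialize, Serialize};

// Hypothetical message type; the name is not from lunatic itself.
#[derive(Serialize, Deserialize)]
struct ResponseReady(Vec<u8>); // the fully encoded HTTP response

struct RequestSupervisor {
    stream: TcpStream,
}

impl AbstractProcess for RequestSupervisor {
    type Arg = TcpStream;
    type State = Self;

    fn init(this: ProcessRef<Self>, stream: TcpStream) -> Self {
        // Turn a broken link into a handle_link_trapped callback
        // instead of taking the supervisor down with the handler.
        unsafe { host::api::process::die_when_link_dies(1) };

        // Spawn the linked handler sub-process here, handing it a clone
        // of the stream (for parsing the request) and a reference to
        // `this` (for sending ResponseReady back). A ProcessConfig with
        // memory/compute limits could be applied at this spawn point.
        // Elided for brevity.

        // Schedule a timeout message to ourselves via send_after
        // (see the timeout sketch further down the thread).

        RequestSupervisor { stream }
    }

    fn handle_link_trapped(state: &mut Self::State, _tag: Tag) {
        // The handler panicked before producing a response.
        let _ = state
            .stream
            .write_all(b"HTTP/1.1 500 Internal Server Error\r\ncontent-length: 0\r\n\r\n");
    }
}

// The handler sends the encoded response; only the supervisor writes.
impl ProcessMessage<ResponseReady> for RequestSupervisor {
    fn handle(state: &mut Self::State, ResponseReady(bytes): ResponseReady) {
        let _ = state.stream.write_all(&bytes);
    }
}
```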

Once the Response data is available, it should send it as a message to the supervisor.

I think sending back the response, even encoded as Vec<u8>, is quite costly in terms of performance, right? Unless we add something like Erlang's immutable binary data to just pass around pointers to large data. I think it's better if the handler writes the response to the stream itself.


I was thinking the same, but having two writers would not work in practice. For example, say the supervisor has a timeout of 60 seconds: it sends itself a message using send_after that is delayed by 60 seconds. The sub-process might have already started writing to the TcpStream when the supervisor also writes a 408 Request Timeout, interleaving the two on the same stream. The response needs to stay atomic and be written by exactly one process.
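To make the timeout path concrete, a sketch under the same assumptions as the supervisor sketch above (Timeout is a hypothetical marker message, scheduled with send_after in init):

```rust
use std::io::Write;

use serde::{Deserialize, Serialize};

// Hypothetical marker message, delivered ~60 s after init via send_after.
#[derive(Serialize, Deserialize)]
struct Timeout;

impl ProcessMessage<Timeout> for RequestSupervisor {
    fn handle(state: &mut Self::State, _: Timeout) {
        // The supervisor is the only writer, so this can never interleave
        // with a partially written response from the handler.
        let _ = state
            .stream
            .write_all(b"HTTP/1.1 408 Request Timeout\r\ncontent-length: 0\r\n\r\n");
        // Panicking takes the linked handler sub-process down with us.
        panic!("handler exceeded the 60 second deadline");
    }
}
```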

As you mentioned, in the future we will need a way to share bigger buffers with low overhead in lunatic. So it's a problem we will need to solve once this becomes a bottleneck, but I think we need to solve it in a more general way, not just as part of this web framework. I already have some ideas on how to do this that could be combined with vectored I/O to avoid serialisation, and it would be completely safe in scenarios like this one, where one process is "ending" and giving up control over its linear memory to another.

Philpax was doing something similar in the past, and it turns out that for a 4 MB response it only takes 4 ms.

Philpax — 05/23/2022
Switched over to serde_bytes and it's much better now, thank you!
4 milliseconds, 4743875 bytes
might be nice to have that in a FAQ somewhere

I would just go with a simple solution for now, before we start optimising for message size.

Yeah, having just a single writer is of course better. But I was thinking maybe the supervisor could receive a start_writing request and "allow" the handler to write. That said, if it's only 4 ms for a 4 MB response, we can just send the response as a message to the supervisor, and that should not be a problem for a long time. If/when we later optimise sharing large data in the VM, this problem will cease to exist.

We should, however, look into using serde_bytes for encoding the response data.
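For reference, a minimal sketch of what that could look like. serde_bytes swaps serde's default per-element encoding of Vec<u8> for a single contiguous byte-string, which is where the speedup in the numbers quoted above comes from (ResponseReady here is a named-field variant of the hypothetical message from the sketches above):

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct ResponseReady {
    // Without this attribute, serde serialises a Vec<u8> as a sequence
    // of individual u8 values; with it, the buffer is encoded as one
    // contiguous byte-string, which is much faster for large bodies.
    #[serde(with = "serde_bytes")]
    body: Vec<u8>,
}
```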