Error handling
mrjoes opened this issue · 1 comments
mrjoes commented
I was doing some internal tests and here's what I found.
- There's
sockjs_session_sup
supervisor which is configured to have up to 10 failures in 10 seconds. - sockjs_session calls callback functions in context of its own process
- If there's error in the one of the callback methods, sockjs_session process will die and supervisor will restart it.
- If socks_session dies too often (basically 10 times in 10 seconds), the supervisor will be killed with all its children. Effectively, if you can find a way to send a message which triggers the exception, you can kill SockJS server.
I think client processes should be isolated. Even if one client misbehaves, it should not kill whole process tree.
Not sure how to fix it properly, but few ideas:
- Increasing limits won't really help - it is temporary measure
- Maybe switch to
temporary
restart strategy? The session will be closed anyway -terminate()
is getting called and the session is removed from ETS. Plus, the state is lost, so it can't recover. Or am I missing something?
SimonWoolf commented
In case it's relevant to anyone else: I've made @mrjoes's suggested change (transient -> temporary) in our fork of sockjs-erlang at https://github.com/ably-forks/sockjs-erlang , after running into the same issue. (Sockjs sessions crashing for legitimate reasons (a service dependency was down) was causing the sockjs application to exit, which doesn't get restarted (even when start_permanent
is set), which ensured it continued to fail even once the reason for the crashes fixed itself).