sockjs/sockjs-erlang

Error handling

mrjoes opened this issue · 1 comments

I was doing some internal tests and here's what I found.

  1. There's a sockjs_session_sup supervisor, which is configured to tolerate up to 10 failures in 10 seconds.
  2. sockjs_session calls the callback functions in the context of its own process.
  3. If there's an error in one of the callback functions, the sockjs_session process will die and the supervisor will restart it.
  4. If sockjs_session dies too often (basically 10 times in 10 seconds), the supervisor will be killed along with all its children. Effectively, if you can find a way to send a message that triggers an exception, you can kill the SockJS server.
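
For readers who haven't looked at the supervisor, the setup described above corresponds roughly to the following sketch. This is not the actual sockjs-erlang source: the module name `session_sup_sketch` is hypothetical, and the child's start function, shutdown timeout, and the `simple_one_for_one` strategy are assumptions. Only the 10-in-10-seconds limit and the transient restart type come from this thread.

```erlang
%% Hypothetical sketch of a session supervisor with the limits
%% described in this issue. Not the real sockjs_session_sup.
-module(session_sup_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% intensity/period: once the number of restarts within `period`
    %% seconds exceeds `intensity`, the supervisor terminates all of
    %% its children and then itself.
    SupFlags = #{strategy  => simple_one_for_one,
                 intensity => 10,
                 period    => 10},
    %% transient: the child is restarted only when it exits abnormally,
    %% e.g. because a user callback raised an exception. Each such
    %% restart counts towards the intensity limit above.
    Child = #{id       => session,
              start    => {sockjs_session, start_link, []}, % assumed
              restart  => transient,
              shutdown => 5000,
              type     => worker,
              modules  => [sockjs_session]},
    {ok, {SupFlags, [Child]}}.
```

Because all sessions share the one supervisor, crashes from a single misbehaving client count against the same limit as everyone else's.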

I think client processes should be isolated. Even if one client misbehaves, it should not kill the whole process tree.

I'm not sure how to fix it properly, but here are a few ideas:

  1. Increasing the limits won't really help; it is only a temporary measure.
  2. Maybe switch to the temporary restart type? The session will be closed anyway: terminate() gets called and the session is removed from ETS. Plus, the state is lost, so it can't recover. Or am I missing something?
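
To illustrate the second idea, here is a small self-contained sketch, assuming a stand-in worker rather than the real sockjs_session. The module `temp_sup_demo` and its functions are hypothetical. It relies on standard OTP behaviour: a child with restart type `temporary` is never restarted, so its crashes do not count towards the supervisor's restart intensity.

```erlang
%% Hypothetical demo: a supervisor whose children are `temporary`
%% survives more crashes than its intensity limit would allow for
%% transient children.
-module(temp_sup_demo).
-behaviour(supervisor).
-export([run/0, start_worker/0, init/1]).

%% Stand-in for a session process: crashes on the first message,
%% as a session would if a user callback raised an exception.
start_worker() ->
    Pid = spawn_link(fun() ->
                         receive _ -> exit(callback_crashed) end
                     end),
    {ok, Pid}.

init([]) ->
    SupFlags = #{strategy  => simple_one_for_one,
                 intensity => 10,
                 period    => 10},
    Child = #{id       => worker,
              start    => {?MODULE, start_worker, []},
              restart  => temporary,   % never restarted
              shutdown => 5000,
              type     => worker,
              modules  => [?MODULE]},
    {ok, {SupFlags, [Child]}}.

%% Start 20 workers one after another and crash each of them.
%% Returns whether the supervisor is still alive afterwards.
run() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    lists:foreach(
      fun(_) ->
              {ok, Pid} = supervisor:start_child(Sup, []),
              Ref = erlang:monitor(process, Pid),
              Pid ! boom,
              receive {'DOWN', Ref, process, Pid, _} -> ok end
      end,
      lists:seq(1, 20)),
    is_process_alive(Sup).
```

Calling `temp_sup_demo:run().` from the shell should return `true`, with the supervisor logging a report for each crashed child. The trade-off is that a crashed session is simply gone, which matches the point above that its state is lost anyway.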

In case it's relevant to anyone else: after running into the same issue, I've made @mrjoes's suggested change (transient -> temporary) in our fork of sockjs-erlang at https://github.com/ably-forks/sockjs-erlang . In our case, sockjs sessions crashing for a legitimate reason (a service dependency was down) caused the sockjs application to exit. The application doesn't get restarted, even when start_permanent is set, so it continued to fail even after the cause of the crashes had fixed itself.