http-kit/http-kit

Deadlock when using websockets inside a handler


Hi! I stumbled over a nasty deadlock that I could probably have avoided if I had a deeper understanding of how http-kit manages websocket messages.

I use http-kit as a backend to connect users to some devices that can only communicate via websockets: a user makes an HTTP request to the backend, the backend talks to a device through an already-opened ws channel, and finally the backend sends the user a response depending on the response from the device.

What happened is that, when many HTTP requests reached the backend simultaneously, they saturated the thread pool. Each request handler was blocked waiting for a reply from a device, while the tasks handling the ws messages were waiting for a thread to become free, so those tasks never ran.

I learned that the thread pool for HTTP requests is also used for managing websocket channels, so it is not a good idea to have an HTTP response depend on communication over a websocket channel managed by the same server. I think this fact should be mentioned in the docs.

I now use two separate instances of http-kit to manage http requests and websocket channels, and I'm fine.
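The idea behind this workaround can be sketched in a runtime-agnostic way. This is a Python illustration under assumptions, not http-kit code: the two executors stand in for the two server instances, and `handle_request`, `device_reply`, and `simulate_separate_pools` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def device_reply():
    # Stands in for the task that processes an incoming ws message.
    return "pong"


def handle_request(ws_pool, barrier):
    # Wait until every HTTP worker is busy, i.e. the HTTP pool is saturated.
    barrier.wait()
    # The ws task runs on its own pool, so it always finds a free thread
    # even though all HTTP workers are blocked.
    return ws_pool.submit(device_reply).result(timeout=5)


def simulate_separate_pools(pool_size=2):
    barrier = threading.Barrier(pool_size)
    with ThreadPoolExecutor(max_workers=pool_size) as http_pool, \
            ThreadPoolExecutor(max_workers=pool_size) as ws_pool:
        handlers = [
            http_pool.submit(handle_request, ws_pool, barrier)
            for _ in range(pool_size)
        ]
        return [h.result() for h in handlers]


if __name__ == "__main__":
    print(simulate_separate_pools())  # ['pong', 'pong']
```

Because the blocked handlers and the ws tasks no longer compete for the same threads, a saturated HTTP pool cannot starve the websocket side.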

Hi there, may I request a reproducible example for further investigation? Thanks!

I'm afraid the example will be quite involved; however, here it is:

https://github.com/stepugnetti/deadlock-issue

Thank you! Have my hands quite full atm, but will try to make this a priority when I do my next work batch on http-kit. Much appreciated! :-)

Don't worry, I've already found an acceptable workaround for my case. I just think this fact should be documented somewhere. A GitHub issue is already something useful, in my opinion. Thanks!

Update: unfortunately still haven't had an opportunity to investigate this, and likely won't for at least the next few weeks. Would welcome input if anyone has the interest+opportunity to take a look. Thanks!

@stepugnetti We encountered the same issue. What is the workaround you found?

@stepugnetti @ptaoussanis

Any idea if this was introduced in the latest version? We didn't encounter this running 2.1.19, but today, when we switched to 2.2.0, we did.

We upgraded to 2.2.0 to get past #165, but now we have this issue...

@chenfisher Thanks for the report Chen.

Any chance I may be able to impose on you to run git bisect against the commits for v2.2.0? If this is a regression, would be handy to know where the problem was unintentionally introduced.