faye/faye-websocket-node

Connections lost in API.CLOSING state causing EMFILE error

Closed this issue · 12 comments

Faye-websocket-node seems to be leaking socket descriptors when closing hybi websockets.

The culprit seems to be this close() function in the hybi parser:
https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/hybi_parser.js#L255-L260

Unlike the close() in the draft-76 parser:
https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/draft76_parser.js#L91-L96

it doesn't call the callback immediately. This can result in the callback never being called at all, and a leaked descriptor as a result.
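
Roughly, the shape of the difference looks like this (a paraphrase of the two functions, not the exact source; frame() is just a stand-in for whatever writes the close frame):

    // Paraphrase of the shape of the two close() implementations (not the exact source).

    // draft76_parser.js: the callback fires unconditionally, so the teardown behind it
    // always runs.
    var Draft76Parser = {
      close: function(code, reason, callback, context) {
        if (this._closed) return;
        this._closed = true;
        if (callback) callback.call(context);
      }
    };

    // hybi_parser.js: the callback is deferred until the peer acknowledges the close
    // frame, so if that ack never arrives the callback -- and the descriptor cleanup
    // behind it -- never runs.
    var HybiParser = {
      close: function(code, reason, callback, context) {
        if (this._closed) return;
        this._closed = true;
        if (callback) this._closingCallback = [callback, context];
        this.frame(reason || '', 'close', code); // stand-in for writing the close frame
      }
    };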

The side effect of that is the strange "can't identify protocol" message from lsof, described here:
https://idea.popcount.org/2012-12-09-lsof-cant-identify-protocol/

More discussion:
sockjs/sockjs-node#99 (comment)

I implemented the monkey patch suggested in that sockjs thread, and the leaks appear to have stopped. I'm not sure whether I changed the semantics of how things work (I certainly did with stream.destroy(), but it doesn't appear to affect functionality at all).

I just required this file before requiring Faye (I know this is not strictly a Faye thread, but the same could be done if you're using faye-websocket directly, before you instantiate it).

    var WebSocket = require('faye/node_modules/faye-websocket');
    var Event = require('faye/node_modules/faye-websocket/lib/faye/websocket/api/event');

    var API = {
      CONNECTING: 0,
      OPEN:       1,
      CLOSING:    2,
      CLOSED:     3
    };

    WebSocket.prototype.close = function close(code, reason) {
      if (this.readyState === API.CLOSING ||
          this.readyState === API.CLOSED) return;

      this.readyState = API.CLOSING;

      var close = function() {
        this.readyState = API.CLOSED;
        if (this._pingLoop) clearInterval(this._pingLoop);
        if (this._stream) this._stream.destroy(); // CHANGED from .end() to .destroy()
        var event = new Event('close', {code: code || 1000, reason: reason || ''});
        event.initEvent('close', false, false);
        this.dispatchEvent(event);
      };

      // CHANGED -- removed ack check
      if (this._parser.close) this._parser.close(code, reason);
      close.call(this);
    };

    console.log('monkey patched WebSocket.prototype.close');
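
Then, in the app entry point, the patch just has to be loaded first (the file name here is just from my project layout):

    // Load the patch before faye so the patched prototype is already in place
    // (the path is specific to my project layout).
    require('./patch-ws-close');
    var faye = require('faye');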

I've read all the supporting threads on this and I get the idea about the closing callback but I'm still a little confused. If the server sends a close frame and the client never acknowledges it, and keeps the TCP socket open, I can see why this would happen. But wouldn't the client closing the TCP connection trigger these event handlers:

https://github.com/faye/faye-websocket-node/blob/0.4.3/lib/faye/websocket.js#L77-L79
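
For reference, my rough reading of what those lines wire up (a paraphrase, not the exact source):

    // Rough paraphrase of the linked handlers in websocket.js 0.4.3 (not the exact source):
    // a TCP-level end or error on the stream funnels back into the API-level close().
    var self = this;
    this._stream.addListener('end',   function() { self.close(); });
    this._stream.addListener('error', function() { self.close(); });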

If the client closes the TCP connection, do I need to do anything else on the server side to clean up file descriptors?

@pselden4 I think destroy() might not make sense here, since in some of the cases where the "inner close()" should be called, more data still needs to be sent.

Specifically, if the client sends a closing frame, the server needs to reply before the TCP connection is closed. If you use destroy(), the server's closing frame may never be sent.

If you remove the ack-waiting code, but keep end() instead of destroy(), what happens? What about with destroySoon()?
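
For reference, this is how I think about the three teardown options on the underlying net.Socket (a sketch against the node 0.8-era API, not faye-websocket code):

    // Sketch of the three teardown options on a node net.Socket (not faye-websocket code):
    function teardown(stream, closeFrameBuffer) {
      stream.end(closeFrameBuffer); // write the closing frame, then send FIN once it has
                                    // flushed; the descriptor lingers until the peer also
                                    // closes its side
      // stream.destroySoon();      // flush whatever is already queued, then destroy the
                                    // handle without waiting for the peer
      // stream.destroy();          // drop the descriptor immediately; queued data (e.g.
                                    // our closing frame) may never reach the peer
    }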

I think there might be a problem in the following scenario (2 and 3 seem mutually exclusive, but it's the only way I could see this happening):

  1. Server sends close frame, setting the client to CLOSING and waiting for ack.
  2. Client never receives FIN frame for whatever reason (network, firewall, etc...), and thus never sends ack back.
  3. Client later closes socket, but these lines https://github.com/faye/faye-websocket-node/blob/0.4.3/lib/faye/websocket/api.js#L58-L59 make it return immediately.

Result: the inner close function is never called.
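
In other words (a sketch of the guard as I read api.js at 0.4.3, not the exact source):

    // Sketch of the guard as I read it (not the exact source):
    WebSocket.prototype.close = function(code, reason, ack) {
      if (this.readyState === API.CLOSING ||  // a second close(), triggered by the TCP
          this.readyState === API.CLOSED)     // teardown, returns here...
        return;
      // ...so the inner close (clearInterval, stream.end, 'close' event) never runs
      // and the descriptor leaks.
    };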

I agree that .destroy is not correct here -- it relies on the client handling the abrupt closure properly -- but in my scenario it is the lesser of the two evils (the server runs out of file descriptors and throws EMFILE errors). I'll try out .destroySoon to see if that handles things a little more gracefully.

This is a plausible explanation. What do you think of this commit? b86f505

Basically I've stopped readyState=CLOSING from halting the closing process when no ack is required, as happens in all the error cases -- TCP termination, protocol errors, etc. readyState=CLOSED still halts the method, to prevent multiple close events.

I also renamed the 'inner close' function and refactored a little.

Let me know if this fixes the EMFILE problem.
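
Roughly, the shape of the change (a simplified sketch of the described behaviour, not the literal diff; finalizeClose stands in for the renamed inner close):

    // Simplified sketch of the described behaviour (not the literal diff):
    WebSocket.prototype.close = function(code, reason, ack) {
      if (this.readyState === API.CLOSED) return;          // CLOSED still halts the method
      if (ack && this.readyState === API.CLOSING) return;  // CLOSING only halts when we are
                                                           // genuinely waiting for an ack
      this.readyState = API.CLOSING;
      var finalize = finalizeClose.bind(this, code, reason); // stand-in for the renamed inner close

      if (ack) {
        // happy path: send our close frame and finalize once the peer acknowledges it
        this._parser.close(code, reason, finalize);
      } else {
        // error cases (TCP termination, protocol errors, etc.): finalize immediately
        if (this._parser.close) this._parser.close(code, reason);
        finalize();
      }
    };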

Whoops, use this commit instead -- I missed a rename: 87f1aba

In my opinion, given the current state of things in node, destroy() should be called (or maybe destroySoon(); I'm not sure what node's internal code flow is on that one and need to check). Otherwise a remote user can just exhaust the server's file descriptors:

An attacker triggers the server into initiating a close of a websocket (for example by not authing to the app). The server then initiates the close, which calls the underlying shutdown() syscall. The attacker then simply never ACKs the FIN packet, and since the socket is not properly cleaned up by node, we end up with this:
sockjs/sockjs-node#99 (comment) (those stats are after setting ack=false)

In my case, on FreeBSD, the result is a constantly growing number of sockets in FIN_WAIT_2 state.
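
As a stopgap on the application side, something like a close timeout at least bounds the damage (a rough sketch, not faye-websocket or SockJS code; the 5s value is arbitrary):

    // Rough sketch of an application-side guard (not library code): if the close
    // handshake hasn't finished after a few seconds, drop the descriptor ourselves.
    function closeWithTimeout(ws, socket, code, reason) {
      ws.close(code, reason);                 // normal protocol-level close
      var timer = setTimeout(function() {
        socket.destroy();                     // peer never completed the handshake
      }, 5000);
      socket.on('close', function() {         // handshake (or an RST) did complete
        clearTimeout(timer);
      });
    }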

@yurynix destroy() is not suitable here since on the happy path we need to send some closing data to the other peer while closing. destroySoon() might work, if it waits for buffered data to be transferred.

@majek @pselden4 Any updates from you guys on this?

I added this to prod today and will monitor (just your commit, no destroySoon).

This has definitely fixed the memory leak for me. I'm not sure about the EMFILE issue, as my load balancer closes idle sockets after 5 minutes, but I don't see any in FIN_WAIT_2 state right now.

@jcoglan I can't really comment on the fix - in SockJS I started using "ack=false", SockJS doesn't really need to pass the error reason on the ws layer.

I'm marking this as fixed. Let's re-assess how sockets are doing once I've got a new set of packages out. I think various open bugs at the moment may be related to this.