faye/permessage-deflate-node

Uncaught "zlib binding closed" errors

Closed this issue · 18 comments

We've started doing some testing with the permessage-deflate extension as we're very excited about bringing it into our project (we have some very large initial-state JSON messages being transmitted over Faye).

So far, we've seen a few unhandled zlib binding closed exceptions. I've only seen this on internet-facing beta servers, not while testing locally. I'm guessing it's down to WebSocket connections being closed 'badly' - ie, the TCP socket abruptly closing when a client's internet connection is terminated (something we have been testing)

zlib.js ? in InflateRaw.Zlib._transform at line 399
_stream_transform.js ? in InflateRaw.Transform._read at line 179
_stream_transform.js ? in InflateRaw.Transform._write at line 167
_stream_writable.js ? in doWrite at line 225
_stream_writable.js ? in clearBuffer at line 304
_stream_writable.js ? in onwrite at line 262
_stream_writable.js ? in WritableState.onwrite at line 96
_stream_transform.js ? in afterTransform at line 99
_stream_transform.js ? in TransformState.afterTransform at line 74
zlib.js ? in Zlib.callback at line 455

Do you know how to patch this? I am listening to the error event whenever I'm using a zlib stream so I'm not sure what's happening to trigger errors outside of that. I would only expect them to error if given invalid input.

From the logs (I'll probably need to add more logging though), I saw the following sequence of events:

2014-12-18T13:11:06.238Z - faye-engine: Client XXX: connection:close
2014-12-18T13:11:07.241Z - faye-engine: Client XXX: connection:open
2014-12-18T13:11:11.037Z - error: Uncaught exception: zlib binding closed message
    at InflateRaw.Zlib._transform (zlib.js:399:15)
    at InflateRaw.Transform._read (_stream_transform.js:179:10)
    at InflateRaw.Transform._write (_stream_transform.js:167:12)
    at doWrite (_stream_writable.js:225:10)
    at clearBuffer (_stream_writable.js:304:5)
    at onwrite (_stream_writable.js:262:7)
    at WritableState.onwrite (_stream_writable.js:96:5)
    at afterTransform (_stream_transform.js:99:5)
    at TransformState.afterTransform (_stream_transform.js:74:12)
    at Zlib.callback (zlib.js:455:5)
2014-12-18T13:11:11.038Z - error: "zlib binding closed" forcing shutdown
2014-12-18T13:11:11.047Z - error: Uncaught exception: zlib binding closed message (...)
2014-12-18T13:11:11.048Z - error: Uncaught exception: zlib binding closed message (...)
2014-12-18T13:11:15.943Z - error: Uncaught exception: zlib binding closed message (...)
2014-12-18T13:11:33.051Z - info: Server restarted

So, we had one error, which invoked a server shutdown process, quickly followed by the same error three more times during the shutdown. At the moment, I'm not sure whether these errors were for other sockets or the same socket again and again. Our shutdown process attempts to cleanly close resources, including the HTTP Server before doing a process.exit(1) -- I'm pretty sure that additional binding closed errors must have something to do with the HTTP server closing the connection, which might be a clue to what the problem is.

I'll investigate further.

I'm not so sure about that; the permessage-deflate streams are not connected to the TCP socket at all. They are handed buffers of data with no idea where those buffers came from.

I'm thinking of looking at the possibility of a race condition between the close being called on the inflate or deflate stream and new data arriving? Do you think that's a reasonable avenue of investigation?

I've really struggled to recreate this locally. The server its running on is pretty overloaded (this is intentional, having our beta servers running on underpowered hardware brings out some bugs that we would only otherwise find when our production servers are under strain).

I get the same zlib error if I close the an inflateRaw before writing to it:

var zlib = require('zlib');
var inflate = zlib.createInflateRaw();
inflate.close();
inflate.write(deflatedData);

However, I have no idea why the error handler is not trapping the error.

Would it be worthwhile checking this._lockIn and this._lockOut in Session.prototype.close and setting a _closeAfterLockIn and _closeAfterLockOut flag?

Something like this:

Session.prototype.close = function() {
  if (this._lockIn) {
    /* Defer the close until all processIncomingMessages calls complete */
    this._closeAfterLockIn = true;
  } else if (this._inflate && this._inflate.close) {
    this._inflate.close();
  }
  this._inflate = null;

  /* Similar code for deflate goes here.... */
  if (this._lockOut) {
  ...
  }
};

and then in the processIncomingMessage's _return:

  var return_ = function(error, message) {
    return_ = function() {};

    inflate.removeListener('data', onData);
    inflate.removeListener('error', onError);
    if (!self._inflate && inflate.close) inflate.close();

    self._lockIn = false;
    var next = self._queueIn.shift();
    if (next) {
      self.processIncomingMessage.apply(self, next);
    } else if(this._closeAfterLockIn) {
      delete this._closeAfterLockIn;
      inflate.close();
    }
    callback(error, message);
  };

I guess a possible workaround may be to disable peer context takeover, but I'm reluctant to do this.

Just hit this too:

events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: zlib binding closed
    at InflateRaw.Zlib._transform (zlib.js:399:15)
    at InflateRaw.Transform._read (_stream_transform.js:179:10)
    at InflateRaw.Transform._write (_stream_transform.js:167:12)
    at doWrite (_stream_writable.js:225:10)
    at clearBuffer (_stream_writable.js:304:5)
    at onwrite (_stream_writable.js:262:7)
    at WritableState.onwrite (_stream_writable.js:96:5)
    at afterTransform (_stream_transform.js:99:5)
    at TransformState.afterTransform (_stream_transform.js:74:12)
    at Zlib.callback (zlib.js:455:5)

Will try to do a little debugging/minimizing.

(Note: we are using Node 0.10.33; 0.10.34 upgraded zlib, though we're avoiding using it until nodejs/node-v0.x-archive#8948 is fixed.)

Confirmed issue also exists in 0.10.35 (with upgraded zlib).

@glasser, out of interest, is your websocket connecting directly to your node process, or going via a reverse proxy (nginx, haproxy, elb, etc)?

I believe it's going through a proxy that is itself in-process (this is consistently caught by my test suite, though it's a complex test suite).

The issue is that one can call Session.close between inflate.flush and receiving the callback from inflate.flush. There needs to be some delay on the inflate.close in Session.close in this case.

#3 appears to fix this for me, though it has no tests (and it made me notice some other API issues).

I've commited faye/websocket-extensions-node@1b77290 and faye/websocket-driver-node@cc5e5c2 to hopefully address this issue.

Thanks, this looks right.

@suprememoocow I'm closing this since I'm fairly sure the aforelinked commits fix this problem, but please reopen if it's not fixed for you.

bs85 commented
Error: zlib binding closed
    at InflateRaw.Zlib._transform (zlib.js:399:15)
    at InflateRaw.Transform._read (_stream_transform.js:179:10)
    at InflateRaw.Transform._write (_stream_transform.js:167:12)
    at doWrite (_stream_writable.js:225:10)
    at clearBuffer (_stream_writable.js:304:5)
    at onwrite (_stream_writable.js:262:7)
    at WritableState.onwrite (_stream_writable.js:96:5)
    at afterTransform (_stream_transform.js:99:5)
    at TransformState.afterTransform (_stream_transform.js:74:12)
    at Zlib.callback (zlib.js:455:5)

Node 10.33, permessage-deflate 0.1.3

It happened at random times, mostly when connected from a bad network (3G). I can now reproduce it reliably by having the client repeatedly attempt to connect, and the server simply close the connection immediately. After 20-30 tries on average I get the error above.

I'll try to look into it later, it seems like Zlib.callback is called after the connection / whatever isn't valid anymore, not sure though...

@bs85 which versions of websocket-extensions and websocket-driver are you running? Do you have a script that reproduces the issue?

bs85 commented

@jcoglan sorry I've had a long week, didn't realize the fix was in those libs and not in permessage-deflate itself.

I've updated sockjs which bumped the 2 libs 0.1.0 -> 0.1.1 and 0.5.1 -> 0.5.4 and it seems the problem is gone now!

Thanks a lot, have a nice week-end!

@bs85 no worries, good to know :)