doxout/recluster

old workers never die.

qzaidi opened this issue · 3 comments

Hi,

When I signal the master using SIGUSR2, new workers are started but the old ones never die.

            var timeout = setTimeout(worker.kill.bind(worker), opt.timeout * 1000);
            worker.on('disconnect', clearTimeout.bind(this, timeout));
            // possible leftover worker that has no channel estabilished will throw
            try { worker.disconnect(); } catch (e) { }
            cluster.removeListener('listening', stopOld);

What I think is happening is that the worker.disconnect() call is generating a disconnect event that clears the timeout before it has a chance to execute. Am I reading this correct?

I am running node 0.10.0, so it could be that this event wasn't generated in your version of node?

From what I've read in node's api documentation, the disconnect event may fire without the worker exiting if the worker is busy. But otherwise, worker.disconnect() only stops sending requests to the worker. The event disconnect is not emitted until all requests are served, right before the worker exits.

worker.disconnect()

When calling this function the worker will no longer accept new connections, but they will be handled by any other listening worker. Existing connection will be allowed to exit as usual. When no more connections exist, the IPC channel to the worker will close allowing it to die graceful. When the IPC channel is closed the disconnect event will emit, this is then followed by the exit event, there is emitted when the worker finally die.

I tried using the exit event instead. However, since I can't reproduce the issue in the test suite (the cluster module reports that the workers are always successfully killed), can you verify if this fix works for you?

This works - the workers are now cleaned up.

Great, closing the issue. The fixed has been published on npm (0.1.5)