Question: what's the correct way to gracefully shut down a cluster and its children?
d6u opened this issue · 3 comments
I have a shutdown handler in child.js
let makeCloseHandler = (sig) => {
  return () => {
    console.log(`Received signal ${sig}`);
    server.close(() => {
      console.log('Server closed');
    });
  };
};
process.on('SIGINT', makeCloseHandler('SIGINT'));
process.on('SIGTERM', makeCloseHandler('SIGTERM'));
And cluster.js
const recluster = require('recluster');
const path = require('path');
const cluster = recluster(path.join(__dirname, 'index.js'), {
  timeout: 120
});
const workerEvent = function(ev) {
  cluster.on(ev, function(worker) {
    console.log('Worker ' + worker.id + ' [' + worker.process.pid + '] ' + ' ' + ev + '.');
  });
};
['online', 'listening', 'disconnect', 'exit'].forEach(function(ev) {
  workerEvent(ev);
});
cluster.run();
console.log('Master ' + process.pid + ' started.');
let makeCloseHandler = (sig) => {
  return () => {
    console.log(`Cluster received signal ${sig}`);
    cluster.terminate(() => {
      console.log('Cluster closed');
    });
  };
};
process.on('SIGINT', makeCloseHandler('SIGINT'));
process.on('SIGTERM', makeCloseHandler('SIGTERM'));
But I see these logs when I stop `node cluster.js`:
^CReceived signal SIGINT
Received signal SIGINT
Cluster received signal SIGINT
Received signal SIGINT
Server closed
Server closed
Server closed
Received signal SIGINT
Server closed
Cluster closed
It seems like the children are receiving SIGINT before the master does, so I'm confused about how graceful shutdown is handled here. What's the best way to ensure we don't drop a connection halfway through a request?
To avoid dropping connections halfway,
(1) If you have another load balancer above the cluster, the best way would be to use that load balancer to switch traffic to the replacement process before shutting down the cluster.
(2) If you just want to replace the recluster workers gracefully (without replacing the master process), you should probably use `reload` instead of `terminate`.
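For option (2), a minimal sketch of wiring a signal to a reload might look like the following. The helper name `reloadOnSignal`, the choice of `SIGUSR2`, and the injectable `proc` parameter are assumptions made for illustration; the only recluster call it relies on is `reload()`.

```javascript
// Hypothetical helper (not part of recluster): reload workers when a signal arrives.
// `cluster` is assumed to be the object returned by recluster(...), which exposes reload().
// `proc` defaults to the real process; it is a parameter only so the helper can be exercised
// without sending real signals.
function reloadOnSignal(cluster, signal = 'SIGUSR2', proc = process) {
  const handler = () => {
    console.log(`Master received ${signal}, reloading workers`);
    cluster.reload();
  };
  proc.on(signal, handler);
  return handler;
}

// Possible usage in cluster.js, after cluster.run():
//   reloadOnSignal(cluster);
// and then, from a shell:
//   kill -s SIGUSR2 <master pid>
```

With this in place the master keeps running and only the workers are replaced, so nothing needs to be terminated just to roll out new worker code.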
edit: I forgot that `terminate` kills the workers immediately. Right now there is no `shutdown` method that would keep active workers running while they still have connections (at least until the timeout expires), so we might need to add that. Until that's added, I guess the best workaround for (1) would be to wait sufficiently long after switching, then use `terminate`.
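The "wait, then terminate" workaround could be sketched roughly as below. The helper name `terminateAfterDrain` and the `drainMs` value are assumptions for illustration; how long is "sufficiently long" depends on your longest request. The only recluster call used is `terminate(callback)`, as in cluster.js above.

```javascript
// Hypothetical helper (not part of recluster): give in-flight requests time to finish
// after the load balancer has switched traffic away, then terminate the cluster.
// `cluster` is assumed to be the object returned by recluster(...).
function terminateAfterDrain(cluster, drainMs, onClosed = () => {}) {
  console.log(`Waiting ${drainMs} ms for in-flight requests before terminating`);
  // Returns the timer so the caller can clearTimeout() it if shutdown is cancelled.
  return setTimeout(() => {
    cluster.terminate(() => {
      console.log('Cluster closed');
      onClosed();
    });
  }, drainMs);
}

// Possible usage in cluster.js, in place of calling cluster.terminate() directly:
//   process.on('SIGTERM', () => terminateAfterDrain(cluster, 30000));
```

This is only a fixed delay, not a real drain: a request that outlives `drainMs` would still be cut off when `terminate` runs.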
Thanks for explaining!
@spion From your last comment, I take it you are open to a PR to add a `shutdown` method?