shotover/shotover-proxy

shotover shuts down uncleanly if shut down immediately after startup.


rukai commented

I've observed that shotover will shut down without an exit code (causing TokioBinProcess to fail the test) when we start up shotover and shut it down immediately after.
I demonstrated that this is not caused by a lack of messages but is instead a race condition, by adding a sleep to the test and observing that the issue still occurred when we start up shotover -> sleep 10 secs -> shut down shotover.
This occurred while testing with the Kafka source and KafkaSinkSingle, but it's probably a universal issue.

These lines detect a sigint or sigterm and begin shutting down shotover:

    self.runtime.spawn(async move {
        tokio::select! {
            _ = interrupt.recv() => {
                info!("received SIGINT");
            },
            _ = terminate.recv() => {
                info!("received SIGTERM");
            },
        };
        trigger_shutdown_tx.send(true).unwrap();
    });

The first step to solving this is writing an integration test that reproduces the issue.

However, I'm not sure this is entirely solvable.
I believe the issue is that the signal is sent before we register the signal handlers.
We could bring the registration earlier.
We could also change our signal handler to not use tokio so we can bring handler registration even earlier.
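
To illustrate the non-tokio option, here is a rough sketch of a handler that could be installed as the very first thing in main, before any runtime exists. This is not shotover's code: `install_shutdown_handlers`, `shutdown_requested` and `SHUTDOWN_REQUESTED` are made-up names, the C `signal` function is declared by hand on the assumption that we are on a Unix target where std already links libc, and the signal numbers are the Linux/macOS values. Real code would more likely use `sigaction` or a crate such as signal-hook. The async shutdown task would then poll or be notified from this flag instead of owning the registration.

```rust
use std::ffi::c_int;
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};

// Assumption: Linux/macOS signal numbers.
const SIGINT: c_int = 2;
const SIGTERM: c_int = 15;
// SIG_ERR is `(void (*)(int)) -1` in C, i.e. all bits set.
const SIG_ERR: usize = usize::MAX;

// Hypothetical flag that the rest of the program checks to begin shutdown.
static SHUTDOWN_REQUESTED: AtomicBool = AtomicBool::new(false);

unsafe extern "C" {
    // The C library's `signal`, declared by hand to avoid extra dependencies.
    fn signal(signum: c_int, handler: extern "C" fn(c_int)) -> usize;
}

// The handler only performs an atomic store, which is async-signal-safe.
extern "C" fn on_signal(_signum: c_int) {
    SHUTDOWN_REQUESTED.store(true, Ordering::SeqCst);
}

/// Registers handlers for SIGINT and SIGTERM.
/// Intended to be the first call in main, before the tokio runtime is built.
fn install_shutdown_handlers() -> io::Result<()> {
    for signum in [SIGINT, SIGTERM] {
        // SAFETY: `on_signal` has the signature `signal` expects and is async-signal-safe.
        let previous = unsafe { signal(signum, on_signal) };
        if previous == SIG_ERR {
            return Err(io::Error::last_os_error());
        }
    }
    Ok(())
}

/// Returns true once SIGINT or SIGTERM has been received.
fn shutdown_requested() -> bool {
    SHUTDOWN_REQUESTED.load(Ordering::SeqCst)
}
```

This only shrinks the window: a signal that arrives before `install_shutdown_handlers` runs still gets the default behaviour and kills the process.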

However! I think there's always some fundamental race condition going on: the signal could be sent before main even starts executing. Maybe there is some way to solve this but I don't know what it is.

Maybe the best solution is to change tokio-bin-process's error message to mention: "Maybe the process was killed before it set up its signal handler?"
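
As a rough idea of what that hint could look like, here is a std-only sketch. It is not tokio-bin-process's actual code or API: `describe_exit` and `run_self_terminating_child` are made-up names, and it assumes a Unix target with `sh` available. It relies on the fact that on Unix a process killed by a signal reports no exit code, which is the "shutdown without an exit code" case described above.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, ExitStatus};

/// Hypothetical helper: turn an exit status into a message for a test failure.
fn describe_exit(status: ExitStatus) -> String {
    match (status.code(), status.signal()) {
        (Some(code), _) => format!("process exited with code {code}"),
        // Killed by a signal: there is no exit code, so add the hint.
        (None, Some(signal)) => format!(
            "process was terminated by signal {signal} without an exit code. \
             Maybe the process was killed before it set up its signal handler?"
        ),
        (None, None) => "process ended without an exit code or signal".to_string(),
    }
}

/// Demo child: a shell that sends SIGTERM to itself while still using the
/// default handler, standing in for a process signalled before it registered one.
fn run_self_terminating_child() -> std::io::Result<ExitStatus> {
    Command::new("sh").args(["-c", "kill -TERM $$"]).status()
}
```

Calling `describe_exit(run_self_terminating_child()?)` gives the hinted message, while a child that exits normally gets the plain exit code message.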