Rolling restart optimisation
simplesteph opened this issue · 9 comments
Right now the rolling restart script goes from broker 1, to 2, to 3, etc...
It should isolate the active controller as the last broker to restart
In the worst case, broker 1 is the active controller, then 2 becomes the active controller, then 3, and so on.
In that case we reboot the active controller every time, which triggers a new election at every single node reboot.
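The ordering described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code; `restart_order` and its parameters are hypothetical names, assuming the broker ids and the active controller's id are already known:

```python
def restart_order(broker_ids, controller_id):
    # Keep a stable sorted order, but move the active controller to the end
    # so it is only restarted once, after every other broker.
    ordered = sorted(b for b in broker_ids if b != controller_id)
    if controller_id in broker_ids:
        ordered.append(controller_id)
    return ordered
```

With brokers `[1, 2, 3]` and broker 1 as controller, this yields `[2, 3, 1]`, so the controller moves at most twice instead of once per reboot.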
Thoughts? I might try to put something together someday, or you can jump on it :)
This sounds like a good idea. Go for it!
One aspect to take into consideration is that the order in which the brokers are restarted should be as stable as possible, because if something goes wrong during the rolling restart, we can use the --skip
parameter to continue from where we left off. This change creates the potential for a non-stable ordering of brokers when the controller changes.
I suggest making this feature optional.
As a side note: it would be nice to implement this change together with an option that automatically skips brokers restarted within the last X minutes or hours. In this way we could automatically recover the execution of a rolling restart, and not have to rely on the order of nodes. Thoughts? We could use the JVM uptime provided via JMX.
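The uptime-based skip could look roughly like this. A hedged sketch, assuming the per-broker JVM uptimes have already been fetched over JMX (the `Uptime` attribute of the `java.lang:type=Runtime` MBean reports milliseconds); the function name and parameters are illustrative only:

```python
def brokers_to_restart(uptimes_ms, skip_if_younger_than_s):
    # uptimes_ms maps broker id -> JVM uptime in milliseconds, as reported
    # by the Uptime attribute of the java.lang:type=Runtime MBean over JMX.
    # Brokers restarted within the last skip_if_younger_than_s seconds
    # are considered already done and skipped.
    threshold_ms = skip_if_younger_than_s * 1000
    return sorted(b for b, up in uptimes_ms.items() if up >= threshold_ms)
```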
@fede1024 overall the desire is not to reboot what has already been rebooted. Instead of a simplistic skip, we could keep state of what has been rebooted, and then have a reset option to clear that state. This would work hand in hand with that strategy and completely remove the need for --skip.
Where would this state be stored? I think the uptime is the most natural way to get that state for free. The tool could have an option like "reboot all the machines that haven't been rebooted since midday" or "reboot all the instances whose process is older than the configuration file", and just use the uptime to determine what still needs to be rebooted.
A local file (state.temp) or something. I like the "reboot since time X" idea, but it requires figuring out when the script was stopped, whether something had to be fixed for a few hours, etc. A local state file that can be wiped is simple-ish.
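The local state file idea is simple enough to sketch. This is a hypothetical illustration only (the file name `state.temp` comes from the comment above; the function names are made up), not the tool's API:

```python
import json
import os

STATE_FILE = "state.temp"  # hypothetical local state file

def load_restarted():
    # Return the set of broker ids already restarted in this run.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return set(json.load(f))
    return set()

def mark_restarted(broker_id):
    done = load_restarted()
    done.add(broker_id)
    with open(STATE_FILE, "w") as f:
        json.dump(sorted(done), f)

def reset_state():
    # The "reset option": wipe the state so the next run restarts everything.
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
```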
Side note: The nicest would have a Kafka shutdown API with an expected session ID or something (does nothing if Kafka has been rebooted using that session ID), but that's probably more long term and would come with a KIP
Local state only works when a single developer is in charge of a single cluster. When you have many developers and clusters it is very easy to lose track of them. The "reboot since time X" option doesn't depend on the time when the script failed, but on the time when the new Kafka configuration was pushed. If new configuration is pushed at 1PM, you can run a "restart every process that was started before 1PM".
This run would be idempotent, and could be executed as many times as we want, even by different developers and from different machines.
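The idempotency argument above boils down to comparing each process's start time against the config push time. A minimal sketch, with a hypothetical `needs_restart` helper (uptime in milliseconds, times as Unix timestamps):

```python
import time

def needs_restart(uptime_ms, config_pushed_at, now=None):
    # A broker needs a restart if its process started before the new
    # configuration was pushed. Running this check again after the restart
    # returns False, which is what makes the overall run idempotent.
    now = time.time() if now is None else now
    started_at = now - uptime_ms / 1000.0
    return started_at < config_pushed_at
```

A broker up for two hours when the config was pushed one hour ago needs a restart; one up for ten minutes does not.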
@fede1024 sorry, I never considered a multi-tenant situation, with multiple people orchestrating a single cluster reboot. I'm okay with that approach.
Hi @simplesteph ,
It's a very good and needed idea,
we did it with a Bash script that gets the active controller from ZK, restarts all brokers except the controller, and restarts the controller last.
Do you have an estimate of when you'll implement it? We would love to use it in your script.
Thanks
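For reference, the "get the controller from ZK" step reads the `/controller` znode, where Kafka stores a small JSON blob containing the active controller's `brokerid`. A hedged Python sketch (the `zk` argument is any ZooKeeper client exposing `get(path) -> (bytes, stat)`, e.g. a kazoo `KazooClient`; that choice is an assumption, not part of this tool):

```python
import json

def get_controller_id(zk):
    # zk: any ZooKeeper client with get(path) -> (bytes, stat),
    # e.g. kazoo.client.KazooClient (an assumed choice, not this tool's API).
    # Kafka stores the active controller as JSON in the /controller znode.
    data, _ = zk.get("/controller")
    return json.loads(data.decode("utf-8"))["brokerid"]
```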
I have no plans to submit a PR soon, but feel free to open one, especially if you have done something similar in Bash.