flatcar-archive/nomad-on-flatcar

Drain Nomad node before update

Thanks @iaguis for your presentation at HashiTalks2022!

You mentioned that there is no way to drain a node before an update, similar to what the Flatcar Linux Update Operator does on Kubernetes. However, Nomad comes with a CLI command for that: for example, nomad node drain -enable -self -force -yes force-drains the current node immediately.

This command could probably be run by a systemd service hooked into reboot.target or shutdown.target. Once the node is back up, you have to run nomad node drain -disable -self to clear the drain flag from the node.
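A minimal sketch of the two commands above wrapped as shell functions. The function names drain_self and undrain_self and the NOMAD override variable are hypothetical; the override only exists so the commands can be dry-run with NOMAD=echo.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the Nomad CLI commands from this thread.
# NOMAD can be overridden (e.g. NOMAD=echo for a dry run); defaults to "nomad".
NOMAD="${NOMAD:-nomad}"

# Drain the local node: mark it ineligible for scheduling and stop its
# allocations immediately (-force), without an interactive prompt (-yes).
drain_self() {
  "$NOMAD" node drain -enable -self -force -yes
}

# Clear the drain flag once the node is back up, so the scheduler can
# place allocations on it again.
undrain_self() {
  "$NOMAD" node drain -disable -self
}
```

Source the file, call drain_self before shutting down and undrain_self after boot. If ACLs are enabled, the CLI will also need a token that is allowed to modify node state.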

Hey, thanks for attending my talk and for reaching out!

This shows that I'm a Nomad noob, thanks for sharing! I like that it's a simple solution, and it should prevent service unavailability most of the time. However, Flatcar nodes may reboot at arbitrary times when they update, so you can end up with too many nodes rebooting at once and workloads that no longer "fit" on the remaining nodes. That is where a centralized system helps.

I think combining this solution with the Locksmith approach and a reasonable reboot policy should do the trick: nodes are drained, and there is a limit on how many nodes reboot at once. The annoying thing about Locksmith is that it assumes an etcd cluster, and I believe most Nomad users run Consul instead, so something that integrates more deeply with Nomad might make sense.

Thanks again!

There is also the systemd shutdown hook, but I think its directory is hardcoded under /usr, which causes problems on Flatcar because /usr is read-only there. It would be great to show how to run the action on system shutdown or reboot. Turning all this into a section of the Flatcar docs would be good, too.
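One way around the /usr shutdown-hook directory is a regular unit under /etc/systemd/system, which is writable on Flatcar. The shutdown-hook executables also run only after all services have been stopped, so the Nomad agent would already be gone by then. The sketch below writes a oneshot unit with RemainAfterExit=yes whose ExecStop drains the node; because systemd stops units in the reverse of their start order, ordering it After=nomad.service should make the drain run while the agent and network are still up. It is a sketch under assumptions: the agent unit name nomad.service, the CLI path /opt/bin/nomad, the unit name nomad-drain.service and the function write_drain_unit are all hypothetical.

```shell
#!/bin/sh
# Sketch: write a drain-on-shutdown unit into a writable systemd directory.
# Assumed (hypothetical) names: nomad.service runs the agent and the CLI
# lives at /opt/bin/nomad, since /usr is read-only on Flatcar.
write_drain_unit() {
  unit_dir="${1:-/etc/systemd/system}"
  nomad_bin="${2:-/opt/bin/nomad}"
  cat > "$unit_dir/nomad-drain.service" <<EOF
[Unit]
Description=Drain the local Nomad node before shutdown or reboot
# Started after nomad.service, so systemd stops this unit *before*
# nomad.service (and before the network goes away) during shutdown.
# Wants= rather than Requires=, so restarting nomad.service does not
# also restart (and therefore drain) this unit.
Wants=nomad.service network-online.target
After=nomad.service network-online.target

[Service]
Type=oneshot
# Keep the unit "active" after ExecStart so ExecStop runs at shutdown.
RemainAfterExit=yes
# On boot, clear a drain flag left over from the previous shutdown. The
# leading "-" ignores a failure (e.g. agent not ready yet), which keeps
# the unit active so the drain below still runs at shutdown.
ExecStart=-$nomad_bin node drain -disable -self
# On shutdown or reboot, force-drain this node.
ExecStop=$nomad_bin node drain -enable -self -force -yes

[Install]
WantedBy=multi-user.target
EOF
}
```

As root, run write_drain_unit, then systemctl daemon-reload and systemctl enable --now nomad-drain.service; on Flatcar the same unit file could instead be provisioned through Ignition. Because a failed undrain at boot is ignored, you may want a retry around that step, and both commands need whatever NOMAD_ADDR or NOMAD_TOKEN settings your cluster requires. This only drains the node; it does not limit how many nodes reboot at once.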