kubernetes-sigs/cluster-api-provider-aws

cloud-init boothook logic broken with cloud-init 24.2

Opened this issue · 1 comments

/kind bug

What steps did you take and what happened:

Ubuntu Noble recently upgraded cloud-init to version 24.2 (prior on 24.1.3). For some reason, this has broken systemctl restart cloud-init, as called by CAPA here. It doesn't actually restart cloud-init anymore, so /etc/secret-userdata.txt never gets picked up, so the Kubernetes components never come up.

What did you expect to happen:

Kubernetes nodes should come up with no problems.

Anything else you would like to add:

I've been digging into this for the past couple of days, here are some miscellaneous notes:

  • cloud-init clean --reboot, run post-hoc, works as expected. So there's nothing wrong with /etc/secret-userdata.txt or the cloud-config.txt itself.
  • I can't see an obvious cloud-init change that would result in this changed behavior. It might be a direct regression upstream, but I'm not familiar enough with cloud-init internals to dig further. Plus, we're not really set up to git bisect cloud-init :(
  • This is a separate issue from #4745 - we are already directly patching features.py, which has been sufficient to keep things working up through 24.1.

Environment:

  • Cluster-api-provider-aws version: 2.4.2
  • Kubernetes version: (use kubectl version): 1.27, 1.28, 1.29 (unrelated to kube version)
  • OS (e.g. from /etc/os-release): Ubuntu 24.04 Noble

/triage accepted
/priority important-soon