ionos-cloud/cluster-api-provider-ionoscloud

`IonosCloudMachine` object deletion gets stuck "Patching LAN"

SimonKienzler opened this issue · 1 comments

What steps did you take and what happened:
I deleted an IonosCloudMachine (ICM) object. The controller picked up the deletion but got stuck: the ICM remained, and so did the corresponding IONOS VM. The corresponding Node in the cluster the ICM was created for remained Ready. The controller logs showed that the ICM controller continuously issued LAN patching requests (roughly once a minute). Inspecting the request queue using `ionosctl requests list` yields:

...
<request-id> 2024-06-25T09:45:50Z   PATCH    DONE     Request has been successfully executed    <lan-id> (lan)
<request-id> 2024-06-25T09:44:52Z   PATCH    DONE     Request has been successfully executed    <lan-id> (lan)
<request-id> 2024-06-25T09:43:53Z   PATCH    DONE     Request has been successfully executed    <lan-id> (lan)
<request-id> 2024-06-25T09:42:57Z   PATCH    DONE     Request has been successfully executed    <lan-id> (lan)
...

As the requests all succeed, I assume this is not an issue with the provisioning queue.

Manually deleting the corresponding machine finally allowed the controller to proceed with the deletion and complete it.

What did you expect to happen:
A single LAN patch request, followed by the regular ICM deletion flow.

Anything else you would like to add:
During debugging with @gfariasalves-ionos, we attempted to get the currentRequestID from the ICM status, but it seems that even though this ID is set here, the status subresource is never actually patched with the updated information. Is that an issue that should be tracked separately?
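
To make the status observation concrete, here is a minimal, self-contained sketch of the pattern we suspect. It is not the provider's code: `Machine`, `MachineStatus`, `FakeAPIServer`, and `reconcileDelete` are hypothetical stand-ins, and the in-memory map only imitates an API server. It shows that a request ID set on the local copy of an object is lost unless the status is explicitly persisted.

```go
package main

// MachineStatus is a hypothetical, simplified stand-in for the ICM status.
type MachineStatus struct {
	CurrentRequestID string
}

// Machine is a hypothetical stand-in for an IonosCloudMachine object.
type Machine struct {
	Name   string
	Status MachineStatus
}

// FakeAPIServer is a hypothetical in-memory stand-in for the Kubernetes API server.
type FakeAPIServer struct {
	stored map[string]Machine
}

// NewFakeAPIServer creates the fake store and seeds it with the given objects.
func NewFakeAPIServer(machines ...Machine) *FakeAPIServer {
	s := &FakeAPIServer{stored: map[string]Machine{}}
	for _, m := range machines {
		s.stored[m.Name] = m
	}
	return s
}

// Get returns a copy of the stored object, as a client read would.
func (s *FakeAPIServer) Get(name string) Machine {
	return s.stored[name]
}

// PatchStatus persists only the status of the given object.
func (s *FakeAPIServer) PatchStatus(m Machine) {
	current := s.stored[m.Name]
	current.Status = m.Status
	s.stored[m.Name] = current
}

// reconcileDelete models a single reconcile pass. It records the request ID
// on the local copy and persists it only when patchStatus is true.
func reconcileDelete(s *FakeAPIServer, name, requestID string, patchStatus bool) {
	m := s.Get(name)
	m.Status.CurrentRequestID = requestID
	if patchStatus {
		s.PatchStatus(m)
	}
}
```

Calling `reconcileDelete` with `patchStatus` set to `false` leaves the stored object's `CurrentRequestID` empty, so a later pass has no record of the request that is already in flight. Whether this is what actually caused the repeated LAN patches is an assumption, not something we confirmed.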

Environment:

  • Cluster-api-provider-ionoscloud version: v0.2.0
  • Kubernetes version: (use kubectl version): v1.29.2 (local kind)
  • OS (e.g. from /etc/os-release): Ubuntu 22.04

Could not reproduce, so it might have been external circumstances causing this. Can be reopened when the issue rears its head again.