siderolabs/sidero

Docs: Decommission / reset a broken bare metal machine

lieberlois opened this issue

The documentation for decommissioning (https://www.sidero.dev/v0.6/guides/decommissioning/) leaves out an important case: a broken Talos node. For example, when the CNI is misconfigured, deleting the Machine on the management cluster has no effect, because the management cluster can't properly talk to the workload cluster. Instead, the kubectl delete machine command hangs forever.

I think there should be a way to reset the node. One idea is to delete the Server resource, so that the server can be reset on its next PXE boot.

Is this already implemented and just missing from the docs, or is it a task for the future? Apart from reinstalling the management cluster, I haven't found anything yet. Deleting all of the related resources, manually removing their finalizers, and then re-accepting the machine seems to work.
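The workaround above could be scripted roughly as follows. This is a minimal sketch, not an official Sidero procedure. It assumes the stuck objects are the Cluster API Machine, Sidero's MetalMachine and ServerBinding, and the Server itself (the exact set depends on your setup). It only generates standard kubectl delete and kubectl patch invocations. Each object is deleted with --wait=false so the command does not hang, and then its finalizers are cleared with a merge patch. The re-accept step assumes the Server exposes spec.accepted, which is the field Sidero's docs patch when accepting newly discovered servers. All object names are hypothetical.

```python
import json
import shlex
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StuckObject:
    """A resource to force-delete. Kinds and names here are illustrative assumptions."""
    kind: str                        # e.g. "machine", "metalmachine", "serverbinding", "server"
    name: str
    namespace: Optional[str] = None  # leave as None for cluster-scoped kinds


def _ns_args(obj: StuckObject) -> List[str]:
    return ["-n", obj.namespace] if obj.namespace else []


def force_delete_commands(obj: StuckObject) -> List[List[str]]:
    """Mark the object for deletion without blocking, then clear its finalizers so it
    can be removed even though the controller cannot reach the workload cluster."""
    clear_finalizers = json.dumps({"metadata": {"finalizers": None}})
    return [
        ["kubectl", "delete", obj.kind, obj.name, *_ns_args(obj), "--wait=false"],
        ["kubectl", "patch", obj.kind, obj.name, *_ns_args(obj),
         "--type=merge", "-p", clear_finalizers],
    ]


def accept_server_command(server_name: str) -> List[str]:
    """Re-accept a rediscovered Server by setting spec.accepted to true (assumed field)."""
    patch = json.dumps([{"op": "replace", "path": "/spec/accepted", "value": True}])
    return ["kubectl", "patch", "server", server_name, "--type=json", "-p", patch]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False.
    check=False: the finalizer patch may report NotFound if the object had no
    finalizers and was already removed by the delete, which is harmless here."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Hypothetical names; replace with the objects that are actually stuck.
    stuck = [
        StuckObject("machine", "worker-0", "default"),
        StuckObject("metalmachine", "worker-0", "default"),
    ]
    for obj in stuck:
        run(force_delete_commands(obj))
    # Once the node PXE boots and its Server is rediscovered, accept it again.
    run([accept_server_command("00000000-0000-0000-0000-000000000000")])
```

Running it with the default dry_run=True only prints the commands, which makes it easy to review them before touching the management cluster. Deleting before clearing finalizers (rather than the other way around) means the object already has a deletion timestamp, so its controller should not re-add the finalizer in between.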