gardener/machine-controller-manager

Cleanup after IT is not working properly


What happened

/area testing
/kind bug
/priority 2

When we run the IT for mcm-providers in the pipeline, the machine resources are not cleaned up if the MC process fails to run. The logs nevertheless report that the resources were cleaned up:

STEP: Running Cleanup
STEP: Restarting Machine Controller 
2023/02/20 13:50:56 deleting test-machine-deployment
2023/02/20 14:05:56 machines.machine.sapcloud.io "test-machine" not found
2023/02/20 14:05:56 deleting test-mc-v1 machineclass
2023/02/20 14:20:56 deleting test-mc-v2 machineclass
2023/02/20 14:20:56 machineclass deleted
2023/02/20 14:20:56 Deleting crds
2023/02/20 14:20:56 ../../../dev/mcm/kubernetes/crds is a directory.
2023/02/20 14:20:56 resource ../../../dev/mcm/kubernetes/crds/machine.sapcloud.io_machineclasses.yaml has been successfully removed from the cluster
2023/02/20 14:20:56 resource ../../../dev/mcm/kubernetes/crds/machine.sapcloud.io_machinedeployments.yaml has been successfully removed from the cluster
2023/02/20 14:20:56 resource ../../../dev/mcm/kubernetes/crds/machine.sapcloud.io_machines.yaml has been successfully removed from the cluster
2023/02/20 14:20:56 resource ../../../dev/mcm/kubernetes/crds/machine.sapcloud.io_machinesets.yaml has been successfully removed from the cluster
2023/02/20 14:20:56 resource ../../../dev/mcm/kubernetes/crds/machine.sapcloud.io_scales.yaml has been successfully removed from the cluster

The machine resources have a deletion timestamp set, but the cleanup does not wait for them to actually be deleted, so they remain stuck in Terminating:

k get mc                                                           
NAME                                  STATUS        AGE    NODE
test-machine-deployment-76d4d-787g2   Terminating   2d1h   izgw86khdplvsoa10rik6qz
test-machine-deployment-76d4d-jcc6k   Terminating   2d1h   izgw88ckciy38niobf9u6yz
test-machine-deployment-76d4d-mzbcz   Terminating   2d1h   izgw88ckciy38niobf9u6zz

The above issue was encountered in the mcm-provider-alicloud pipeline, where the MC process stopped between the tests. The process exited with code 2. The failure surfaces at this assertion:

gomega.Expect(mcsession.ExitCode()).Should(gomega.Equal(-1))

I am not sure why the process stopped between the tests. No logs were captured for the MC.

What you expected to happen:
The machine resources should be cleaned up.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others: