Handle nil pointer in instanceProvision failure to continue deletion
sindhusri16 opened this issue · 0 comments
What happened:
We upgraded CAPOCI to v0.11.2 and created some node pools on existing clusters. We suspect there were some provision failures; in any case, we then wanted to delete the whole cluster, which got stuck in the deleting phase because of these node pools. In the backend, the instances were still running when we issued the delete command: even though the conditions below report `InstanceProvisionFailed`, the OCI console showed those machines in the 'running' state. Some internal issue may have caused the provision failure, but while deleting the cluster we hit this nil-pointer error in the logs:
```
{"stream":"stderr","message":"{"ts":1697616351953.0674,"caller":"controller/controller.go:329","msg":"Reconciler error","controller":"ocimachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"OCIMachine","OCIMachine":{"name":"5e22de10fc6a4da6b24f5d1a5e5c11c7-hmljf","namespace":"oke"},"namespace":"oke","name":"5e22de10fc6a4da6b24f5d1a5e5c11c7-hmljf","reconcileID":"7ac776ee-e8e6-473a-b731-b6dc625d7858","err":"error deleting instance 5e22de10fc6a4da6b24f5d1a5e5c11c7-hmljf: can not marshal to path in request for field InstanceId. Due to can not marshal a nil pointer","errVerbose":"can not marshal to path in request for field InstanceId. Due to can not marshal a nil pointer
\nerror deleting instance 5e22de10fc6a4da6b24f5d1a5e5c11c7-hmljf\ngithub.com/oracle/cluster-api-provider-oci/controllers.(*OCIMachineReconciler).reconcileDelete\n\t/workspace/controllers/ocimachine_controller.go:391\ngithub.com/oracle/cluster-api-provider-oci/controllers.(*OCIMachineReconciler).Reconcile\n\t/workspace/controllers/ocimachine_controller.go:152\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"}","pod":"capoci-controller-manager-9659bd598-hpcp9","container":"manager","image":"253.255.0.31:5000/pca/cluster-api-oci-controller:v0.11.2"}
```
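The stack trace points at `reconcileDelete` passing a nil `InstanceId` to the OCI terminate call (the instance OCID was presumably never persisted because provisioning failed). A minimal sketch of the kind of guard that would let deletion continue; the function name and signature here are illustrative, not the actual CAPOCI code:

```go
package main

import "fmt"

// deleteInstance is a hypothetical stand-in for the CAPOCI delete path.
// If the provider never recorded an instance OCID (e.g. provisioning
// failed before the ID was persisted), instanceID is nil, and calling
// the OCI TerminateInstance API with it fails with
// "can not marshal a nil pointer", blocking the finalizer forever.
func deleteInstance(instanceID *string) error {
	if instanceID == nil {
		// No instance ID recorded: treat deletion as already complete
		// so the Machine's finalizer can be removed and cluster
		// deletion can proceed.
		return nil
	}
	// A real implementation would call the OCI SDK's TerminateInstance
	// here; this sketch just reports what it would do.
	return fmt.Errorf("would terminate instance %s", *instanceID)
}

func main() {
	fmt.Println(deleteInstance(nil)) // nil error: deletion continues
	id := "ocid1.instance.oc1..example"
	fmt.Println(deleteInstance(&id))
}
```

With a guard like this, machines stuck in `InstanceProvisionFailed` with no recorded OCID would no longer block cluster deletion.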
What you expected to happen:
Cluster deletion should succeed even for machines whose instance provisioning failed.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
```
clusterctl describe cluster 4727b18bc0884a88bbdef686e405176d -n oke --show-conditions Machine

NAME                                                                                  READY  SEVERITY  REASON                   SINCE  MESSAGE
!! DELETED !! Cluster/4727b18bc0884a88bbdef686e405176d                                True                                      10d
├─ClusterInfrastructure - OCICluster/4727b18bc0884a88bbdef686e405176d                 True                                      10d
├─ControlPlane - KubeadmControlPlane/4727b18bc0884a88bbdef686e405176d-control-plane   True                                      10d
│ └─3 Machines...                                                                     True                                      10d    See 4727b18bc0884a88bbdef686e405176d-control-plane-7tn2f, 4727b18bc0884a88bbdef686e405176d-control-plane-mctrr, ...
├─Workers
└─Other
  ├─!! DELETED !! Machine/5e22de10fc6a4da6b24f5d1a5e5c11c7-67787c94fx4qbkk-d9dkb      False  Error     InstanceProvisionFailed  4d15h
  │ ├─BootstrapReady                                                                  True                                      4d15h
  │ ├─HealthCheckSucceeded                                                            False  Warning   NodeStartupTimeout       4d15h  Node failed to report startup in 10m0s
  │ ├─InfrastructureReady                                                             False  Error     InstanceProvisionFailed  4d15h
  │ ├─NodeHealthy                                                                     False  Info      Deleting                 4d15h
  │ ├─OwnerRemediated                                                                 False  Warning   WaitingForRemediation    4d15h
  │ └─PreTerminateDeleteHookSucceeded                                                 True                                      4d15h
  ├─!! DELETED !! Machine/5e22de10fc6a4da6b24f5d1a5e5c11c7-67787c94fx4qbkk-kf9t7      False  Error     InstanceProvisionFailed  4d16h
  │ ├─BootstrapReady                                                                  True                                      4d16h
  │ ├─HealthCheckSucceeded                                                            False  Warning   NodeStartupTimeout       4d16h  Node failed to report startup in 10m0s
  │ ├─InfrastructureReady                                                             False  Error     InstanceProvisionFailed  4d16h
  │ ├─NodeHealthy                                                                     False  Info      Deleting                 4d16h
  │ ├─OwnerRemediated                                                                 False  Warning   WaitingForRemediation    4d16h
  │ └─PreTerminateDeleteHookSucceeded                                                 True                                      4d16h
  ├─!! DELETED !! Machine/5e22de10fc6a4da6b24f5d1a5e5c11c7-67787c94fx4qbkk-kjn2w      False  Error     InstanceProvisionFailed  4d15h
  │ ├─BootstrapReady                                                                  True                                      4d15h
  │ ├─HealthCheckSucceeded                                                            False  Warning   NodeStartupTimeout       4d15h  Node failed to report startup in 10m0s
  │ ├─InfrastructureReady                                                             False  Error     InstanceProvisionFailed  4d15h
  │ ├─NodeHealthy                                                                     False  Info      Deleting                 4d15h
  │ ├─OwnerRemediated                                                                 False  Warning   WaitingForRemediation    4d15h
  │ └─PreTerminateDeleteHookSucceeded                                                 True                                      4d15h
  └─!! DELETED !! Machine/5e22de10fc6a4da6b24f5d1a5e5c11c7-67787c94fx4qbkk-nfxwk      False  Error     InstanceProvisionFailed  4d15h
    ├─BootstrapReady                                                                  True                                      4d15h
    ├─HealthCheckSucceeded                                                            False  Warning   NodeStartupTimeout       4d15h  Node failed to report startup in 10m0s
    ├─InfrastructureReady                                                             False  Error     InstanceProvisionFailed  4d15h
    ├─NodeHealthy                                                                     False  Info      Deleting                 4d15h
    ├─OwnerRemediated                                                                 False  Warning   WaitingForRemediation    4d15h
    └─PreTerminateDeleteHookSucceeded                                                 True                                      4d15h
```
Environment:
- CAPOCI version: v0.11.2
- Cluster-API version (use `clusterctl version`):
- Kubernetes version (use `kubectl version`):
- Docker version (use `docker info`):
- OS (e.g. from `/etc/os-release`):