operator-framework/operator-sdk

Helm release disappears - operator unable to uninstall release - "Release not found"

pjestin-sym opened this issue · 3 comments

Bug Report

This issue happens with a Helm operator. We started seeing it recently, which suggests a regression introduced by either operator-sdk 1.33.0 or GKE 1.27.

What did you do?

  • Create a custom resource as input to the operator with ArgoCD
  • Wait for the operator to install the Helm release
  • At this point, the Helm release is visible and the corresponding secret is present in the namespace.
  • Delete the resource using ArgoCD

What did you expect to see?

  • The Helm release should not disappear until all resources have been removed.
  • After the CR has been removed, the operator should be able to properly remove the resources

What did you see instead? Under which circumstances?

  • The Helm release disappears (helm list and kubectl get secrets both stop showing the release)
  • The operator logs the following:
{"level":"info","ts":"2024-01-26T13:18:58Z","logger":"helm.controller","msg":"Release not found","namespace":"namespaces","name":"zoom-s001","apiVersion":"charts.symphony.com/v1alpha1","kind":"ExtendedNamespace","release":"zoom-s001"}
{"level":"info","ts":"2024-01-26T13:18:58Z","logger":"helm.controller","msg":"Removing finalizer","namespace":"namespaces","name":"zoom-s001","apiVersion":"charts.symphony.com/v1alpha1","kind":"ExtendedNamespace","release":"zoom-s001"}
  • The resources deployed by the chart are still there, even though the CR has been fully deleted
  • At this point we need to manually remove the leftovers.

Environment

Operator type:

Helm

Kubernetes cluster type:

GKE

$ operator-sdk version

1.33.0

$ go version (if language is Go)

1.21

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.10", GitCommit:"0fa26aea1d5c21516b0d96fea95a77d8d429912e", GitTreeState:"clean", BuildDate:"2024-01-17T13:46:28Z", GoVersion:"go1.20.13", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.7-gke.1121000", GitCommit:"4daab1fd78c0b9aba478a19b363ab4a25bdadd79", GitTreeState:"clean", BuildDate:"2023-11-06T09:24:38Z", GoVersion:"go1.20.10 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

To mitigate the impact, the operator should not remove the finalizer when the release is not found.

Additional context

The cluster was recently upgraded from GKE 1.26 to GKE 1.27.

I found a workaround using the uninstall-wait annotation described here: https://sdk.operatorframework.io/docs/building-operators/helm/reference/advanced_features/annotations/#helmsdkoperatorframeworkiouninstall-wait
This helps mitigate the impact, although the underlying issue probably remains.