k3s-io/helm-controller

The helm-delete job is not cleaned up when the helm release is deleted via the HelmChart CR

GonzoHsu opened this issue · 24 comments

When I run the kubectl delete helmchart <name> command, most of the time I see that the helm-delete-<name> job still exists, and it seems to be caused by a duplicate helm-delete job created by helm-controller.

For example, after running kubectl delete helmchart nats:

The job and pod for the helm delete still exist:

# kubectl get job | grep nats
helm-delete-nats                  1/1           3s         19m
# kubectl get pod | grep nats
helm-delete-nats-mn4xd                        0/1     Completed   0              19m

The events show that the pod for helm-delete-nats was created twice:

# kubectl get event | grep nats
22m         Normal    SuccessfulCreate                  job/helm-delete-nats                    Created pod: helm-delete-nats-7h84g
22m         Normal    Scheduled                         pod/helm-delete-nats-7h84g              Successfully assigned default/helm-delete-nats-7h84g to 96e53ca49eb7e011eda21c000c290bc835
22m         Normal    Pulled                            pod/helm-delete-nats-7h84g              Container image "rancher/klipper-helm:v0.7.4-build20221121" already present on machine
22m         Normal    Created                           pod/helm-delete-nats-7h84g              Created container helm
22m         Normal    Started                           pod/helm-delete-nats-7h84g              Started container helm
22m         Normal    Killing                           pod/nats-0                              Stopping container nats
22m         Normal    Killing                           pod/nats-0                              Stopping container metrics
22m         Normal    Killing                           pod/nats-0                              Stopping container reloader
22m         Warning   CalculateExpectedPodCountFailed   poddisruptionbudget/nats                Failed to calculate the number of expected pods: statefulsets.apps does not implement the scale subresource
22m         Normal    Completed                         job/helm-delete-nats                    Job completed
22m         Normal    RemoveJob                         helmchart/nats                          Uninstalled HelmChart using Job default/helm-delete-nats, removing resources
21m         Normal    SuccessfulCreate                  job/helm-delete-nats                    Created pod: helm-delete-nats-mn4xd
21m         Normal    Scheduled                         pod/helm-delete-nats-mn4xd              Successfully assigned default/helm-delete-nats-mn4xd to 96e53ca49eb7e011eda21c000c290bc835
21m         Normal    Pulled                            pod/helm-delete-nats-mn4xd              Container image "rancher/klipper-helm:v0.7.4-build20221121" already present on machine
21m         Normal    Created                           pod/helm-delete-nats-mn4xd              Created container helm
21m         Normal    Started                           pod/helm-delete-nats-mn4xd              Started container helm
21m         Normal    Completed                         job/helm-delete-nats                    Job completed

And the remaining pod's logs are as follows; I think this means the job did not find a helm release to delete.

# kubectl logs helm-delete-nats-mn4xd
if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
	echo "KUBERNETES_SERVICE_HOST is using IPv6"
	CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
	CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi

set +v -x
+ [[ '' != \t\r\u\e ]]
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ tiller --listen=127.0.0.1:44134 --storage=secret
+ helm_v2 init --skip-refresh --client-only --stable-repo-url https://charts.helm.sh/stable/
[main] 2023/03/06 09:51:27 Starting Tiller v2.17.0 (tls=false)
[main] 2023/03/06 09:51:27 GRPC listening on 127.0.0.1:44134
[main] 2023/03/06 09:51:27 Probes listening on :44135
[main] 2023/03/06 09:51:27 Storage driver is Secret
[main] 2023/03/06 09:51:27 Max history per release is 0
Creating /home/klipper-helm/.helm 
Creating /home/klipper-helm/.helm/repository 
Creating /home/klipper-helm/.helm/repository/cache 
Creating /home/klipper-helm/.helm/repository/local 
Creating /home/klipper-helm/.helm/plugins 
Creating /home/klipper-helm/.helm/starters 
Creating /home/klipper-helm/.helm/cache/archive 
Creating /home/klipper-helm/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://charts.helm.sh/stable/ 
Adding local repo with URL: http://127.0.0.1:8879/charts 
$HELM_HOME has been configured at /home/klipper-helm/.helm.
Not installing Tiller due to 'client-only' flag having been set
++ timeout -s KILL 30 helm_v2 ls --all '^nats$' --output json
++ jq -r '.Releases | length'
[storage] 2023/03/06 09:51:27 listing all releases with filter
+ V2_CHART_EXISTS=
+ [[ '' == \1 ]]
+ [[ '' == \v\2 ]]
+ [[ -f /config/ca-file.pem ]]
+ [[ -n '' ]]
+ shopt -s nullglob
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/nats.tgz.base64
+ CHART_PATH=/tmp/nats.tgz
+ [[ ! -f /chart/nats.tgz.base64 ]]
+ base64 -d /chart/nats.tgz.base64
+ CHART=/tmp/nats.tgz
+ set +e
+ [[ delete != \d\e\l\e\t\e ]]
+ helm_update delete
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
++ helm_v3 ls --all -f '^nats$' --namespace default --output json
++ jq -r '"\(.[0].app_version),\(.[0].status)"'
++ tr '[:upper:]' '[:lower:]'
+ LINE=null,null
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-01_HelmChart.yaml'
+ [[ delete = \d\e\l\e\t\e ]]
+ [[ -z null ]]
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
+ echo 'Uninstalling helm_v3 chart'
+ helm_v3 uninstall nats --namespace default
Error: uninstall: Release not loaded: nats: release: not found
+ true
+ exit
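(Reading the trace: helm_v3 ls returns null,null for the release, and helm_v3 uninstall nats then fails with "release: not found", which is swallowed by what looks like an || true. So this second job ran after the release had already been uninstalled by the earlier job, did nothing, and still exited successfully.)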

Yep, we are also experiencing the same behavior.

I'm unable to replicate this. What version of k3s or helm-controller are you on? The code here should delete any objects owned by the HelmChart:

// uninstall job has successfully finished!
c.recorder.Eventf(chart, corev1.EventTypeNormal, "RemoveJob", "Uninstalled HelmChart using Job %s/%s, removing resources", job.Namespace, job.Name)

// note: an empty apply removes all resources owned by this chart
err = generic.ConfigureApplyForObject(c.apply, chart, &generic.GeneratingHandlerOptions{
	AllowClusterScoped: true,
}).
	WithOwner(chart).
	WithSetID("helm-chart-registration").
	ApplyObjects()
if err != nil {
	return nil, fmt.Errorf("unable to remove resources tied to HelmChart %s/%s: %s", chart.Namespace, chart.Name, err)
}

The k3s version I used is:
k3s version v1.23.16+k3s1 (64b0feeb)

This condition does not happen every time; if you try to replicate it within a short time window, you may not be able to reproduce it.

I also checked the code and I think the following part causes it, but I am not sure what timing triggers it.

err = generic.ConfigureApplyForObject(c.apply, chart, &generic.GeneratingHandlerOptions{
	AllowClusterScoped: true,
}).
	WithOwner(chart).
	WithSetID("helm-chart-registration").
	ApplyObjects(append(objs, expectedJob)...)

When the condition happens, not only the delete job but also some of the HelmChart's configmaps are created again.
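One possible timing (just a guess on my part, not verified) is that a change event for the chart is still queued when the OnRemove handler finishes its empty apply; when that queued handler runs, it re-applies the full object set (the expectedJob plus the chart's configmaps) under the same "helm-chart-registration" set-ID, recreating them even though the HelmChart is already being deleted.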

This condition does not happen every time; if you try to replicate it within a short time window, you may not be able to reproduce it.

I've run the end-to-end tests quite a few times and have not been able to reproduce it; do you have any circumstances or specific steps that seem to contribute to it? Deleting the HelmChart too quickly after creating it, deleting the namespace before the HelmChart, and so on?

My setup is a Rocky Linux VM on a VMware ESXi server with 2 vCPUs and 8 GB RAM, with k3s installed on the VM.

I didn't do any specific steps to create/delete the HelmChart, just the commands "kubectl apply -f <HelmChart.yaml>" and "kubectl delete -f <HelmChart.yaml>".
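For reference, a minimal HelmChart manifest of the kind I apply might look like the following (illustrative only; the repo URL and names here are examples, not my actual file):

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: nats
  namespace: default
spec:
  repo: https://nats-io.github.io/k8s/helm/charts/
  chart: nats
  targetNamespace: default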

How long did you wait between applying and deleting it? Did the install succeed? Was it still mid-install when deleted?

The install succeeded; I can see my pod running without problems, not in a mid-install state.
The time between applying and deleting ranged from a few minutes to a few days; I didn't find any correlation with it.

I suspect we are encountering a similar issue in Harvester.

When creating/deleting the same HelmChart (deployment only, no CRDs) frequently, there are bugs:

The delete job left over from the previous round is picked up again when the re-installed HelmChart is deleted the next time.

harv41:/home/rancher # kk get jobs -A  && kk get jobs -n cattle-monitoring-system helm-delete-rancher-monitoring -oyaml
NAMESPACE                  NAME                               COMPLETIONS   DURATION   AGE
cattle-monitoring-system   helm-delete-rancher-monitoring     1/1           4s         3m41s
kube-system                helm-install-rke2-canal            1/1           12s        7d2h

status:
  completionTime: "2023-06-29T10:31:52Z"
  conditions:
  - lastProbeTime: "2023-06-29T10:31:52Z"
    lastTransitionTime: "2023-06-29T10:31:52Z"
    status: "True"
    type: Complete
  ready: 0
  startTime: "2023-06-29T10:31:48Z"
  succeeded: 1
  uncountedTerminatedPods: {}

This bug causes the HelmChart's downstream deployments to be left behind.

Harvester's addon is built on top of HelmChart, and we are adding a workaround: each time we deploy or delete the HelmChart, we delete the potentially leftover job first (see the client-go sketch after the link below).

harvester/harvester#4108 (comment)
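A minimal client-go sketch of that kind of pre-delete (a hypothetical helper, not the actual Harvester code; it uses background propagation so the job controller also removes the completed pods, matching the propagationPolicy warnings in the logs further below):

package main

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// deleteLeftoverJob deletes a stale helm-install/helm-delete job if one
// exists, treating "not found" as success. Background propagation makes the
// job controller clean up the completed pods as well.
func deleteLeftoverJob(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	policy := metav1.DeletePropagationBackground
	err := client.BatchV1().Jobs(namespace).Delete(ctx, name, metav1.DeleteOptions{PropagationPolicy: &policy})
	if apierrors.IsNotFound(err) {
		return nil // nothing left over from a previous round
	}
	return err
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/rancher/k3s/k3s.yaml")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	// remove both potential leftovers before applying or deleting the HelmChart
	for _, name := range []string{"helm-install-rancher-monitoring", "helm-delete-rancher-monitoring"} {
		if err := deleteLeftoverJob(context.Background(), client, "cattle-monitoring-system", name); err != nil {
			panic(err)
		}
		fmt.Println("ensured leftover job is gone:", name)
	}
}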

@brandond
The JobName of a chart is fixed, not generated; if the previous job has not been cleaned up, it may be picked up again the next time, and the controller then assumes the work finished quickly because the old job already shows as completed.

From my tests in Harvester, this is very likely the cause, for both install and delete.

harv41:/home/rancher # kk get job -A
NAMESPACE                  NAME                               COMPLETIONS   DURATION   AGE
cattle-monitoring-system   helm-install-rancher-monitoring    1/1           52s        3m13s


One HelmChart delete seems to trigger the delete job twice; the HelmChart is gone, but one job is left behind, and it affects the next round of deletion.

I did the following test, logging events and jobs:

First round:

1.1 create HelmChart
cattle-monitoring-system   13m         Normal    SuccessfulCreate         job/helm-install-rancher-monitoring                           Created pod: helm-install-rancher-monitoring-wnrm6
cattle-monitoring-system   13m         Normal    Completed                job/helm-install-rancher-monitoring                           Job completed

1.2 delete Helmchart
cattle-monitoring-system   12m         Normal    SuccessfulCreate         job/helm-delete-rancher-monitoring                            Created pod: helm-delete-rancher-monitoring-d2n62
cattle-monitoring-system   12m         Normal    Completed                job/helm-delete-rancher-monitoring                            Job completed
cattle-monitoring-system   12m         Normal    SuccessfulCreate         job/helm-delete-rancher-monitoring                            Created pod: helm-delete-rancher-monitoring-7lkj9
cattle-monitoring-system   12m         Normal    Completed                job/helm-delete-rancher-monitoring                            Job completed

1.3 2 jobs are left:
NAMESPACE                  NAME                               COMPLETIONS   DURATION   AGE
cattle-monitoring-system   helm-delete-rancher-monitoring     1/1           4s         9m48s
cattle-monitoring-system   helm-install-rancher-monitoring    1/1           39s        47s

Second round:

2.1 create HelmChart
cattle-monitoring-system   3m10s       Normal    SuccessfulCreate         job/helm-install-rancher-monitoring                           Created pod: helm-install-rancher-monitoring-b2kcz
cattle-monitoring-system   2m31s       Normal    Completed                job/helm-install-rancher-monitoring                           Job completed

2.2 delete Helmchart
cattle-monitoring-system   117s        Normal    SuccessfulCreate         job/helm-delete-rancher-monitoring                            Created pod: helm-delete-rancher-monitoring-5dvlw
cattle-monitoring-system   107s        Normal    Completed                job/helm-delete-rancher-monitoring                            Job completed
cattle-monitoring-system   105s        Normal    SuccessfulCreate         job/helm-delete-rancher-monitoring                            Created pod: helm-delete-rancher-monitoring-8dfh9
cattle-monitoring-system   101s        Normal    Completed                job/helm-delete-rancher-monitoring                            Job completed



2.3 1 job is left 
NAMESPACE                  NAME                               COMPLETIONS   DURATION   AGE
cattle-monitoring-system   helm-delete-rancher-monitoring     1/1           4s         2m57s

With a workaround in Harvester that forcibly deletes the jobs before triggering any HelmChart action, we can avoid this issue now.
harvester/harvester#4127

time="2023-06-29T14:56:52Z" level=info msg="OnChange: user disable addon, move from AddonDeploySuccessful to new disable status AddonDisabling"



///// before deleting the HelmChart, delete those 2 leftover jobs

time="2023-06-29T14:56:52Z" level=info msg="previous job cattle-monitoring-system/helm-delete-rancher-monitoring is to be deleted, wait"
W0629 14:56:52.802777       7 warnings.go:70] child pods are preserved by default when jobs are deleted; set propagationPolicy=Background to remove them or set propagationPolicy=Orphan to suppress this warning
time="2023-06-29T14:56:52Z" level=info msg="previous job cattle-monitoring-system/helm-install-rancher-monitoring is to be deleted, wait"
W0629 14:56:52.813165       7 warnings.go:70] child pods are preserved by default when jobs are deleted; set propagationPolicy=Background to remove them or set propagationPolicy=Orphan to suppress this warning


time="2023-06-29T14:56:57Z" level=info msg="delete the helm chart cattle-monitoring-system/rancher-monitoring"

...
time="2023-06-29T14:57:10Z" level=info msg="addon rancher-monitoring: helm chart is gone, or owned false, addon is in AddonDisabling status, move to init state"

@w13915984028 I am curious why you don't see any of the events from the helm controller itself when you delete the chart. Are you filtering these out? What events and log messages do you see?

Rather than adding code to Harvester to manually clean up after helm-controller, would you mind making an attempt to fix the unwanted behavior here? It should be somewhere in the OnRemove function at https://github.com/k3s-io/helm-controller/blob/master/pkg/controllers/chart/chart.go#L201.

@brandond We have hit the same problem, and it's really weird that a HelmChart delete seems to trigger the delete job twice.
Here is our environment version:
k3s version v1.28.4+k3s2 (6ba6c1b6)
go version go1.20.11
I did the following test; the commands, event logs, and jobs are below:
1. First round:
Apply the test HelmChart.

[root@localhost k8s_env]# kubectl apply -f run-test.yaml 
helmchart.helm.cattle.io/test created

Check the helm-install job, event logs, and pod resources:

[root@localhost k8s_env]# kubectl get job -A
NAMESPACE     NAME                              COMPLETIONS   DURATION   AGE
kube-system   helm-install-test                 0/1           9s         9s
[root@localhost k8s_env]# kubectl get event -A
kube-system            4s          Normal    SuccessfulCreate                 job/helm-install-test                               Created pod: helm-install-test-5v2kp
kube-system            4s          Normal    Scheduled                        pod/helm-install-test-5v2kp                         Successfully assigned kube-system/helm-install-test-5v2kp to localhost
kube-system            4s          Normal    ApplyJob                         helmchart/test                                      Applying HelmChart using Job kube-system/helm-install-test
kube-system            4s          Normal    AddedInterface                   pod/helm-install-test-5v2kp                         Add eth0 [10.42.2.123/16] from cnibr
kube-system            4s          Normal    Pulled                           pod/helm-install-test-5v2kp                         Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system            4s          Normal    Created                          pod/helm-install-test-5v2kp                         Created container helm
kube-system            4s          Normal    Started                          pod/helm-install-test-5v2kp                         Started container helm
default                3s          Normal    ScalingReplicaSet                deployment/test                                     Scaled up replica set test-9f64fdb7f to 1
default                3s          Normal    SuccessfulCreate                 replicaset/test-9f64fdb7f                           Created pod: test-9f64fdb7f-smf22
default                3s          Normal    Scheduled                        pod/test-9f64fdb7f-smf22                            Successfully assigned default/test-9f64fdb7f-smf22 to localhost
default                3s          Normal    AddedInterface                   pod/test-9f64fdb7f-smf22                            Add eth0 [10.42.2.124/16] from cnibr
default                3s          Normal    Pulled                           pod/test-9f64fdb7f-smf22                            Container image "test" already present on machine
default                3s          Normal    Created                          pod/test-9f64fdb7f-smf22                            Created container test
default                3s          Normal    Started                          pod/test-9f64fdb7f-smf22                            Started container test
kube-system            1s          Normal    Completed                        job/helm-install-test                               Job completed
[root@localhost k8s_env]# kubectl get pod 
NAME                   READY   STATUS    RESTARTS   AGE
test-9f64fdb7f-smf22   1/1     Running   0          3m13s

Delete the HelmChart once the pod is running, then check the event logs and jobs:

[root@localhost k8s_env]# kubectl delete -f run-test.yaml 
helmchart.helm.cattle.io "test" deleted
[root@localhost k8s_env]# kubectl get pod 
NAME                   READY   STATUS        RESTARTS   AGE
test-9f64fdb7f-smf22   1/1     Terminating   0          4m50s
[root@localhost k8s_env]# kubectl get job -A
NAMESPACE     NAME                              COMPLETIONS   DURATION   AGE
kube-system   helm-delete-test                  1/1           3s         15s
[root@localhost k8s_env]# kubectl get event -A
kube-system            61s         Normal    SuccessfulCreate                 job/helm-delete-test                                Created pod: helm-delete-test-49284
kube-system            60s         Normal    Scheduled                        pod/helm-delete-test-49284                          Successfully assigned kube-system/helm-delete-test-49284 to localhost
kube-system            61s         Normal    AddedInterface                   pod/helm-delete-test-49284                          Add eth0 [10.42.2.126/16] from cnibr
kube-system            60s         Normal    Pulled                           pod/helm-delete-test-49284                          Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system            60s         Normal    Created                          pod/helm-delete-test-49284                          Created container helm
kube-system            60s         Normal    Started                          pod/helm-delete-test-49284                          Started container helm
default                60s         Normal    Killing                          pod/test-9f64fdb7f-smf22                            Stopping container test
kube-system            58s         Normal    Completed                        job/helm-delete-test                                Job completed
kube-system            55s         Normal    RemoveJob                        helmchart/test                                      Uninstalled HelmChart using Job kube-system/helm-delete-test, removing resources
kube-system            55s         Normal    SuccessfulCreate                 job/helm-delete-test                                Created pod: helm-delete-test-vj4b2
kube-system            54s         Normal    Scheduled                        pod/helm-delete-test-vj4b2                          Successfully assigned kube-system/helm-delete-test-vj4b2 to localhost
kube-system            55s         Normal    AddedInterface                   pod/helm-delete-test-vj4b2                          Add eth0 [10.42.2.127/16] from cnibr
kube-system            54s         Normal    Pulled                           pod/helm-delete-test-vj4b2                          Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system            54s         Normal    Created                          pod/helm-delete-test-vj4b2                          Created container helm
kube-system            54s         Normal    Started                          pod/helm-delete-test-vj4b2                          Started container helm
kube-system            52s         Normal    Completed                        job/helm-delete-test                                Job completed
[root@localhost k8s_env]# kubectl get pod 
No resources found in default namespace.

From the events and jobs, we find that after the first delete job completed and was removed, k3s started a second delete job that was never removed, so a delete job is left behind.

2. Second round:

[root@localhost k8s_env]# kubectl apply -f run-test.yaml 
helmchart.helm.cattle.io/test created
[root@localhost k8s_env]# kubectl get pod
NAME                   READY   STATUS    RESTARTS   AGE
test-9f64fdb7f-wbhrd   1/1     Running   0          5s
[root@localhost k8s_env]# kubectl get jobs.batch -A
NAMESPACE     NAME                              COMPLETIONS   DURATION   AGE
kube-system   helm-delete-test                  1/1           3s         8m20s
kube-system   helm-install-test                 1/1           3s         13s
[root@localhost k8s_env]# kubectl get event -A
kube-system            48s         Normal    SuccessfulCreate                 job/helm-install-test                               Created pod: helm-install-test-9jkgj
kube-system            47s         Normal    Scheduled                        pod/helm-install-test-9jkgj                         Successfully assigned kube-system/helm-install-test-9jkgj to localhost
kube-system            48s         Normal    ApplyJob                         helmchart/test                                      Applying HelmChart using Job kube-system/helm-install-test
kube-system            47s         Normal    AddedInterface                   pod/helm-install-test-9jkgj                         Add eth0 [10.42.2.128/16] from cnibr
kube-system            47s         Normal    Pulled                           pod/helm-install-test-9jkgj                         Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system            47s         Normal    Created                          pod/helm-install-test-9jkgj                         Created container helm
kube-system            47s         Normal    Started                          pod/helm-install-test-9jkgj                         Started container helm
default                47s         Normal    ScalingReplicaSet                deployment/test                                     Scaled up replica set test-9f64fdb7f to 1
default                47s         Normal    SuccessfulCreate                 replicaset/test-9f64fdb7f                           Created pod: test-9f64fdb7f-wbhrd
default                46s         Normal    Scheduled                        pod/test-9f64fdb7f-wbhrd                            Successfully assigned default/test-9f64fdb7f-wbhrd to localhost
default                46s         Normal    AddedInterface                   pod/test-9f64fdb7f-wbhrd                            Add eth0 [10.42.2.129/16] from cnibr
default                46s         Normal    Pulled                           pod/test-9f64fdb7f-wbhrd                            Container image "test" already present on machine
default                46s         Normal    Created                          pod/test-9f64fdb7f-wbhrd                            Created container test
default                46s         Normal    Started                          pod/test-9f64fdb7f-wbhrd                            Started container test
kube-system            45s         Normal    Completed                        job/helm-install-test                               Job completed
[root@localhost k8s_env]# kubectl delete -f run-test.yaml 
helmchart.helm.cattle.io "test" deleted
[root@localhost k8s_env]# kubectl get pod 
NAME                   READY   STATUS    RESTARTS   AGE
test-9f64fdb7f-wbhrd   1/1     Running   0          3m6s
[root@localhost k8s_env]# kubectl get job -A
NAMESPACE     NAME                              COMPLETIONS   DURATION   AGE
[root@localhost k8s_env]# kubectl get event -A
kube-system            46s         Normal    RemoveJob                        helmchart/test                                      Uninstalled HelmChart using Job kube-system/helm-delete-test, removing resources

In the second round, the helm-delete process merely removed the job left over from the first round, but the pod is still running.
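(This matches the fixed-job-name observation above: the OnRemove handler apparently found the completed helm-delete-test job left over from the first round, concluded the uninstall had already finished, and removed the job and chart resources without ever running helm uninstall, which is why the pod kept running.)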

3. Third round:
We wanted to check more helm-controller debug logs, so we set the disable-helm-controller arg and downloaded the helm-controller binary at v0.15.4, the same version used in k3s v1.28.4. After running the program manually and doing the same test, we got a different result.

[root@localhost k8s_env]# vim /usr/lib/systemd/system/k3s.service
[root@localhost k8s_env]# systemctl daemon-reload 
[root@localhost k8s_env]# systemctl restart k3s
[root@localhost helm-controller]# ./helm-controller-amd64 --kubeconfig /etc/rancher/k3s/k3s.yaml &
[root@localhost k8s_env]# kubectl apply -f run-test.yaml 
helmchart.helm.cattle.io/test created
[root@localhost k8s_env]# kubectl get pod 
NAME                   READY   STATUS    RESTARTS   AGE
test-9f64fdb7f-cjlxc   1/1     Running   0          3s
[root@localhost k8s_env]# kubectl get job -A
NAMESPACE     NAME                              COMPLETIONS   DURATION   AGE
kube-system   helm-install-test                 1/1           4s         11s
[root@localhost k8s_env]# kubectl get event -A
kube-system            49s         Normal    SuccessfulCreate                 job/helm-install-test                               Created pod: helm-install-test-tmmgm
kube-system            48s         Normal    Scheduled                        pod/helm-install-test-tmmgm                         Successfully assigned kube-system/helm-install-test-tmmgm to localhost
kube-system            49s         Normal    AddedInterface                   pod/helm-install-test-tmmgm                         Add eth0 [10.42.2.144/16] from cnibr
kube-system            48s         Normal    Pulled                           pod/helm-install-test-tmmgm                         Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system            48s         Normal    Created                          pod/helm-install-test-tmmgm                         Created container helm
kube-system            48s         Normal    Started                          pod/helm-install-test-tmmgm                         Started container helm
default                48s         Normal    ScalingReplicaSet                deployment/test                                     Scaled up replica set test-9f64fdb7f to 1
default                48s         Normal    SuccessfulCreate                 replicaset/test-9f64fdb7f                           Created pod: test-9f64fdb7f-cjlxc
default                47s         Normal    Scheduled                        pod/test-9f64fdb7f-cjlxc                            Successfully assigned default/test-9f64fdb7f-cjlxc to localhost
default                47s         Normal    AddedInterface                   pod/test-9f64fdb7f-cjlxc                            Add eth0 [10.42.2.145/16] from cnibr
default                47s         Normal    Pulled                           pod/test-9f64fdb7f-cjlxc                            Container image "test" already present on machine
default                47s         Normal    Created                          pod/test-9f64fdb7f-cjlxc                            Created container test
default                47s         Normal    Started                          pod/test-9f64fdb7f-cjlxc                            Started container test
kube-system            45s         Normal    Completed                        job/helm-install-test                               Job completed
kube-system            45s         Normal    ApplyJob                         helmchart/test                                      Applying HelmChart using Job kube-system/helm-install-test
[root@localhost k8s_env]# kubectl delete -f run-test.yaml 
helmchart.helm.cattle.io "test" deleted
[root@localhost k8s_env]# kubectl get pod 
NAME                   READY   STATUS        RESTARTS   AGE
test-9f64fdb7f-cjlxc   1/1     Terminating   0          2m45s
[root@localhost k8s_env]# kubectl get job -A
NAMESPACE     NAME                              COMPLETIONS   DURATION   AGE
[root@localhost k8s_env]# kubectl get event -A
kube-system            52s         Normal    SuccessfulCreate                 job/helm-delete-test                                Created pod: helm-delete-test-sjfpn
kube-system            51s         Normal    Scheduled                        pod/helm-delete-test-sjfpn                          Successfully assigned kube-system/helm-delete-test-sjfpn to localhost
kube-system            52s         Normal    AddedInterface                   pod/helm-delete-test-sjfpn                          Add eth0 [10.42.2.146/16] from cnibr
kube-system            51s         Normal    Pulled                           pod/helm-delete-test-sjfpn                          Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system            51s         Normal    Created                          pod/helm-delete-test-sjfpn                          Created container helm
kube-system            51s         Normal    Started                          pod/helm-delete-test-sjfpn                          Started container helm
default                51s         Normal    Killing                          pod/test-9f64fdb7f-cjlxc                            Stopping container test
kube-system            49s         Normal    Completed                        job/helm-delete-test                                Job completed
kube-system            46s         Normal    SuccessfulCreate                 job/helm-delete-test                                Created pod: helm-delete-test-qq5zc
kube-system            45s         Normal    Scheduled                        pod/helm-delete-test-qq5zc                          Successfully assigned kube-system/helm-delete-test-qq5zc to localhost
kube-system            46s         Normal    AddedInterface                   pod/helm-delete-test-qq5zc                          Add eth0 [10.42.2.147/16] from cnibr
kube-system            45s         Normal    Pulled                           pod/helm-delete-test-qq5zc                          Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system            45s         Normal    Created                          pod/helm-delete-test-qq5zc                          Created container helm
kube-system            45s         Normal    Started                          pod/helm-delete-test-qq5zc                          Started container helm
kube-system            43s         Normal    Completed                        job/helm-delete-test                                Job completed
kube-system            40s         Normal    RemoveJob                        helmchart/test                                      Uninstalled HelmChart using Job kube-system/helm-delete-test, removing resources

Here you can see the different result of this test: the helm-delete job has been cleaned up, and the event logs show that RemoveJob is triggered only after both jobs have completed.

Here are the helm-controller logs:

[root@localhost helm-controller]# INFO[0863] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255465", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test 
ERRO[0863] error syncing 'kube-system/test': handler helm-controller-chart-registration: helmcharts.helm.cattle.io "test" not found, requeuing 
INFO[0863] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255465", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test 
INFO[0863] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255472", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test 
INFO[0863] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255472", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test 
INFO[0866] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255472", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test 
INFO[0866] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255472", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test 
ERRO[1011] error syncing 'kube-system/test': handler on-helm-chart-remove: waiting for delete of helm chart for kube-system/test by helm-delete-test, requeuing 
INFO[1014] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255707", FieldPath:""}): type: 'Normal' reason: 'RemoveJob' Uninstalled HelmChart using Job kube-system/helm-delete-test, removing resources 
ERRO[1017] error syncing 'kube-system/test': handler on-helm-chart-remove: waiting for delete of helm chart for kube-system/test by helm-delete-test, requeuing 
INFO[1020] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255739", FieldPath:""}): type: 'Normal' reason: 'RemoveJob' Uninstalled HelmChart using Job kube-system/helm-delete-test, removing resources 
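(Note the two "waiting for delete ... requeuing" errors surrounding the RemoveJob events: the standalone controller kept requeuing until the delete job was actually gone, which may be why no job was left behind in this run. That is speculation on our part.)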

It's strange that the same test produced different results, which leaves us with the following questions:
1. Why was the helm-delete job triggered twice?
2. Looking at the helm-controller logic and logs, RemoveJob should be triggered once a job completes. Why was RemoveJob triggered only once in the first- and second-round tests, where the helm-delete job was triggered twice?

Any resolution in 2024?