The helm-delete job is not cleaned up when the helm release is deleted via the HelmChart CR
GonzoHsu opened this issue · 24 comments
When I run the kubectl delete helmchart <name> command, most of the time the helm-delete- job is still there afterwards, and it appears to be caused by a duplicate helm-delete job created by helm-controller.
For example, after running kubectl delete helmchart nats:
The job and pod for the helm delete still exist:
# kubectl get job | grep nats
helm-delete-nats 1/1 3s 19m
# kubectl get pod | grep nats
helm-delete-nats-mn4xd 0/1 Completed 0 19m
The events show that the pod for helm-delete-nats was generated twice:
# kubectl get event | grep nats
22m Normal SuccessfulCreate job/helm-delete-nats Created pod: helm-delete-nats-7h84g
22m Normal Scheduled pod/helm-delete-nats-7h84g Successfully assigned default/helm-delete-nats-7h84g to 96e53ca49eb7e011eda21c000c290bc835
22m Normal Pulled pod/helm-delete-nats-7h84g Container image "rancher/klipper-helm:v0.7.4-build20221121" already present on machine
22m Normal Created pod/helm-delete-nats-7h84g Created container helm
22m Normal Started pod/helm-delete-nats-7h84g Started container helm
22m Normal Killing pod/nats-0 Stopping container nats
22m Normal Killing pod/nats-0 Stopping container metrics
22m Normal Killing pod/nats-0 Stopping container reloader
22m Warning CalculateExpectedPodCountFailed poddisruptionbudget/nats Failed to calculate the number of expected pods: statefulsets.apps does not implement the scale subresource
22m Normal Completed job/helm-delete-nats Job completed
22m Normal RemoveJob helmchart/nats Uninstalled HelmChart using Job default/helm-delete-nats, removing resources
21m Normal SuccessfulCreate job/helm-delete-nats Created pod: helm-delete-nats-mn4xd
21m Normal Scheduled pod/helm-delete-nats-mn4xd Successfully assigned default/helm-delete-nats-mn4xd to 96e53ca49eb7e011eda21c000c290bc835
21m Normal Pulled pod/helm-delete-nats-mn4xd Container image "rancher/klipper-helm:v0.7.4-build20221121" already present on machine
21m Normal Created pod/helm-delete-nats-mn4xd Created container helm
21m Normal Started pod/helm-delete-nats-mn4xd Started container helm
21m Normal Completed job/helm-delete-nats Job completed
The logs of the remaining pod are shown below; I think they indicate that the job did not find the helm release to delete.
# kubectl logs helm-delete-nats-mn4xd
if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
echo "KUBERNETES_SERVICE_HOST is using IPv6"
CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi
set +v -x
+ [[ '' != \t\r\u\e ]]
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ tiller --listen=127.0.0.1:44134 --storage=secret
+ helm_v2 init --skip-refresh --client-only --stable-repo-url https://charts.helm.sh/stable/
[main] 2023/03/06 09:51:27 Starting Tiller v2.17.0 (tls=false)
[main] 2023/03/06 09:51:27 GRPC listening on 127.0.0.1:44134
[main] 2023/03/06 09:51:27 Probes listening on :44135
[main] 2023/03/06 09:51:27 Storage driver is Secret
[main] 2023/03/06 09:51:27 Max history per release is 0
Creating /home/klipper-helm/.helm
Creating /home/klipper-helm/.helm/repository
Creating /home/klipper-helm/.helm/repository/cache
Creating /home/klipper-helm/.helm/repository/local
Creating /home/klipper-helm/.helm/plugins
Creating /home/klipper-helm/.helm/starters
Creating /home/klipper-helm/.helm/cache/archive
Creating /home/klipper-helm/.helm/repository/repositories.yaml
Adding stable repo with URL: https://charts.helm.sh/stable/
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /home/klipper-helm/.helm.
Not installing Tiller due to 'client-only' flag having been set
++ timeout -s KILL 30 helm_v2 ls --all '^nats$' --output json
++ jq -r '.Releases | length'
[storage] 2023/03/06 09:51:27 listing all releases with filter
+ V2_CHART_EXISTS=
+ [[ '' == \1 ]]
+ [[ '' == \v\2 ]]
+ [[ -f /config/ca-file.pem ]]
+ [[ -n '' ]]
+ shopt -s nullglob
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/nats.tgz.base64
+ CHART_PATH=/tmp/nats.tgz
+ [[ ! -f /chart/nats.tgz.base64 ]]
+ base64 -d /chart/nats.tgz.base64
+ CHART=/tmp/nats.tgz
+ set +e
+ [[ delete != \d\e\l\e\t\e ]]
+ helm_update delete
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
++ helm_v3 ls --all -f '^nats$' --namespace default --output json
++ jq -r '"\(.[0].app_version),\(.[0].status)"'
++ tr '[:upper:]' '[:lower:]'
+ LINE=null,null
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-01_HelmChart.yaml'
+ [[ delete = \d\e\l\e\t\e ]]
+ [[ -z null ]]
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
+ echo 'Uninstalling helm_v3 chart'
+ helm_v3 uninstall nats --namespace default
Error: uninstall: Release not loaded: nats: release: not found
+ true
+ exit
Yep, we are also experiencing the same behavior.
I'm unable to replicate this. What version of k3s or helm-controller are you on? The code here should delete any objects owned by the HelmChart:
helm-controller/pkg/controllers/chart/chart.go
Lines 253 to 265 in 57fde46
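For what it's worth, here is a minimal client-go sketch (the kubeconfig path, namespace, and job name are assumptions, and this is not the controller's actual code) that checks whether a leftover helm-delete job still carries an ownerReference back to a HelmChart, which is what that cleanup relies on:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// kubeconfig path and object names are placeholders for this sketch
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/rancher/k3s/k3s.yaml")
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	job, err := cs.BatchV1().Jobs("default").Get(context.TODO(), "helm-delete-nats", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	// If the owning HelmChart is already gone, these UIDs no longer resolve
	// and the job has been orphaned rather than garbage-collected.
	for _, ref := range job.OwnerReferences {
		fmt.Printf("owned by %s/%s (uid %s)\n", ref.Kind, ref.Name, ref.UID)
	}
}
```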
The k3s version I used is:
k3s version v1.23.16+k3s1 (64b0feeb)
This condition does not happen every time; if you try to reproduce it within a short time window, you may not be able to hit it.
I also checked the code and I think the following part causes it, but I am not sure what timing triggers it.
helm-controller/pkg/controllers/chart/chart.go
Lines 215 to 220 in 57fde46
When the condition happens, not only the delete job but also some of the HelmChart's configmaps are created again.
This condition does not happen every time; if you try to reproduce it within a short time window, you may not be able to hit it.
I've run the end to end tests quite a few times and not been able to reproduce it; do you have any circumstances or specific steps that seem to contribute to it? Deleting the HelmChart too quickly after creating it, deleting the namespace before the HelmChart, or so on?
My setup is a Rocky Linux VM on a VMware ESXi server with 2 vCPUs and 8 GB of RAM, with k3s installed on the VM.
I don't do anything special to create/delete the HelmChart, just "kubectl apply/delete -f <HelmChart.yaml>".
How long did you wait between applying and deleting it? Did the install succeed? Was it still mid-install when deleted?
The install succeeded; I can see my pod running without problems, so it was not mid-install.
The time between applying and deleting has ranged from a few minutes to a few days; I haven't found any correlation with it.
I suspect we are encountering a similar issue in Harvester.
When the same HelmChart (deployment only, no CRDs) is created/deleted frequently, there are bugs: the delete job left over from the previous round is picked up by the re-installed HelmChart the next time it is deleted.
harv41:/home/rancher # kk get jobs -A && kk get jobs -n cattle-monitoring-system helm-delete-rancher-monitoring -oyaml
NAMESPACE NAME COMPLETIONS DURATION AGE
cattle-monitoring-system helm-delete-rancher-monitoring 1/1 4s 3m41s
kube-system helm-install-rke2-canal 1/1 12s 7d2h
status:
completionTime: "2023-06-29T10:31:52Z"
conditions:
- lastProbeTime: "2023-06-29T10:31:52Z"
lastTransitionTime: "2023-06-29T10:31:52Z"
status: "True"
type: Complete
ready: 0
startTime: "2023-06-29T10:31:48Z"
succeeded: 1
uncountedTerminatedPods: {}
This bug causes the downstream deployments related to the HelmChart to be left behind.
The Harvester addon is built on top of HelmChart, and we are adding a workaround: each time we deploy or delete the HelmChart, we delete the potentially leftover job first (see the sketch below).
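A minimal sketch of that workaround, assuming client-go and using hypothetical function and variable names (not Harvester's actual code); background propagation also removes the job's pods, which is what the propagationPolicy warnings further down in the logs are about:

```go
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteStaleJob removes any leftover helm-install/helm-delete job for a chart
// before the HelmChart is applied or deleted again, so a stale completed job
// cannot be picked up by the next round.
func deleteStaleJob(ctx context.Context, cs kubernetes.Interface, namespace, name string) error {
	prop := metav1.DeletePropagationBackground // also deletes the job's pods
	err := cs.BatchV1().Jobs(namespace).Delete(ctx, name, metav1.DeleteOptions{
		PropagationPolicy: &prop,
	})
	if apierrors.IsNotFound(err) {
		return nil // nothing left over, nothing to do
	}
	return err
}
```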
@brandond
The job name for a chart is fixed, not generated, so if the previous job has not been cleaned up it may be picked up again next time and assumed to have finished already.
From my tests in Harvester, this is very likely the cause, for both install and delete.
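To illustrate the hazard, here is a minimal sketch with hypothetical helpers (not helm-controller's code) of why a deterministic job name plus a bare completion check lets a stale job be mistaken for the current one:

```go
package sketch

import batchv1 "k8s.io/api/batch/v1"

// The delete job name is derived from the chart name, so every uninstall of the
// same chart produces a job with exactly the same name.
func deleteJobName(chartName string) string {
	return "helm-delete-" + chartName
}

// A check like this treats a completed job left over from a previous uninstall
// as if the current uninstall had already finished; without also comparing the
// job's ownerReference UID (or creation time) against the current HelmChart,
// the stale job "passes" immediately.
func jobLooksComplete(job *batchv1.Job) bool {
	return job.Status.Succeeded > 0
}
```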
harv41:/home/rancher # kk get job -A
NAMESPACE NAME COMPLETIONS DURATION AGE
cattle-monitoring-system helm-install-rancher-monitoring 1/1 52s 3m13s
Deleting one HelmChart seems to trigger the delete job twice; the HelmChart is gone, but one job is left behind, and it affects the next round of deletion.
I ran the following test; the events and jobs are logged below:
First round:
1.1 create HelmChart
cattle-monitoring-system 13m Normal SuccessfulCreate job/helm-install-rancher-monitoring Created pod: helm-install-rancher-monitoring-wnrm6
cattle-monitoring-system 13m Normal Completed job/helm-install-rancher-monitoring Job completed
1.2 delete Helmchart
cattle-monitoring-system 12m Normal SuccessfulCreate job/helm-delete-rancher-monitoring Created pod: helm-delete-rancher-monitoring-d2n62
cattle-monitoring-system 12m Normal Completed job/helm-delete-rancher-monitoring Job completed
cattle-monitoring-system 12m Normal SuccessfulCreate job/helm-delete-rancher-monitoring Created pod: helm-delete-rancher-monitoring-7lkj9
cattle-monitoring-system 12m Normal Completed job/helm-delete-rancher-monitoring Job completed
1.3 2 jobs are left:
NAMESPACE NAME COMPLETIONS DURATION AGE
cattle-monitoring-system helm-delete-rancher-monitoring 1/1 4s 9m48s
cattle-monitoring-system helm-install-rancher-monitoring 1/1 39s 47s
Second round:
2.1 create HelmChart
cattle-monitoring-system 3m10s Normal SuccessfulCreate job/helm-install-rancher-monitoring Created pod: helm-install-rancher-monitoring-b2kcz
cattle-monitoring-system 2m31s Normal Completed job/helm-install-rancher-monitoring Job completed
2.2 delete Helmchart
cattle-monitoring-system 117s Normal SuccessfulCreate job/helm-delete-rancher-monitoring Created pod: helm-delete-rancher-monitoring-5dvlw
cattle-monitoring-system 107s Normal Completed job/helm-delete-rancher-monitoring Job completed
cattle-monitoring-system 105s Normal SuccessfulCreate job/helm-delete-rancher-monitoring Created pod: helm-delete-rancher-monitoring-8dfh9
cattle-monitoring-system 101s Normal Completed job/helm-delete-rancher-monitoring Job completed
2.3 1 job is left
NAMESPACE NAME COMPLETIONS DURATION AGE
cattle-monitoring-system helm-delete-rancher-monitoring 1/1 4s 2m57s
With a workaround in Harvester that forcibly deletes the job before triggering the HelmChart action, we can avoid the issue for now:
harvester/harvester#4127
time="2023-06-29T14:56:52Z" level=info msg="OnChange: user disable addon, move from AddonDeploySuccessful to new disable status AddonDisabling"
///// before deleting the HelmChart, delete those 2 leftover jobs
time="2023-06-29T14:56:52Z" level=info msg="previous job cattle-monitoring-system/helm-delete-rancher-monitoring is to be deleted, wait"
W0629 14:56:52.802777 7 warnings.go:70] child pods are preserved by default when jobs are deleted; set propagationPolicy=Background to remove them or set propagationPolicy=Orphan to suppress this warning
time="2023-06-29T14:56:52Z" level=info msg="previous job cattle-monitoring-system/helm-install-rancher-monitoring is to be deleted, wait"
W0629 14:56:52.813165 7 warnings.go:70] child pods are preserved by default when jobs are deleted; set propagationPolicy=Background to remove them or set propagationPolicy=Orphan to suppress this warning
time="2023-06-29T14:56:57Z" level=info msg="delete the helm chart cattle-monitoring-system/rancher-monitoring"
...
time="2023-06-29T14:57:10Z" level=info msg="addon rancher-monitoring: helm chart is gone, or owned false, addon is in AddonDisabling status, move to init state"
@w13915984028 I am curious why you don't see any of the events from the helm controller itself when you delete the chart. Are you filtering these out? What events and log messages do you see?
Rather than adding code to Harvester to manually clean up after helm-controller, would you mind making an attempt to fix the unwanted behavior here? It should be somewhere in the OnRemove function at https://github.com/k3s-io/helm-controller/blob/master/pkg/controllers/chart/chart.go#L201.
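For anyone picking this up, one possible shape for such a fix (a sketch with hypothetical helpers, not a patch against the actual OnRemove code) is to only drop the finalizer after the delete job has both completed and been removed, so a re-enqueued event cannot find and reuse a stale job:

```go
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// reconcileChartRemoval reports whether it is safe to remove the finalizer from
// the HelmChart: only once the delete job has completed AND has itself been
// deleted (with its pods), so a later reconcile cannot see a stale completed job.
func reconcileChartRemoval(ctx context.Context, cs kubernetes.Interface, ns, jobName string) (finalizerCanGo bool, err error) {
	job, err := cs.BatchV1().Jobs(ns).Get(ctx, jobName, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return true, nil // job is gone, safe to drop the finalizer
	}
	if err != nil {
		return false, err
	}
	if job.Status.Succeeded == 0 {
		return false, nil // uninstall still running, requeue and check again
	}
	prop := metav1.DeletePropagationBackground
	err = cs.BatchV1().Jobs(ns).Delete(ctx, jobName, metav1.DeleteOptions{PropagationPolicy: &prop})
	return false, err // requeue; the finalizer is dropped on the next pass once the job is gone
}
```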
@brandond We hit the same problem, and it is really odd that a HelmChart delete seems to trigger the delete job twice.
Here is our environment version:
k3s version v1.28.4+k3s2 (6ba6c1b6)
go version go1.20.11
I ran the following test; the commands, event logs, and jobs are below:
1. First round:
Apply the test HelmChart.
[root@localhost k8s_env]# kubectl apply -f run-test.yaml
helmchart.helm.cattle.io/test created
Check the helm-install job, event logs, and pod resources:
[root@localhost k8s_env]# kubectl get job -A
NAMESPACE NAME COMPLETIONS DURATION AGE
kube-system helm-install-test 0/1 9s 9s
[root@localhost k8s_env]# kubectl get event -A
kube-system 4s Normal SuccessfulCreate job/helm-install-test Created pod: helm-install-test-5v2kp
kube-system 4s Normal Scheduled pod/helm-install-test-5v2kp Successfully assigned kube-system/helm-install-test-5v2kp to localhost
kube-system 4s Normal ApplyJob helmchart/test Applying HelmChart using Job kube-system/helm-install-test
kube-system 4s Normal AddedInterface pod/helm-install-test-5v2kp Add eth0 [10.42.2.123/16] from cnibr
kube-system 4s Normal Pulled pod/helm-install-test-5v2kp Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system 4s Normal Created pod/helm-install-test-5v2kp Created container helm
kube-system 4s Normal Started pod/helm-install-test-5v2kp Started container helm
default 3s Normal ScalingReplicaSet deployment/test Scaled up replica set test-9f64fdb7f to 1
default 3s Normal SuccessfulCreate replicaset/test-9f64fdb7f Created pod: test-9f64fdb7f-smf22
default 3s Normal Scheduled pod/test-9f64fdb7f-smf22 Successfully assigned default/test-9f64fdb7f-smf22 to localhost
default 3s Normal AddedInterface pod/test-9f64fdb7f-smf22 Add eth0 [10.42.2.124/16] from cnibr
default 3s Normal Pulled pod/test-9f64fdb7f-smf22 Container image "test" already present on machine
default 3s Normal Created pod/test-9f64fdb7f-smf22 Created container test
default 3s Normal Started pod/test-9f64fdb7f-smf22 Started container test
kube-system 1s Normal Completed job/helm-install-test Job completed
[root@localhost k8s_env]# kubectl get pod
NAME READY STATUS RESTARTS AGE
test-9f64fdb7f-smf22 1/1 Running 0 3m13s
Delete the HelmChart once the pod is running, then check the event logs and job:
[root@localhost k8s_env]# kubectl delete -f run-test.yaml
helmchart.helm.cattle.io "test" deleted
[root@localhost k8s_env]# kubectl get pod
NAME READY STATUS RESTARTS AGE
test-9f64fdb7f-smf22 1/1 Terminating 0 4m50s
[root@localhost k8s_env]# kubectl get job -A
NAMESPACE NAME COMPLETIONS DURATION AGE
kube-system helm-delete-test 1/1 3s 15s
[root@localhost k8s_env]# kubectl get event -A
kube-system 61s Normal SuccessfulCreate job/helm-delete-test Created pod: helm-delete-test-49284
kube-system 60s Normal Scheduled pod/helm-delete-test-49284 Successfully assigned kube-system/helm-delete-test-49284 to localhost
kube-system 61s Normal AddedInterface pod/helm-delete-test-49284 Add eth0 [10.42.2.126/16] from cnibr
kube-system 60s Normal Pulled pod/helm-delete-test-49284 Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system 60s Normal Created pod/helm-delete-test-49284 Created container helm
kube-system 60s Normal Started pod/helm-delete-test-49284 Started container helm
default 60s Normal Killing pod/test-9f64fdb7f-smf22 Stopping container test
kube-system 58s Normal Completed job/helm-delete-test Job completed
kube-system 55s Normal RemoveJob helmchart/test Uninstalled HelmChart using Job kube-system/helm-delete-test, removing resources
kube-system 55s Normal SuccessfulCreate job/helm-delete-test Created pod: helm-delete-test-vj4b2
kube-system 54s Normal Scheduled pod/helm-delete-test-vj4b2 Successfully assigned kube-system/helm-delete-test-vj4b2 to localhost
kube-system 55s Normal AddedInterface pod/helm-delete-test-vj4b2 Add eth0 [10.42.2.127/16] from cnibr
kube-system 54s Normal Pulled pod/helm-delete-test-vj4b2 Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system 54s Normal Created pod/helm-delete-test-vj4b2 Created container helm
kube-system 54s Normal Started pod/helm-delete-test-vj4b2 Started container helm
kube-system 52s Normal Completed job/helm-delete-test Job completed
[root@localhost k8s_env]# kubectl get pod
No resources found in default namespace.
From the events and jobs, we find that after the first delete job completed and was removed, k3s started a second delete job that was never removed, so a delete job is left behind.
2. Second round:
[root@localhost k8s_env]# kubectl apply -f run-test.yaml
helmchart.helm.cattle.io/test created
[root@localhost k8s_env]# kubectl get pod
NAME READY STATUS RESTARTS AGE
test-9f64fdb7f-wbhrd 1/1 Running 0 5s
[root@localhost k8s_env]# kubectl get jobs.batch -A
NAMESPACE NAME COMPLETIONS DURATION AGE
kube-system helm-delete-test 1/1 3s 8m20s
kube-system helm-install-test 1/1 3s 13s
[root@localhost k8s_env]# kubectl get event -A
kube-system 48s Normal SuccessfulCreate job/helm-install-test Created pod: helm-install-test-9jkgj
kube-system 47s Normal Scheduled pod/helm-install-test-9jkgj Successfully assigned kube-system/helm-install-test-9jkgj to localhost
kube-system 48s Normal ApplyJob helmchart/test Applying HelmChart using Job kube-system/helm-install-test
kube-system 47s Normal AddedInterface pod/helm-install-test-9jkgj Add eth0 [10.42.2.128/16] from cnibr
kube-system 47s Normal Pulled pod/helm-install-test-9jkgj Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system 47s Normal Created pod/helm-install-test-9jkgj Created container helm
kube-system 47s Normal Started pod/helm-install-test-9jkgj Started container helm
default 47s Normal ScalingReplicaSet deployment/test Scaled up replica set test-9f64fdb7f to 1
default 47s Normal SuccessfulCreate replicaset/test-9f64fdb7f Created pod: test-9f64fdb7f-wbhrd
default 46s Normal Scheduled pod/test-9f64fdb7f-wbhrd Successfully assigned default/test-9f64fdb7f-wbhrd to localhost
default 46s Normal AddedInterface pod/test-9f64fdb7f-wbhrd Add eth0 [10.42.2.129/16] from cnibr
default 46s Normal Pulled pod/test-9f64fdb7f-wbhrd Container image "test" already present on machine
default 46s Normal Created pod/test-9f64fdb7f-wbhrd Created container test
default 46s Normal Started pod/test-9f64fdb7f-wbhrd Started container test
kube-system 45s Normal Completed job/helm-install-test Job completed
[root@localhost k8s_env]# kubectl delete -f run-test.yaml
helmchart.helm.cattle.io "test" deleted
[root@localhost k8s_env]# kubectl get pod
NAME READY STATUS RESTARTS AGE
test-9f64fdb7f-wbhrd 1/1 Running 0 3m6s
[root@localhost k8s_env]# kubectl get job -A
NAMESPACE NAME COMPLETIONS DURATION AGE
[root@localhost k8s_env]# kubectl get event -A
kube-system 46s Normal RemoveJob helmchart/test Uninstalled HelmChart using Job kube-system/helm-delete-test, removing resources
In the second round, the helm-delete process just removes the job left over from the first round, but the application pod is still running.
3. Third round:
We wanted to see more helm-controller debug logs, so we set the disable-helm-controller arg and downloaded the standalone helm-controller binary at v0.15.4, the same version used in k3s v1.28.4. After running it manually and repeating the same test, we got a different result.
[root@localhost k8s_env]# vim /usr/lib/systemd/system/k3s.service
[root@localhost k8s_env]# systemctl daemon-reload
[root@localhost k8s_env]# systemctl restart k3s
[root@localhost helm-controller]# ./helm-controller-amd64 --kubeconfig /etc/rancher/k3s/k3s.yaml &
[root@localhost k8s_env]# kubectl apply -f run-test.yaml
helmchart.helm.cattle.io/test created
[root@localhost k8s_env]# kubectl get pod
NAME READY STATUS RESTARTS AGE
test-9f64fdb7f-cjlxc 1/1 Running 0 3s
[root@localhost k8s_env]# kubectl get job -A
NAMESPACE NAME COMPLETIONS DURATION AGE
kube-system helm-install-test 1/1 4s 11s
[root@localhost k8s_env]# kubectl get event -A
kube-system 49s Normal SuccessfulCreate job/helm-install-test Created pod: helm-install-test-tmmgm
kube-system 48s Normal Scheduled pod/helm-install-test-tmmgm Successfully assigned kube-system/helm-install-test-tmmgm to localhost
kube-system 49s Normal AddedInterface pod/helm-install-test-tmmgm Add eth0 [10.42.2.144/16] from cnibr
kube-system 48s Normal Pulled pod/helm-install-test-tmmgm Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system 48s Normal Created pod/helm-install-test-tmmgm Created container helm
kube-system 48s Normal Started pod/helm-install-test-tmmgm Started container helm
default 48s Normal ScalingReplicaSet deployment/test Scaled up replica set test-9f64fdb7f to 1
default 48s Normal SuccessfulCreate replicaset/test-9f64fdb7f Created pod: test-9f64fdb7f-cjlxc
default 47s Normal Scheduled pod/test-9f64fdb7f-cjlxc Successfully assigned default/test-9f64fdb7f-cjlxc to localhost
default 47s Normal AddedInterface pod/test-9f64fdb7f-cjlxc Add eth0 [10.42.2.145/16] from cnibr
default 47s Normal Pulled pod/test-9f64fdb7f-cjlxc Container image "test" already present on machine
default 47s Normal Created pod/test-9f64fdb7f-cjlxc Created container test
default 47s Normal Started pod/test-9f64fdb7f-cjlxc Started container test
kube-system 45s Normal Completed job/helm-install-test Job completed
kube-system 45s Normal ApplyJob helmchart/test Applying HelmChart using Job kube-system/helm-install-test
[root@localhost k8s_env]# kubectl delete -f run-test.yaml
helmchart.helm.cattle.io "test" deleted
[root@localhost k8s_env]# kubectl get pod
NAME READY STATUS RESTARTS AGE
test-9f64fdb7f-cjlxc 1/1 Terminating 0 2m45s
[root@localhost k8s_env]# kubectl get job -A
NAMESPACE NAME COMPLETIONS DURATION AGE
[root@localhost k8s_env]# kubectl get event -A
kube-system 52s Normal SuccessfulCreate job/helm-delete-test Created pod: helm-delete-test-sjfpn
kube-system 51s Normal Scheduled pod/helm-delete-test-sjfpn Successfully assigned kube-system/helm-delete-test-sjfpn to localhost
kube-system 52s Normal AddedInterface pod/helm-delete-test-sjfpn Add eth0 [10.42.2.146/16] from cnibr
kube-system 51s Normal Pulled pod/helm-delete-test-sjfpn Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system 51s Normal Created pod/helm-delete-test-sjfpn Created container helm
kube-system 51s Normal Started pod/helm-delete-test-sjfpn Started container helm
default 51s Normal Killing pod/test-9f64fdb7f-cjlxc Stopping container test
kube-system 49s Normal Completed job/helm-delete-test Job completed
kube-system 46s Normal SuccessfulCreate job/helm-delete-test Created pod: helm-delete-test-qq5zc
kube-system 45s Normal Scheduled pod/helm-delete-test-qq5zc Successfully assigned kube-system/helm-delete-test-qq5zc to localhost
kube-system 46s Normal AddedInterface pod/helm-delete-test-qq5zc Add eth0 [10.42.2.147/16] from cnibr
kube-system 45s Normal Pulled pod/helm-delete-test-qq5zc Container image "rancher/klipper-helm:v0.8.2-build20230815" already present on machine
kube-system 45s Normal Created pod/helm-delete-test-qq5zc Created container helm
kube-system 45s Normal Started pod/helm-delete-test-qq5zc Started container helm
kube-system 43s Normal Completed job/helm-delete-test Job completed
kube-system 40s Normal RemoveJob helmchart/test Uninstalled HelmChart using Job kube-system/helm-delete-test, removing resources
Here you can see the different outcome of the test: the helm-delete job has been cleaned up, and the event logs show that RemoveJob is triggered after the two jobs complete.
Here are the helm-controller logs:
[root@localhost helm-controller]# INFO[0863] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255465", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test
ERRO[0863] error syncing 'kube-system/test': handler helm-controller-chart-registration: helmcharts.helm.cattle.io "test" not found, requeuing
INFO[0863] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255465", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test
INFO[0863] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255472", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test
INFO[0863] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255472", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test
INFO[0866] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255472", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test
INFO[0866] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255472", FieldPath:""}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-test
ERRO[1011] error syncing 'kube-system/test': handler on-helm-chart-remove: waiting for delete of helm chart for kube-system/test by helm-delete-test, requeuing
INFO[1014] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255707", FieldPath:""}): type: 'Normal' reason: 'RemoveJob' Uninstalled HelmChart using Job kube-system/helm-delete-test, removing resources
ERRO[1017] error syncing 'kube-system/test': handler on-helm-chart-remove: waiting for delete of helm chart for kube-system/test by helm-delete-test, requeuing
INFO[1020] Event(v1.ObjectReference{Kind:"HelmChart", Namespace:"kube-system", Name:"test", UID:"cd455f26-15c0-4318-85e7-d34b06aa4f04", APIVersion:"helm.cattle.io/v1", ResourceVersion:"255739", FieldPath:""}): type: 'Normal' reason: 'RemoveJob' Uninstalled HelmChart using Job kube-system/helm-delete-test, removing resources
It's strange that the same test produced different outcomes, which leaves the following questions:
1. Why was the helm-delete job triggered twice?
2. Looking at the helm-controller logic and logs, RemoveJob is triggered once a job completes. Why was RemoveJob triggered only once in the first and second round tests, even though the helm-delete job was triggered twice?
Any resolution in 2024?