Helm Controller is constantly requeuing deployments
pr0ton11 opened this issue · 8 comments
Hi
Using version 0.13.2 here and running into the issue that the helm controller is constantly redeploying its install jobs even when no HelmChart changes happen:
time="2023-04-08T21:37:23Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"kube-system\", Name:\"gitea\", UID:\"c7440515-b8a0-415f-aaeb-0188c3c1bd75\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"71794639\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-gitea"
time="2023-04-08T21:37:23Z" level=error msg="error syncing 'kube-system/gitea': handler helm-controller-chart-registration: helmcharts.helm.cattle.io \"gitea\" not found, requeuing"
I use this to deploy to k3s 1.26.3:
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: helmcharts.helm.cattle.io
spec:
  group: helm.cattle.io
  names:
    kind: HelmChart
    plural: helmcharts
    singular: helmchart
  preserveUnknownFields: false
  scope: Namespaced
  versions:
  - additionalPrinterColumns:
    - jsonPath: .status.jobName
      name: Job
      type: string
    - jsonPath: .spec.chart
      name: Chart
      type: string
    - jsonPath: .spec.targetNamespace
      name: TargetNamespace
      type: string
    - jsonPath: .spec.version
      name: Version
      type: string
    - jsonPath: .spec.repo
      name: Repo
      type: string
    - jsonPath: .spec.helmVersion
      name: HelmVersion
      type: string
    - jsonPath: .spec.bootstrap
      name: Bootstrap
      type: string
    name: v1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              bootstrap:
                type: boolean
              chart:
                nullable: true
                type: string
              chartContent:
                nullable: true
                type: string
              failurePolicy:
                nullable: true
                type: string
              helmVersion:
                nullable: true
                type: string
              jobImage:
                nullable: true
                type: string
              repo:
                nullable: true
                type: string
              repoCA:
                nullable: true
                type: string
              set:
                additionalProperties:
                  nullable: true
                  type: string
                nullable: true
                type: object
              targetNamespace:
                nullable: true
                type: string
              timeout:
                nullable: true
                type: string
              valuesContent:
                nullable: true
                type: string
            type: object
          status:
            properties:
              jobName:
                nullable: true
                type: string
            type: object
        type: object
    served: true
    storage: true
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: helmchartconfigs.helm.cattle.io
spec:
  group: helm.cattle.io
  names:
    kind: HelmChartConfig
    plural: helmchartconfigs
    singular: helmchartconfig
  preserveUnknownFields: false
  scope: Namespaced
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              failurePolicy:
                nullable: true
                type: string
              valuesContent:
                nullable: true
                type: string
            type: object
        type: object
    served: true
    storage: true
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: helm-controller
  name: helm-controller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: helm-controller
  name: helm-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: helm-controller
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helm-controller
  namespace: kube-system
  labels:
    app: helm-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: helm-controller
  template:
    metadata:
      labels:
        app: helm-controller
    spec:
      containers:
      - name: helm-controller
        image: rancher/helm-controller:v0.13.2
        command: ["helm-controller"]
      enableServiceLinks: false
      serviceAccountName: helm-controller
```
The reason I deploy the helm controller to k3s myself is that since upgrading to k3s 1.26.3 (from 1.26.1), the embedded helm controller does not deploy any HelmChart resources anymore. I might open a separate issue in k3s, but for now this should do the job.
> since upgrading to k3s 1.26.3 (from 1.26.1) the embedded helm controller does not deploy any HelmChart resources anymore.
That seems like the more important issue; can you open an issue in the k3s repo and provide additional information on what's going on there?
Can you tell from the logs (perhaps `--debug` logs) why it's re-enqueueing things endlessly? The NotFound error suggests that the informer cache is empty; this has been reported in other issues but never fully root-caused.
I would also note that if you're going to deploy this to k3s, you should be sure to add the `--disable-helm-controller` option to the k3s servers.
> I would also note that if you're going to deploy this to k3s, you should be sure to add the `--disable-helm-controller` option to the k3s servers.
This is done with the following lines in the k3s config.yaml file:

```yaml
# Disabled components
disable:
  - servicelb
  - traefik
  - metrics-server
  - helm
```
Let me know if I should do it with the command-line option instead.

How can I enable the debug option in helm-controller?

The k3s issue will come in a second step; for now it's just important that this works, because I need the Helm deployments (they're critical for networking).
> disable:
>   - servicelb
>   - traefik
>   - metrics-server
>   - helm
No, that's not right. It is `--disable-helm-controller`, as I said, or `disable-helm-controller: true` if you're using a config file.
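For reference, the corrected server config would look something like this (a sketch, assuming the default /etc/rancher/k3s/config.yaml location, with `- helm` dropped since it is not a valid disable value):

```yaml
# /etc/rancher/k3s/config.yaml
disable:
  - servicelb
  - traefik
  - metrics-server
disable-helm-controller: true
```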
> How can I enable the debug option in helm-controller?
Adding the `--debug` flag to the command-line args should do it?
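In the deployment from the original post, that would look something like this (an untested sketch of the container spec):

```yaml
# container spec from the example deployment, with the suggested flag added
containers:
- name: helm-controller
  image: rancher/helm-controller:v0.13.2
  command: ["helm-controller", "--debug"]
```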
Thank you. Using `disable-helm-controller: true` in the k3s config.yaml file fixed the issue. I guess the two controllers were fighting each other over state.

I also upgraded to 0.13.3 now and it seems to run stably.
I have the feeling that my etcd is somewhat borked (which would explain why the k3s internal controller doesn't work anymore), but since this is a production cluster I cannot easily revert to the embedded helm controller, and I would rather have it running externally where it's properly loggable anyway.
Another question: can I somehow tolerate the node-not-ready condition? I am deploying the CNI with Helm. How is this done within k3s?

Thanks for the nice and fast help.
> Another question: can I somehow tolerate the node-not-ready condition?
For the controller itself, or for the job pods? If you're talking about the job pods, that's what the `bootstrap` field in the HelmChart spec is for.
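For illustration, a minimal HelmChart sketch with `bootstrap` set; the chart name, repo URL, and namespaces here are placeholders:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: my-cni            # placeholder name
  namespace: kube-system
spec:
  repo: https://charts.example.com   # placeholder repo URL
  chart: my-cni                      # placeholder chart name
  targetNamespace: kube-system
  # bootstrap marks the chart as needed to bootstrap the cluster,
  # so its install job can run on nodes that are not yet Ready
  bootstrap: true
```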
For the controller, because it will need to start before the CNI plugin can come up.
You would need to add a NotReady toleration to the example deployment, set it up to run with host network, and add node selectors that require it to run on a control-plane node so that it can talk to the local apiserver.
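A sketch of those additions to the example deployment's pod template; the control-plane label key shown is what k3s sets on server nodes, but verify the exact label and value on your cluster:

```yaml
spec:
  template:
    spec:
      # host network so the controller can reach the apiserver before the CNI is up
      hostNetwork: true
      # k3s labels server nodes with node-role.kubernetes.io/control-plane=true;
      # adjust if your nodes are labeled differently
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
      tolerations:
      # tolerate nodes that are not yet Ready (no CNI running yet)
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoSchedule
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
```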