Helm Controller is constantly requeuing deployments
pr0ton11 opened this issue · 8 comments
Hi
Using version 0.13.2 here and running into the issue that the helm controller is constantly redeploying its install jobs even when no HelmChart changes happen:
time="2023-04-08T21:37:23Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"kube-system\", Name:\"gitea\", UID:\"c7440515-b8a0-415f-aaeb-0188c3c1bd75\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"71794639\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-gitea"
time="2023-04-08T21:37:23Z" level=error msg="error syncing 'kube-system/gitea': handler helm-controller-chart-registration: helmcharts.helm.cattle.io \"gitea\" not found, requeuing"
I use this to deploy to k3s 1.26.3:
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: helmcharts.helm.cattle.io
spec:
  group: helm.cattle.io
  names:
    kind: HelmChart
    plural: helmcharts
    singular: helmchart
  preserveUnknownFields: false
  scope: Namespaced
  versions:
  - additionalPrinterColumns:
    - jsonPath: .status.jobName
      name: Job
      type: string
    - jsonPath: .spec.chart
      name: Chart
      type: string
    - jsonPath: .spec.targetNamespace
      name: TargetNamespace
      type: string
    - jsonPath: .spec.version
      name: Version
      type: string
    - jsonPath: .spec.repo
      name: Repo
      type: string
    - jsonPath: .spec.helmVersion
      name: HelmVersion
      type: string
    - jsonPath: .spec.bootstrap
      name: Bootstrap
      type: string
    name: v1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              bootstrap:
                type: boolean
              chart:
                nullable: true
                type: string
              chartContent:
                nullable: true
                type: string
              failurePolicy:
                nullable: true
                type: string
              helmVersion:
                nullable: true
                type: string
              jobImage:
                nullable: true
                type: string
              repo:
                nullable: true
                type: string
              repoCA:
                nullable: true
                type: string
              set:
                additionalProperties:
                  nullable: true
                  type: string
                nullable: true
                type: object
              targetNamespace:
                nullable: true
                type: string
              timeout:
                nullable: true
                type: string
              valuesContent:
                nullable: true
                type: string
            type: object
          status:
            properties:
              jobName:
                nullable: true
                type: string
            type: object
        type: object
    served: true
    storage: true
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: helmchartconfigs.helm.cattle.io
spec:
  group: helm.cattle.io
  names:
    kind: HelmChartConfig
    plural: helmchartconfigs
    singular: helmchartconfig
  preserveUnknownFields: false
  scope: Namespaced
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              failurePolicy:
                nullable: true
                type: string
              valuesContent:
                nullable: true
                type: string
            type: object
        type: object
    served: true
    storage: true
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: helm-controller
  name: helm-controller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: helm-controller
  name: helm-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: helm-controller
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helm-controller
  namespace: kube-system
  labels:
    app: helm-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: helm-controller
  template:
    metadata:
      labels:
        app: helm-controller
    spec:
      containers:
      - name: helm-controller
        image: rancher/helm-controller:v0.13.2
        command: ["helm-controller"]
      enableServiceLinks: false
      serviceAccountName: helm-controller
```
The reason I deploy the helm controller to k3s myself is that since upgrading to k3s 1.26.3 (from 1.26.1), the embedded helm controller does not deploy any HelmChart resources anymore. I might open a separate issue in k3s, but for now this should do the job.
> since upgrading to k3s 1.26.3 (from 1.26.1) the embedded helm controller does not deploy any HelmChart resources anymore.
That seems like the more important issue; can you open an issue in the k3s repo and provide additional information on what's going on there?
Can you tell from the logs (perhaps `--debug` logs) why it's re-enqueueing things endlessly? The NotFound error suggests that the informer cache is empty; this has been reported in other issues but never fully root-caused.
I would also note that if you're going to deploy this to k3s, you should be sure to add the `--disable-helm-controller` option to the k3s servers.
> I would also note that if you're going to deploy this to k3s, you should be sure to add the `--disable-helm-controller` option to the k3s servers.
This is done with the following lines in the k3s config.yaml file:

```yaml
# Disabled components
disable:
  - servicelb
  - traefik
  - metrics-server
  - helm
```
Let me know if I should do it with the command-line option instead.

How can I enable the debug option in helm-controller?

The k3s issue will come in a second step; for now it's just important that this works, because I need the Helm deployments (they're critical for networking).
> disable:
>   - servicelb
>   - traefik
>   - metrics-server
>   - helm
No, that's not right. It is `--disable-helm-controller`, as I said, or `disable-helm-controller: true` if you're using a config file.
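For reference, the corrected server config would look something like this (a sketch, assuming the default /etc/rancher/k3s/config.yaml location, with `- helm` dropped since it is not a valid disable value):

```yaml
# /etc/rancher/k3s/config.yaml
disable:
  - servicelb
  - traefik
  - metrics-server
disable-helm-controller: true
```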
> How can I enable the debug option in helm-controller?
Adding the `--debug` flag to the command-line args should do it?
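In the deployment from the original post, that would look something like this (an untested sketch of the container spec):

```yaml
# container spec from the example deployment, with the suggested flag added
containers:
- name: helm-controller
  image: rancher/helm-controller:v0.13.2
  command: ["helm-controller", "--debug"]
```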
Thank you. Using `disable-helm-controller: true` in the k3s config.yaml file fixed the issue. I guess the two controllers were fighting each other over state.

I also upgraded to 0.13.3 now and it seems to run stably.
I have the feeling that my etcd is somewhat borked (which would explain why the k3s internal controller doesn't work anymore), but since this is a production cluster I cannot easily revert to the embedded helm controller, and I would rather have it running externally where it's properly loggable anyway.
Another question: can I somehow tolerate the node-not-ready condition? I am deploying the CNI with Helm. How is this done within k3s?

Thanks for the nice and fast help.
> Another question: can I somehow tolerate the node-not-ready condition?
For the controller itself, or for the job pods? If you're talking about the job pods, that's what the `bootstrap` field in the HelmChart spec is for.
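For illustration, a minimal HelmChart sketch with `bootstrap` set; the chart name, repo URL, and namespaces here are placeholders:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: my-cni            # placeholder name
  namespace: kube-system
spec:
  repo: https://charts.example.com   # placeholder repo URL
  chart: my-cni                      # placeholder chart name
  targetNamespace: kube-system
  # bootstrap marks the chart as needed to bootstrap the cluster,
  # so its install job can run on nodes that are not yet Ready
  bootstrap: true
```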
For the controller, because it will need to start before the CNI plugin can come up.
You would need to add a NotReady toleration to the example deployment, set it up to run with host network, and add node selectors that require it to run on a control-plane node so that it can talk to the local apiserver.
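A sketch of those additions to the example deployment's pod template; the control-plane label key shown is what k3s sets on server nodes, but verify the exact label and value on your cluster:

```yaml
spec:
  template:
    spec:
      # host network so the controller can reach the apiserver before the CNI is up
      hostNetwork: true
      # k3s labels server nodes with node-role.kubernetes.io/control-plane=true;
      # adjust if your nodes are labeled differently
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
      tolerations:
      # tolerate nodes that are not yet Ready (no CNI running yet)
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoSchedule
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
```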