k3s-io/helm-controller

Helm Controller is constantly requeuing deployments

pr0ton11 opened this issue · 8 comments

Hi

I'm using version 0.13.2 and running into an issue where the helm controller constantly re-applies the install jobs even when no HelmChart changes happen:

time="2023-04-08T21:37:23Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"kube-system\", Name:\"gitea\", UID:\"c7440515-b8a0-415f-aaeb-0188c3c1bd75\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"71794639\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-gitea"
time="2023-04-08T21:37:23Z" level=error msg="error syncing 'kube-system/gitea': handler helm-controller-chart-registration: helmcharts.helm.cattle.io \"gitea\" not found, requeuing"

I use this to deploy to k3s 1.26.3:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: helmcharts.helm.cattle.io
spec:
  group: helm.cattle.io
  names:
    kind: HelmChart
    plural: helmcharts
    singular: helmchart
  preserveUnknownFields: false
  scope: Namespaced
  versions:
  - additionalPrinterColumns:
    - jsonPath: .status.jobName
      name: Job
      type: string
    - jsonPath: .spec.chart
      name: Chart
      type: string
    - jsonPath: .spec.targetNamespace
      name: TargetNamespace
      type: string
    - jsonPath: .spec.version
      name: Version
      type: string
    - jsonPath: .spec.repo
      name: Repo
      type: string
    - jsonPath: .spec.helmVersion
      name: HelmVersion
      type: string
    - jsonPath: .spec.bootstrap
      name: Bootstrap
      type: string
    name: v1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              bootstrap:
                type: boolean
              chart:
                nullable: true
                type: string
              chartContent:
                nullable: true
                type: string
              failurePolicy:
                nullable: true
                type: string
              helmVersion:
                nullable: true
                type: string
              jobImage:
                nullable: true
                type: string
              repo:
                nullable: true
                type: string
              repoCA:
                nullable: true
                type: string
              set:
                additionalProperties:
                  nullable: true
                  type: string
                nullable: true
                type: object
              targetNamespace:
                nullable: true
                type: string
              timeout:
                nullable: true
                type: string
              valuesContent:
                nullable: true
                type: string
              version:
                nullable: true
                type: string
            type: object
          status:
            properties:
              jobName:
                nullable: true
                type: string
            type: object
        type: object
    served: true
    storage: true

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: helmchartconfigs.helm.cattle.io
spec:
  group: helm.cattle.io
  names:
    kind: HelmChartConfig
    plural: helmchartconfigs
    singular: helmchartconfig
  preserveUnknownFields: false
  scope: Namespaced
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              failurePolicy:
                nullable: true
                type: string
              valuesContent:
                nullable: true
                type: string
            type: object
        type: object
    served: true
    storage: true

---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: helm-controller
  name: helm-controller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: helm-controller
  name: helm-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: helm-controller
  namespace: kube-system

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helm-controller
  namespace: kube-system
  labels:
    app: helm-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: helm-controller
  template:
    metadata:
      labels:
        app: helm-controller
    spec:
      containers:
        - name: helm-controller
          image: rancher/helm-controller:v0.13.2
          command: ["helm-controller"]
      enableServiceLinks: false
      serviceAccountName: helm-controller

The reason I deploy the helm controller to k3s myself is that since upgrading to k3s 1.26.3 (from 1.26.1), the embedded helm controller no longer deploys any HelmChart resources. I might open a separate issue in k3s, but for now this should do the job.

since upgrading to k3s 1.26.3 (from 1.26.1), the embedded helm controller no longer deploys any HelmChart resources.

That seems like the more important issue; can you open an issue in the k3s repo and provide additional information on what's going on with that?

Can you tell from the logs (perhaps --debug logs) why it's re-enqueueing things endlessly? The NotFound error suggests that the informer cache is empty; this has been reported in other issues but never fully root-caused.

I would also note that if you're going to deploy this to k3s, you should be sure to add the --disable-helm-controller option to the k3s servers.

I would also note that if you're going to deploy this to k3s, you should be sure to add the --disable-helm-controller option to the k3s servers.

This is done with the following lines in the k3s config.yaml file:

# Disabled components
disable:
    - servicelb
    - traefik
    - metrics-server
    - helm

Let me know if I should do it with the flag instead.

How can I enable the debug option in helm-controller?

The k3s issue will come as a second step; for now it's just important that this works, because I need the helm deployments (they're critical for networking).

disable:
   - servicelb
   - traefik
   - metrics-server
   - helm

No, that's not right. It is --disable-helm-controller as I said, or disable-helm-controller: true if you're using a config file.
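For reference, a corrected config file based on this comment might look like the following (assuming the default /etc/rancher/k3s/config.yaml location):

# Disabled packaged components
disable:
    - servicelb
    - traefik
    - metrics-server
# The embedded helm controller has its own dedicated key
disable-helm-controller: true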

How can I enable the debug option in helm-controller?

adding the --debug flag to the command line args should do it?
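Applied to the container in the example Deployment above, that would be something like this (a sketch based on the maintainer's suggestion, so treat it as untested):

      containers:
        - name: helm-controller
          image: rancher/helm-controller:v0.13.2
          # hypothetical: enable verbose logging per the comment above
          command: ["helm-controller", "--debug"]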

Thank you,

Setting disable-helm-controller: true in the k3s config.yaml fixed the issue. I guess the two controllers were fighting each other over state.
I also upgraded to 0.13.3 now and it seems to run stably.

I have the feeling that my etcd is somewhat borked (which would explain why the k3s internal controller doesn't work anymore), but since it's a production cluster I cannot easily revert to the embedded helm controller, and I'd rather run it externally where it's properly loggable anyway.

Another question would be whether I can somehow tolerate the node-not-ready condition, because I am deploying the CNI with Helm.
How is this done within k3s?

Thanks for the nice and fast help.

Another question would be whether I can somehow tolerate the node-not-ready condition

For the controller itself, or for the job pods? If you're talking about the job pods, that's what the bootstrap field in the HelmChart spec is for.

https://docs.k3s.io/helm#helmchart-field-definitions
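For the job-pod case, a minimal sketch of a bootstrap chart might look like this (the chart name, repo, and target namespace are placeholders, not values from this thread):

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: example-cni
  namespace: kube-system
spec:
  # bootstrap: true marks this chart as needed to bring up the cluster,
  # so its install job can run before nodes are Ready
  bootstrap: true
  chart: example-cni
  repo: https://charts.example.com
  targetNamespace: kube-system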

For the controller, because it needs to start before the CNI plugin can come up.

You would need to add a NotReady toleration to the example deployment, set it up to run with the host network, and add a node selector that requires it to run on a control-plane node so that it can talk to the local apiserver.
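A rough sketch of those changes against the example Deployment (untested; the control-plane label value can vary between distributions, and the toleration uses the standard node.kubernetes.io/not-ready taint key):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helm-controller
  namespace: kube-system
  labels:
    app: helm-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: helm-controller
  template:
    metadata:
      labels:
        app: helm-controller
    spec:
      # run on the host network so the pod does not depend on the CNI
      hostNetwork: true
      # pin to a control-plane node so it can reach the local apiserver
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
      # tolerate not-yet-Ready nodes; omitting effect tolerates all effects
      tolerations:
        - key: node.kubernetes.io/not-ready
          operator: Exists
      containers:
        - name: helm-controller
          image: rancher/helm-controller:v0.13.2
          command: ["helm-controller"]
      enableServiceLinks: false
      serviceAccountName: helm-controller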