argoproj/applicationset

RFC 1123 breach causes massive slowdown in new deployments and stops deletions

dyasny opened this issue · 0 comments

I am running ApplicationSet 0.3.0 with Helm and the git generator.

Here's the ApplicationSet:

apiVersion: v1
items:
- apiVersion: argoproj.io/v1alpha1
  kind: ApplicationSet
  metadata:
    annotations:
      argocd.argoproj.io/application-set-refresh: "true"
    name: overlord
    namespace: argocd
  spec:
    generators:
    - git:
        files:
        - path: appset/cluster-config/**/config.yaml
        repoURL: https://github.com/myorg/myrepo.git
        revision: HEAD
    template:
      metadata:
        annotations:
          notifications.argoproj.io/subscribe.on-created.slack: dev-platform-alerts
          notifications.argoproj.io/subscribe.on-deleted.slack: dev-platform-alerts
          notifications.argoproj.io/subscribe.on-deployed.slack: dev-platform-alerts
          notifications.argoproj.io/subscribe.on-health-degraded.slack: dev-platform-alerts
          notifications.argoproj.io/subscribe.on-sync-failed.slack: dev-platform-alerts
          notifications.argoproj.io/subscribe.on-sync-status-unknown.slack: dev-platform-alerts
        name: '{{project_name}}-{{org_id}}'
      spec:
        destination:
          namespace: '{{org_id}}'
          server: https://kubernetes.default.svc
        project: default
        source:
          chart: mychart
          helm:
            parameters:
            - name: org_id
              value: '{{org_id}}'
            - name: project_name
              value: '{{project_name}}'
            releaseName: mychart
          path: ""
          repoURL: https://raw.githubusercontent.com/myorg/myrepo/main/helm-repo/
          targetRevision: 0.1.0
        syncPolicy:
          automated:
            allowEmpty: false
            prune: true
            selfHeal: true
          retry:
            backoff:
              duration: 10s
              factor: 2
              maxDuration: 15m
            limit: 5
          syncOptions:
          - Validate=true
          - CreateNamespace=true
          - PrunePropagationPolicy=foreground
          - PruneLast=true

This triggers a Helm deployment that uses org_id and project_name in several places across the Helm templates, including Deployment names and Ingress FQDNs.

One of my users posted values that do not adhere to RFC 1123: they contained capital letters and underscores. The applicationset controller logs showed the following:

    "log": "time=\"2022-04-08T17:02:01Z\" level=error msg=\"failed to unchanged Application\" action=unchanged app=RecipesApp-o-flores appSet=overlord error=\"Application.argoproj.io \\\"RecipesApp-o-flores\\\" is invalid: metadata.name: Invalid value: \\\"RecipesApp-o-flores\\\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')\"\n",
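
To make the rule in that error message concrete, here is a minimal Python sketch that checks a rendered Application name against the regex quoted in the log. The function names are hypothetical and are not part of ApplicationSet; the 253-character cap is the usual Kubernetes limit for DNS subdomain names, and the name format mirrors the `{{project_name}}-{{org_id}}` template above.

```python
import re

# One DNS label, copied from the regex in the controller error message.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
# Full RFC 1123 subdomain: one or more labels separated by dots.
RFC1123_SUBDOMAIN = re.compile(rf"{_LABEL}(\.{_LABEL})*")
# Kubernetes caps DNS subdomain names at 253 characters.
MAX_LEN = 253


def rendered_name(project_name: str, org_id: str) -> str:
    """Hypothetical helper mirroring the template '{{project_name}}-{{org_id}}'."""
    return f"{project_name}-{org_id}"


def is_valid_app_name(name: str) -> bool:
    """Return True if name is a lowercase RFC 1123 subdomain."""
    return len(name) <= MAX_LEN and RFC1123_SUBDOMAIN.fullmatch(name) is not None


if __name__ == "__main__":
    for project, org in [("RecipesApp", "o-flores"), ("recipesapp", "o-flores")]:
        name = rendered_name(project, org)
        print(name, "valid" if is_valid_app_name(name) else "INVALID")
```

A check like this could run in CI against the config files before they are merged, so that non-compliant values never reach the repo the generator watches.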

The app was not deployed, which is totally fine. However, for as long as the monitored git repo contained config files with non-compliant values, I had two ongoing issues:

  • My apps started taking much longer to deploy (2 minutes vs. 20 seconds).
  • Deleting a config file no longer caused the corresponding application to be deleted.

Expected behaviour:

  • If a faulty config exists, report it in the logs. It would also be nice to report it via argo-notifications, to raise awareness early.
  • Do not slow down deployments.
  • Do not block app deletions.
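
The expected behaviour above could be sketched roughly as follows. This is only an illustration of the idea in Python, not the controller's actual code: `Plan` and `plan_reconcile` are hypothetical names. Invalid names are collected for reporting and skipped, while creations and deletions for the remaining apps still go ahead.

```python
import re
from dataclasses import dataclass, field

# Regex copied from the controller error message (lowercase RFC 1123 subdomain).
NAME_RE = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


@dataclass
class Plan:
    create: list = field(default_factory=list)    # valid names not yet deployed
    delete: list = field(default_factory=list)    # deployed names no longer desired
    rejected: list = field(default_factory=list)  # invalid names, to be reported


def plan_reconcile(desired_names, existing_names) -> Plan:
    """Hypothetical planner: one bad name must not block the other apps."""
    plan = Plan()
    valid = set()
    for name in desired_names:
        if len(name) <= 253 and NAME_RE.fullmatch(name):
            valid.add(name)
        else:
            # Report and skip instead of failing the whole reconcile.
            plan.rejected.append(name)
    existing = set(existing_names)
    plan.create = sorted(valid - existing)
    plan.delete = sorted(existing - valid)
    return plan


if __name__ == "__main__":
    plan = plan_reconcile(
        desired_names=["RecipesApp-o-flores", "shop-acme"],
        existing_names=["shop-acme", "old-app"],
    )
    print(plan)
```

In this sketch the rejected list is what would feed the logs or a notification, and the deletion of `old-app` proceeds even though `RecipesApp-o-flores` is invalid.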