RFC 1123 violation causes a major slowdown in new deployments and blocks deletions
dyasny opened this issue
I'm running ApplicationSet 0.3.0 with Helm and the Git generator.
Here's the ApplicationSet:
```yaml
apiVersion: v1
items:
- apiVersion: argoproj.io/v1alpha1
  kind: ApplicationSet
  metadata:
    annotations:
      argocd.argoproj.io/application-set-refresh: "true"
    name: overlord
    namespace: argocd
  spec:
    generators:
    - git:
        files:
        - path: appset/cluster-config/**/config.yaml
        repoURL: https://github.com/myorg/myrepo.git
        revision: HEAD
    template:
      metadata:
        annotations:
          notifications.argoproj.io/subscribe.on-created.slack: dev-platform-alerts
          notifications.argoproj.io/subscribe.on-deleted.slack: dev-platform-alerts
          notifications.argoproj.io/subscribe.on-deployed.slack: dev-platform-alerts
          notifications.argoproj.io/subscribe.on-health-degraded.slack: dev-platform-alerts
          notifications.argoproj.io/subscribe.on-sync-failed.slack: dev-platform-alerts
          notifications.argoproj.io/subscribe.on-sync-status-unknown.slack: dev-platform-alerts
        name: '{{project_name}}-{{org_id}}'
      spec:
        destination:
          namespace: '{{org_id}}'
          server: https://kubernetes.default.svc
        project: default
        source:
          chart: mychart
          helm:
            parameters:
            - name: org_id
              value: '{{org_id}}'
            - name: project_name
              value: '{{project_name}}'
            releaseName: mychart
          path: ""
          repoURL: https://raw.githubusercontent.com/myorg/myrepo/main/helm-repo/
          targetRevision: 0.1.0
        syncPolicy:
          automated:
            allowEmpty: false
            prune: true
            selfHeal: true
          retry:
            backoff:
              duration: 10s
              factor: 2
              maxDuration: 15m
            limit: 5
          syncOptions:
          - Validate=true
          - CreateNamespace=true
          - PrunePropagationPolicy=foreground
          - PruneLast=true
```
This triggers a Helm deployment that uses `org_id` and `project_name` in various places across the Helm templates, including Deployment names and Ingress FQDNs.
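To illustrate how one config file's values end up in several fields at once, here is a simplified sketch of the placeholder substitution in the template above. It is a stand-in, not the ApplicationSet controller's actual templating code, and the values in `params` are hypothetical: splitting the logged name `RecipesApp-o-flores` into `project_name` and `org_id` this way is only a guess.

```python
import re

# Hypothetical stand-in for ApplicationSet placeholder substitution:
# every '{{key}}' is replaced with the matching value from one config.yaml.
# This is NOT the controller's real implementation.
PLACEHOLDER = re.compile(r"\{\{([A-Za-z0-9_.]+)\}\}")


def render(template: str, params: dict) -> str:
    """Replace each {{key}} in template with params[key]; raises KeyError if a key is missing."""
    return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), template)


# Assumed values from a single config.yaml (the split is a guess).
params = {"project_name": "RecipesApp", "org_id": "o-flores"}

app_name = render("{{project_name}}-{{org_id}}", params)  # template.metadata.name
namespace = render("{{org_id}}", params)                  # destination.namespace
print(app_name, namespace)
```

The uppercase letters in `project_name` are copied into `metadata.name` unchanged, which is what the API server later rejects.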
One of my users submitted values that do not comply with RFC 1123: they contained uppercase letters and underscores. The ApplicationSet controller logs showed the following:
```
"log": "time=\"2022-04-08T17:02:01Z\" level=error msg=\"failed to unchanged Application\" action=unchanged app=RecipesApp-o-flores appSet=overlord error=\"Application.argoproj.io \\\"RecipesApp-o-flores\\\" is invalid: metadata.name: Invalid value: \\\"RecipesApp-o-flores\\\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')\"\n",
```
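The rejection can be reproduced locally with the regex quoted in the log (unescaped). This is a minimal sketch, assuming the pattern must match the whole name and that the 253-character limit Kubernetes applies to DNS-1123 subdomains also holds; the helper name `is_rfc1123_subdomain` is hypothetical.

```python
import re

# Regex from the controller log, with the JSON escaping removed.
DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)
# Kubernetes also caps DNS-1123 subdomain names at 253 characters.
MAX_LEN = 253


def is_rfc1123_subdomain(name: str) -> bool:
    """Return True if name passes the metadata.name check reported in the log."""
    return len(name) <= MAX_LEN and DNS1123_SUBDOMAIN.fullmatch(name) is not None


for name in ["RecipesApp-o-flores", "recipesapp-o-flores", "recipes_app-o-flores"]:
    print(f"{name!r}: {is_rfc1123_subdomain(name)}")
```

Only the all-lowercase name without underscores passes; both uppercase letters and `_` fail the pattern.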
The Application was not created, which is totally fine. However, as long as the Git repo watched for these config files contained configs with non-compliant values, I had two ongoing issues:
- My apps started taking much longer to deploy (about 2 minutes instead of 20 seconds).
- Deleting a config file no longer deleted the corresponding Application.
Expected behaviour:
- If a faulty config exists, report it in the logs. Ideally, also report it via argo-notifications to raise awareness early.
- Do not slow down other deployments.
- Do not block deletion of other Applications.
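A minimal sketch of the expected behaviour, written as a hypothetical planning step rather than the ApplicationSet controller's real reconcile loop: invalid generated names are logged and skipped, while creates and deletes for valid names still go ahead. The function `reconcile`, the `ReconcileResult` type, and all Application names in the usage example are illustrative assumptions.

```python
import logging
import re
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("appset-sketch")

# Same DNS-1123 subdomain check the API server applied in the log.
_NAME_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")


def valid_name(name: str) -> bool:
    return len(name) <= 253 and _NAME_RE.fullmatch(name) is not None


@dataclass
class ReconcileResult:
    create: list = field(default_factory=list)
    delete: list = field(default_factory=list)
    invalid: list = field(default_factory=list)


def reconcile(desired: list, existing: set) -> ReconcileResult:
    """Plan creates/deletes for one pass.

    desired:  Application names generated from the current config files.
    existing: Application names this ApplicationSet currently owns.
    Invalid names are reported and skipped instead of aborting the pass.
    """
    result = ReconcileResult()
    valid_desired = set()
    for name in desired:
        if valid_name(name):
            valid_desired.add(name)
        else:
            # Report and skip; a real controller could also emit an event here.
            log.error("skipping invalid Application name %r", name)
            result.invalid.append(name)
    result.create = sorted(valid_desired - existing)
    # Deletions depend only on the valid desired set, so a bad config file
    # does not prevent pruning Applications whose config files were removed.
    result.delete = sorted(existing - valid_desired)
    return result


plan = reconcile(
    desired=["RecipesApp-o-flores", "billing-acme", "search-acme"],
    existing={"billing-acme", "old-app-acme"},
)
print(plan)
```

In this sketch the invalid entry only affects its own Application: `search-acme` is still planned for creation and `old-app-acme` for deletion.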