When deploying a Helm chart with kustomize, sometimes the Helm chart version is not updated
What happened:
In a PipeCD Application that uses Helm charts, the Helm chart version was not updated even though the version was changed and deployed via PipeCD.
What you expected to happen:
How to reproduce it:
Environment:
- piped version: v0.48.3
- control-plane version: v0.48.4
- k8s version: 1.28.11
- Others:
This incident was caused by a part of PipeCD not accounting for kustomize's behavior.
It mainly occurs when deploying a Helm chart using kustomize (< 5.3.0) with --enable-helm.
Also, since it is a flaky phenomenon, it will not always be reproduced.
Workaround
- Use kustomize 5.3.0 or higher
Investigation
Overview
As a premise, during drift detection a Git repository is cloned once and then pulled and reused. Additionally, a common manifest cache is shared by drift detection, plan preview, plan, and deploy. Therefore, if a wrong manifest is cached at some point, it can affect the other processes.
When you run `kustomize build --enable-helm`, a `charts` directory is created directly under the directory where the command is executed, and the Helm chart is downloaded there once. The build result is based on this chart, but with kustomize before v5.3.0, once the `charts` directory exists it is not updated even if you change the Helm chart version in `kustomization.yaml`.
Because drift detection reuses the repository, it ends up loading manifests built from the outdated chart. The loaded manifest is cached by commit hash, so if Plan or Deploy is executed for the same commit hash, the wrong manifest is applied.
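To make this caching interaction concrete, here is a minimal sketch in Go. It is not PipeCD's actual code; `manifestCache` and `loadManifests` are illustrative names. It only shows why a manifest cache keyed solely by commit hash lets a manifest built by drift detection from a stale `charts` directory be reused by Plan/Deploy for the same commit.

```go
// Illustrative sketch (not PipeCD's implementation) of a manifest cache keyed
// only by commit hash: whichever process builds the manifests first wins.
package main

import "fmt"

// manifestCache is a hypothetical in-memory cache keyed by commit hash.
var manifestCache = map[string][]string{}

// loadManifests returns the cached manifests for the commit if present;
// otherwise it builds them via the given function and stores the result.
func loadManifests(commitHash string, build func() []string) []string {
	if m, ok := manifestCache[commitHash]; ok {
		return m // Plan/Deploy reuse whatever was cached first.
	}
	m := build()
	manifestCache[commitHash] = m
	return m
}

func main() {
	commit := "abc123"

	// Drift detection runs first in a reused working copy where the stale
	// charts/ dir still exists, so kustomize (< v5.3.0) builds the old chart.
	stale := loadManifests(commit, func() []string {
		return []string{"helm.sh/chart: opentelemetry-operator-0.29.0"}
	})
	fmt.Println("drift detection built:", stale)

	// Plan and Deploy for the same commit hash hit the cache and apply the
	// stale manifest, even though a fresh build would use the new version.
	fmt.Println("deploy uses:", loadManifests(commit, func() []string {
		return []string{"helm.sh/chart: opentelemetry-operator-0.64.4"}
	}))
}
```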
The situation to reproduce
We can reproduce it with the steps below.
- Create a PR to update the Helm chart version & merge it
- Drift detection runs and a manifest is generated by kustomize (< v5.3.0) (the contents are incorrect at this point)
- The incorrect manifest is cached during drift detection
- The cached manifest is used during Plan
- The cached manifest is used when executing Deploy
Note: If executed in the order Plan -> drift detection, no problem occurs, presumably because the correct manifest is built and cached first, before drift detection reuses the repository with the stale charts directory.
Example
I reproduced it by using opentelemetry-operator.
https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-operator
Helm chart version before the update ↓
# kustomization.yaml
~/works/ffjlabo/ffjlabo-dev/kubernetes/opentelemetry-operator [fujiwo]
% cat kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
labels:
- pairs:
app.kubernetes.io/owner: test
helmCharts:
- name: opentelemetry-operator
repo: https://open-telemetry.github.io/opentelemetry-helm-charts
version: v0.29.0
releaseName: opentelemetry-operator
namespace: opentelemetry-operator-system
includeCRDs: true
# manifest built for Deployment
~/works/ffjlabo/ffjlabo-dev/kubernetes/opentelemetry-operator [fujiwo]
% kustomize build --enable-helm . | grep -A 10 "kind: Deployment"
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/owner: test
app.kubernetes.io/version: 0.76.1
helm.sh/chart: opentelemetry-operator-0.29.0
name: opentelemetry-operator
# charts dir
~/works/ffjlabo/ffjlabo-dev/kubernetes/opentelemetry-operator [fujiwo]
% tree -L 2 .
.
├── app.pipecd.yaml
├── charts
│ ├── opentelemetry-operator
│ └── opentelemetry-operator-0.29.0.tgz
└── kustomization.yaml
4 directories, 2 files
After updating the Helm chart version in kustomization.yaml to v0.64.4, build in the same directory ↓
# kustomization.yaml
~/works/ffjlabo/ffjlabo-dev/kubernetes/opentelemetry-operator [fujiwo]
% cat kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
labels:
- pairs:
app.kubernetes.io/owner: ffjlabo
helmCharts:
- name: opentelemetry-operator
repo: https://open-telemetry.github.io/opentelemetry-helm-charts
version: v0.64.4
releaseName: opentelemetry-operator
namespace: opentelemetry-operator-system
includeCRDs: true
valuesFile: ./values.yaml
# manifest built for Deployment
~/works/ffjlabo/ffjlabo-dev/kubernetes/opentelemetry-operator [fujiwo]
% kustomize build --enable-helm . | grep -A 10 "kind: Deployment"
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/owner: ffjlabo
app.kubernetes.io/version: 0.76.1
helm.sh/chart: opentelemetry-operator-0.29.0
name: opentelemetry-operator
# charts dir
~/works/ffjlabo/ffjlabo-dev/kubernetes/opentelemetry-operator [fujiwo]
% tree -L 2 .
.
├── app.pipecd.yaml
├── charts
│ ├── opentelemetry-operator
│ └── opentelemetry-operator-0.29.0.tgz
├── kustomization.yaml
└── values.yaml
4 directories, 3 files
This is the behavior of kustomize < v5.3.0.
The behavior has since changed, and even if an older version exists in the charts dir, the build now uses the updated Helm chart.
ref: kubernetes-sigs/kustomize#5293
I think there are several possible solutions, mainly fixes to the drift detection process:
1. Clone the repo from the git provider for every drift detection
2. Select a kustomize version of v5.3.0 or later in app.pipecd.yaml
3. Remove the charts dir after every drift detection
Solution 1 is not ideal because drift detection is executed every minute.
Solution 2 is also not ideal because it is just a workaround; users who use an older kustomize version still cannot solve the problem.
So I will try solution 3. A rough sketch of the cleanup follows below.
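As a rough idea of solution 3, here is a minimal sketch in Go, assuming the `charts` directory is created directly under the application directory by `kustomize build --enable-helm`. The function name `cleanupChartsDir` and the paths are illustrative, not the actual PipeCD change:

```go
// Sketch of solution 3 (not the actual PipeCD patch): after drift detection
// builds manifests with `kustomize build --enable-helm`, delete the charts/
// directory kustomize created in the app directory so the next build
// re-downloads the chart version declared in kustomization.yaml.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cleanupChartsDir removes <appDir>/charts if it exists. Returning nil when
// the directory is absent keeps the cleanup idempotent.
func cleanupChartsDir(appDir string) error {
	chartsDir := filepath.Join(appDir, "charts")
	if _, err := os.Stat(chartsDir); os.IsNotExist(err) {
		return nil
	}
	return os.RemoveAll(chartsDir)
}

func main() {
	// Hypothetical path to the application directory inside the cloned repo.
	appDir := "./kubernetes/opentelemetry-operator"

	// ... run `kustomize build --enable-helm` and load manifests here ...

	if err := cleanupChartsDir(appDir); err != nil {
		fmt.Fprintln(os.Stderr, "failed to clean up charts dir:", err)
	}
}
```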