[kube-prometheus-stack] Lost Prometheus targets after upgrade to version 63
kallaics opened this issue · 23 comments
Describe the bug (a clear and concise description of what the bug is).
We have PodMonitor, ServiceMonitor and ScrapeConfig resources, and after the upgrade these resources completely disappeared from the targets list in Prometheus and no longer collect metrics. I spent a morning looking for any relevant change in the release notes, but I didn't find anything. The only solution was to downgrade.
All affected resources are included through Selector.matchLabels
Worked well on version: 62.7.0
Did not work on versions: 63.0.0 and 63.0.1
Prometheus configuration
prometheus:
  enabled: true
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      nginx.ingress.kubernetes.io/whitelist-source-range: <ip ranges>
    labels: {}
    hosts:
      - prometheus.<domain>.<tld>
    paths:
      - /
    pathType: ImplementationSpecific
    tls: []
  prometheusSpec:
    prometheusExternalLabelNameClear: true
    replicaExternalLabelNameClear: true
    podMonitorSelector:
      matchLabels:
        app.kubernetes.io/part-of: flux
    serviceMonitorSelector:
      matchLabels:
        monitoring: prometheus
    scrapeConfigSelector:
      matchLabels:
        monitoring: prometheus
    secrets:
      - ca-certs
    additionalArgs:
      - name: "web.enable-admin-api"
    resources:
      requests:
        memory: 6Gi
Scrape configuration (I know it is in alpha status, so we are testing it on a non-production environment only)
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: <Kubernetes cluster name>
  namespace: monitoring
  labels:
    monitoring: prometheus-scrape
spec:
  scrapeInterval: 15s
  scrapeTimeout: 12s
  honorLabels: true
  metricsPath: '/federate'
  scheme: HTTPS
  tlsConfig:
    ca:
      secret:
        key: internal-ca.pem
        name: ca-certs
  basicAuth:
    username:
      key: username
      name: monitoring-federate-auth
    password:
      key: password
      name: monitoring-federate-auth
  params:
    'match[]':
      - '{__name__=~".+"}'
  staticConfigs:
    - labels:
        job: <Kubernetes cluster name>
        env: dev
        cluster_name: <Kubernetes cluster name>
      targets:
        - 'prometheus.<domain2>.<tld>'
Thanks for the help, guys.
What's your helm version?
version.BuildInfo{Version:"v3.16.1", GitCommit:"5a5449dc42be07001fd5771d56429132984ab3ab", GitTreeState:"dirty", GoVersion:"go1.23.1"}
What's your kubectl version?
Client Version: v1.31.1 Kustomize Version: v5.4.2 Server Version: v1.29.7
Which chart?
kube-prometheus-stack
What's the chart version?
63.0.0, 63.1.0
What happened?
After the Helm chart upgrade, most of the targets disappeared from Prometheus.
Affected resources:
- PodMonitor
- ServiceMonitor
- ScrapeConfig
What you expected to happen?
The targets should not disappear from Prometheus.
How to reproduce it?
1. Install Prometheus from kube-prometheus-stack version 62.7.0.
2. Install a PodMonitor resource in a different namespace from Prometheus.
3. Install ServiceMonitor and ScrapeConfig resources in the same namespace as Prometheus.
4. Once all targets work well, upgrade the chart to 63.0.0 or 63.1.0.
Enter the changed values of values.yaml?
None
Enter the command that you executed that is failing/misfunctioning.
helm install prometheus prometheus-community/kube-prometheus-stack --values values.yaml
Anything else we need to know?
The deployment happened via FluxCD v2.3.0 (latest stable).
A full Helm chart reinstall from scratch did not solve the issue.
All CRDs are on the latest version for the Helm chart.
Same here, I had to revert!
@kallaics did you check the matching labels? The default values now set release: release_name as the default label, and I had to add this label to all my ServiceMonitors.
You are right, I forgot to add this step to the reproduction steps. The custom matching label is a difference from the default settings.
I double-checked the matching labels. My custom matching labels were added to the resources.
The Helm chart configuration did not change during the upgrade on my side, and after the downgrade everything was okay again.
Instead:
scrapeConfigSelector:
  matchLabels:
    monitoring: prometheus
use
scrapeConfigSelector:
  matchLabels:
    release: null
    monitoring: prometheus
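A minimal Python sketch of why the `release: null` entry works for a top-level chart (this is an illustrative model, not Helm's actual implementation; `merge_and_prune` is a hypothetical helper): Helm merges the user's matchLabels over the chart default and then drops keys that were explicitly set to null.

```python
# Hypothetical helper illustrating Helm's null-deletion semantics for a
# top-level chart: user values are merged over the chart defaults, and
# keys explicitly set to None/null are pruned from the result.
def merge_and_prune(override, default):
    out = dict(default)
    out.update(override)
    return {k: v for k, v in out.items() if v is not None}

default = {"release": "prometheus"}  # label injected by the chart default
override = {"release": None, "monitoring": "prometheus"}
print(merge_and_prune(override, default))
# -> {'monitoring': 'prometheus'}
```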
Ref: I would like to remove the default matchLabels.
The following works as expected:
helm template foo kube-prometheus-stack \
  --repo https://prometheus-community.github.io/helm-charts \
  >z0

cat >values.yaml <<EOF
prometheus:
  prometheusSpec:
    ruleSelector:
      matchLabels: null
    serviceMonitorSelector:
      matchLabels: null
EOF

helm template foo kube-prometheus-stack \
  --repo https://prometheus-community.github.io/helm-charts \
  --values values.yaml \
  >z1
$ diff z0 z1
--- z0
+++ z1
@@ -2273,8 +2273,7 @@
   routePrefix: "/"
   serviceAccountName: foo-kube-prometheus-stack-prometheus
   serviceMonitorSelector:
-    matchLabels:
-      release: 'foo'
+    {}
   serviceMonitorNamespaceSelector:
     {}
   podMonitorSelector:
@@ -2297,8 +2296,7 @@
   ruleNamespaceSelector:
     {}
   ruleSelector:
-    matchLabels:
-      release: 'foo'
+    {}
   scrapeConfigSelector:
     matchLabels:
       release: 'foo'
But if I use it as a subchart:

cat >Chart.yaml <<EOF
---
apiVersion: v2
name: system
type: application
version: 0.0.0
appVersion: 0.0.0
dependencies:
  - name: kube-prometheus-stack
    version: 63.1.0
    repository: https://prometheus-community.github.io/helm-charts
EOF

cat >values.yaml <<EOF
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      ruleSelector:
        matchLabels: null
      serviceMonitorSelector:
        matchLabels: null
EOF

helm template foo . \
  --values values.yaml \
  >z1

$ diff z0 z1 | grep matchLabels
<empty>
In other words, the matchLabels: null is skipped.
Instead:

cat >values.yaml <<EOF
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      ruleSelector:
        matchLabels: a
      serviceMonitorSelector:
        matchLabels: a
EOF

helm template foo . \
  --values values.yaml \
  >z1

$ diff z0 z1 | grep matchLabels -A2
-    matchLabels:
-      release: 'foo'
+    matchLabels: a
   serviceMonitorNamespaceSelector:
     {}
--
-    matchLabels:
-      release: 'foo'
+    matchLabels: a
   scrapeConfigSelector:
     matchLabels:
       release: 'foo'
Maybe due to helm/helm#12637?
@yellowhat Correct. We are aware that there is a Helm issue which adds an incompatibility with sub-charts. Ref: #4818
Could you try this:
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      additionalConfig:
        scrapeConfigSelector:
          matchLabels:
            monitoring: prometheus
The helm template output may look goofy, but due to how duplicate keys work in YAML, the config from additionalConfig should be preferred. Note that this may only be a short-term workaround; we are looking into a proper solution.
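The duplicate-key behaviour the workaround relies on can be illustrated with the Python standard library: like most YAML loaders, `json.loads` keeps the last occurrence of a duplicated mapping key (a sketch of the parsing principle, not of Helm itself; the document string is made up for illustration).

```python
import json

# Two sibling "scrapeConfigSelector" keys, as the rendered template would
# contain one from the chart default and one from additionalConfig; the
# parser keeps the last occurrence, so additionalConfig wins.
doc = '''{
  "scrapeConfigSelector": {"matchLabels": {"release": "foo"}},
  "scrapeConfigSelector": {"matchLabels": {"monitoring": "prometheus"}}
}'''
print(json.loads(doc)["scrapeConfigSelector"])
# -> {'matchLabels': {'monitoring': 'prometheus'}}
```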
Hi guys,
I never thought somebody would find the root cause and provide a workaround over the weekend. A big high five to every participant!
I agree with @yellowhat's recommendation to apply the changes. I have no idea how many users rely on the release label, but the idea was fundamentally good and useful for Helm chart validation.
@jkroepke Good catch, but it will probably be a "major" change. I think it would also be great to move these items there. We will see whether it can land in the next version or only in the next major release.
From what I can see, the default value is already empty:
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      additionalConfig: {}
How can I remove the matchLabels options?
If scrapeConfigSelector is configured via additionalConfig, it should fully override the existing scrapeConfigSelector.
@jkroepke the Helm issue with null values in subcharts that would apply in this case is afaik already fixed, and I was able to upgrade to this new version without any changes to my deployment:
https://github.com/broersma-forslund/homelab/pull/570/files
See diff in this comment: broersma-forslund/homelab#570 (comment)
I suspect users with issues are using older versions of helm.
May I ask which version are you using?
I am on 3.16.1.
Mhm weird, I'm on v3.15.2. But it's clearly working for my scenario.
It only works if it's placed in the default values.yaml of the umbrella chart. I see in your diff that that's the case.
If the override comes from additional values, e.g. via -f, the issue still occurs.
As far as I know helm does not support overriding a non-null value with a null value in a subchart, e.g.

name: monitoring
version: "0.0.1"
appVersion: "0.1.0"
description: Monitoring platform
type: application
apiVersion: v2
dependencies:
  - name: kube-prometheus-stack
    version: 63.1.0
    repository: https://prometheus-community.github.io/helm-charts

kube-prometheus-stack:
  ## Manages Prometheus and Alertmanager components
  ##
  prometheusOperator:
  prometheus:
    prometheusSpec:
      ruleSelector:
        matchLabels: null # These nulls can never override a non-null value in values.yaml in https://prometheus-community.github.io/helm-charts
      serviceMonitorSelector:
        matchLabels: null
      podMonitorSelector:
        matchLabels: null
      probeSelector:
        matchLabels: null
      scrapeConfigSelector:
        matchLabels: null

so with this setup it is impossible to override the value in e.g. https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L3708-L3710
See various helm issues helm/helm#12741 helm/helm#13222 helm/helm#12511
As far as I know helm does not support overriding a non-null value with a null value in a subchart, e.g.
As explained above, this does work as an override in the parent chart's default values, but apparently not in value overrides provided as separate values files to the parent chart.
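The asymmetry can be sketched in Python (a simplified model, not Helm's real coalesce code; `coalesce` is a hypothetical helper): when nulls are honoured the key is deleted, but when they are dropped during coalescing the subchart default survives.

```python
# Simplified model of two possible merge behaviours for a None/null override.
def coalesce(overrides, defaults, keep_nulls):
    out = dict(defaults)
    for key, val in overrides.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = coalesce(val, out[key], keep_nulls)
        elif val is None:
            if keep_nulls:
                out.pop(key, None)  # null deletes the key, as documented
            # else: null is silently dropped and the default survives
        else:
            out[key] = val
    return out

defaults = {"serviceMonitorSelector": {"matchLabels": {"release": "foo"}}}
override = {"serviceMonitorSelector": {"matchLabels": None}}

# null dropped during coalescing: the default release label survives
print(coalesce(override, defaults, keep_nulls=False))
# -> {'serviceMonitorSelector': {'matchLabels': {'release': 'foo'}}}

# null honoured: matchLabels is removed, as the override intended
print(coalesce(override, defaults, keep_nulls=True))
# -> {'serviceMonitorSelector': {}}
```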
Hi all,
this morning I discussed possible options with @QuentinBisson to recover from the failed v63 release. Both proposals will result in another breaking change, but it is what it is.
Proposal 1 - Switch to plain strings (vote via reaction)
All selectors become plain strings.
Instead:
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      scrapeConfigSelector:
        matchLabels:
          monitoring: prometheus
This format will be used:
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      scrapeConfigSelector: |-
        matchLabels:
          monitoring: prometheus
This would avoid the need to set values to null while giving full flexibility to fill in values. An empty string means null in that case. If set, the default value "release: {{ $.Release.Name }}" would be overwritten without having to set anything else to null.
The plain string would be passed through tpl as well.
On the template side, it will be changed from
{{- if not (kindIs "invalid" .Values.prometheus.prometheusSpec.ruleNamespaceSelector) }}
ruleNamespaceSelector: {{ tpl (toYaml .Values.prometheus.prometheusSpec.ruleNamespaceSelector) . | nindent 4 }}
{{- end }}
to
{{- with .Values.prometheus.prometheusSpec.ruleNamespaceSelector }}
ruleNamespaceSelector: {{ tpl . $ | nindent 4 }}
{{- end }}
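Why plain strings sidestep the merge problem can be sketched in Python (a simplified model of Helm's map merging, not its actual code; `merge` is a hypothetical helper): maps are deep-merged, so the default release label can survive next to a partial override, while a string value is replaced wholesale.

```python
def merge(override, default):
    # Deep-merge mappings; any non-mapping, non-None override replaces
    # the default outright.
    if isinstance(override, dict) and isinstance(default, dict):
        out = dict(default)
        for k, v in override.items():
            out[k] = merge(v, out.get(k))
        return out
    return override if override is not None else default

map_default = {"matchLabels": {"release": "foo"}}
map_override = {"matchLabels": {"monitoring": "prometheus"}}
# deep merge keeps the chart's release label next to the user's label:
print(merge(map_override, map_default))
# -> {'matchLabels': {'release': 'foo', 'monitoring': 'prometheus'}}

str_default = "matchLabels:\n  release: foo"
str_override = "matchLabels:\n  monitoring: prometheus"
# a string override simply replaces the default, no merging involved:
print(merge(str_override, str_default))
```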
Proposal 2 - Revert to old values (vote via reaction)
Basically revert #4818 without any future improvements.
What about introducing a variable to apply the default label (release: {{ .Release.Name }})?
This is my proposal. It does not break the simple YAML structure and is probably easier to read. In my case it will ignore the release label:
Option A: add a new variable at the ruleConfigSelector level.
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      ruleConfigSelector:
        includeReleaseLabel: false # Default: true, which applies the 'release' label; when matchLabels is null it is ignored
        matchLabels: # Allows configuring additional labels too, with or without the release label.
          monitoring: prometheus
The template (this logic can be implemented in _helpers.tpl and just included):
{{- if not (kindIs "invalid" .Values.prometheus.prometheusSpec.ruleNamespaceSelector) }}
ruleNamespaceSelector:
  matchLabels:
  {{- if .Values.prometheus.prometheusSpec.ruleNamespaceSelector.includeReleaseLabel }}
    release: {{ .Release.Name }}
  {{- end }}
  {{- with .Values.prometheus.prometheusSpec.ruleNamespaceSelector.matchLabels }}
    {{- toYaml . | nindent 4 }}
  {{- end }}
{{- end }}
Option B: the ruleConfigSelector part stays untouched, and a new variable is added at the prometheusSpec level.
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      ruleConfigWithReleaseLabel: false # Default: true, which applies the 'release' label; when matchLabels is null it is ignored
      ruleConfigSelector:
        matchLabels: # Allows configuring additional labels too, with or without the release label.
          monitoring: prometheus
The template (this logic can be implemented in _helpers.tpl and just included):
{{- if not (kindIs "invalid" .Values.prometheus.prometheusSpec.ruleNamespaceSelector) }}
ruleNamespaceSelector:
  matchLabels:
  {{- if .Values.prometheus.prometheusSpec.ruleConfigWithReleaseLabel }}
    release: {{ .Release.Name }}
  {{- end }}
  {{- with .Values.prometheus.prometheusSpec.ruleNamespaceSelector.matchLabels }}
    {{- toYaml . | nindent 4 }}
  {{- end }}
{{- end }}
As far as I know helm does not support overriding a non-null value with a null value in a subchart.
Same problem here. So basically this comment here does not work for us:
Instead of having to set it to an empty string or null, I would much prefer something more functional, for example being able to set it to ALL to match all labels, or having a boolean property matchAllLabels or something similar. scrapeConfigSelectorNilUsesHelmValues was always a confusing property, and setting a selector to "" does not make it clear that this means match ALL instead of match NONE.
It doesn't help that the prometheus-operator CRD itself is also unusual in that it supports three states (nil, empty and filled). I don't think the current kps chart supports all three states, does it?
Same, using a special label to discover the targets; had to revert back to 62 because everything disappeared.
@rouke-broersma What do you think about my proposal above? It can keep both options. The only issue is that you cannot use the two settings in an "OR" relation (keeping the default release label while also scraping custom resources with different label(s)).
Your proposal does not exactly match the actual scenarios. The actual scenario is not whether or not the release label is included; it is one of these three:
- Match NO resources (nil value in prometheus-operator)
- Match a subset of resources based on labels (for example the default release label, or a custom label)
- Match ALL resources (empty array/object in prometheus-operator)
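The three states above can be sketched in Python (an illustrative model of Kubernetes label-selector semantics as prometheus-operator applies them; the `selects` helper is hypothetical): nil selects nothing, an empty selector selects everything, and a non-empty matchLabels selects the subset whose labels all match.

```python
def selects(selector, labels):
    """nil selector -> match nothing; empty selector -> match everything;
    matchLabels -> match resources carrying all the listed labels."""
    if selector is None:                       # state 1: match NO resources
        return False
    match = selector.get("matchLabels") or {}  # state 2: empty -> match ALL
    # state 3: every requested label must be present with the right value
    return all(labels.get(k) == v for k, v in match.items())

resource = {"release": "foo", "monitoring": "prometheus"}
print(selects(None, resource))                                  # False
print(selects({}, resource))                                    # True
print(selects({"matchLabels": {"release": "foo"}}, resource))   # True
print(selects({"matchLabels": {"release": "bar"}}, resource))   # False
```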