Errrs when using custom metrics without MetricTemplate
timsanwald opened this issue · 3 comments
Describe the bug
Custom metrics directly in Canary without the use of MetricTemplate is not working anymore.
If theres not a single use of the predefined metrics ("request-duration" and "request-success-rate")
We are getting the error introduced with the following commit 16f8e15
Why is it required to use the MetricTemplate function now, instead of directly addressing the e.g. query? I couldn't find any changelog noting the breaking change there at least.
spec:
analysis:
interval: 60s
iterations: 5
metrics:
- name: canary-health-metrics
query: >-
sum(logback_events_total{level="error", job="some-app"}) <=
bool 0
thresholdRange:
min: 1
{"level":"error","ts":"2024-03-06T14:36:57.883Z","caller":"controller/events.go:39","msg":"Metric query failed for no usable metrics template were configured","canary":"redacted.namespace","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).recordEventErrorf\n\t/workspace/pkg/controller/events.go:39\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).runMetricChecks\n\t/workspace/pkg/controller/scheduler_metrics.go:310\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).runAnalysis\n\t/workspace/pkg/controller/scheduler.go:744\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).advanceCanary\n\t/workspace/pkg/controller/scheduler.go:433\ngithub.com/fluxcd/flagger/pkg/controller.CanaryJob.Start.func1\n\t/workspace/pkg/controller/job.go:39"}
To Reproduce
Didn't had time until now to reproduce, just my observation now.
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
- Flagger version: 1.36.0
- Kubernetes version: 1.27.9
- Service Mesh provider: istio
- Ingress provider: istio
We shouldn't error out if the query
field is present.
else if metric.Name != "request-success-rate" && metric.Name != "request-duration" {
c.recordEventErrorf(canary, "Metric query failed for no usable metrics template were configured")
return false
}
flagger/pkg/controller/scheduler_metrics.go
Line 309 in 9a224a0
It's because I only did TemplateRef and BuiltinMetric checks and didn't consider Metric.query
. Because the description of query in the canary API is ready to be deprecated
.
Do I need to add the relevant checks now? Or is it better to encourage users to use TemplateRef and abandon query?
@stefanprodan
@LiZhenCheng9527 we can't drop deprecated fields unless we bump the API version. To fix the breaking change now, we should look for query
and if it exists, we shouldn't error out.