fluxcd/flagger

Errrs when using custom metrics without MetricTemplate

timsanwald opened this issue · 3 comments

Describe the bug

Custom metrics directly in Canary without the use of MetricTemplate is not working anymore.
If theres not a single use of the predefined metrics ("request-duration" and "request-success-rate")

We are getting the error introduced with the following commit 16f8e15

Why is it required to use the MetricTemplate function now, instead of directly addressing the e.g. query? I couldn't find any changelog noting the breaking change there at least.

spec:
analysis:
interval: 60s
iterations: 5
metrics:
- name: canary-health-metrics
query: >-
sum(logback_events_total{level="error", job="some-app"}) <=
bool 0
thresholdRange:
min: 1

{"level":"error","ts":"2024-03-06T14:36:57.883Z","caller":"controller/events.go:39","msg":"Metric query failed for no usable metrics template were configured","canary":"redacted.namespace","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).recordEventErrorf\n\t/workspace/pkg/controller/events.go:39\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).runMetricChecks\n\t/workspace/pkg/controller/scheduler_metrics.go:310\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).runAnalysis\n\t/workspace/pkg/controller/scheduler.go:744\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).advanceCanary\n\t/workspace/pkg/controller/scheduler.go:433\ngithub.com/fluxcd/flagger/pkg/controller.CanaryJob.Start.func1\n\t/workspace/pkg/controller/job.go:39"}

To Reproduce

Didn't had time until now to reproduce, just my observation now.

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

  • Flagger version: 1.36.0
  • Kubernetes version: 1.27.9
  • Service Mesh provider: istio
  • Ingress provider: istio

We shouldn't error out if the query field is present.

else if metric.Name != "request-success-rate" && metric.Name != "request-duration" {
	c.recordEventErrorf(canary, "Metric query failed for no usable metrics template were configured")
	return false
}

} else if metric.Name != "request-success-rate" && metric.Name != "request-duration" {

It's because I only did TemplateRef and BuiltinMetric checks and didn't consider Metric.query. Because the description of query in the canary API is ready to be deprecated.

Do I need to add the relevant checks now? Or is it better to encourage users to use TemplateRef and abandon query?
@stefanprodan

@LiZhenCheng9527 we can't drop deprecated fields unless we bump the API version. To fix the breaking change now, we should look for query and if it exists, we shouldn't error out.