Seeing `Not Implemented` error with Canary and MetricTemplate
joedborg opened this issue · 1 comments
Describe the bug
I'm getting this error as a new image is being rolled out:
{"level":"error","ts":"2024-06-27T15:12:22.835Z","caller":"controller/events.go:39","msg":"Metric query failed for consumer-lag: error response: {\"code\":5,\"message\":\"Not Implemented (category=INVALID_REQUEST_ERROR code=NOT_FOUND)\",\"details\":[{\"type_url\":\"type.googleapis.com/apierrors.Error\",\"value\":\"CAIQoNQYGg9Ob3QgSW1wbGVtZW50ZWQ=\"}]}","canary":"my-canary.my-ns","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).recordEventErrorf\n\t/workspace/pkg/controller/events.go:39\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).runMetricChecks\n\t/workspace/pkg/controller/scheduler_metrics.go:285\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).runAnalysis\n\t/workspace/pkg/controller/scheduler.go:753\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).advanceCanary\n\t/workspace/pkg/controller/scheduler.go:442\ngithub.com/fluxcd/flagger/pkg/controller.CanaryJob.Start.func1\n\t/workspace/pkg/controller/job.go:39"}
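For anyone triaging this: the `details[0].value` field in the log line above is base64, and it appears to be a small protobuf message. Below is a minimal sketch that decodes it with a hand-rolled wire-format reader, assuming the payload is a flat message containing only varint and length-delimited fields. The schema of `apierrors.Error` is not known here, so the field numbers are printed without names, and `read_varint` and `parse_fields` are illustrative helpers rather than part of any library.

```python
import base64


def read_varint(buf, pos):
    """Decode one protobuf base-128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def parse_fields(buf):
    """Parse a flat protobuf message into {field_number: value}.

    Assumption: only wire types 0 (varint) and 2 (length-delimited) occur.
    """
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[field_number] = value
    return fields


# Value copied verbatim from the "details" entry in the error log.
payload = base64.b64decode("CAIQoNQYGg9Ob3QgSW1wbGVtZW50ZWQ=")
fields = parse_fields(payload)
print(fields)  # {1: 2, 2: 404000, 3: b'Not Implemented'}
```

Field 3 repeats the `Not Implemented` message. Field 2 (`404000`) might be an HTTP-404-style code, which would fit the `NOT_FOUND` in the message, but that is a guess without the schema.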
I'm using the following Canary and MetricTemplate manifests:
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-canary
spec:
  provider: kubernetes
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  progressDeadlineSeconds: 60
  service:
    port: 8080
  analysis:
    interval: 30s
    iterations: 10
    threshold: 2
    metrics:
      - name: consumer-lag
        templateRef:
          name: my-deployment-lag
        thresholdRange:
          max: 1500
        interval: 30m
---
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: my-deployment-lag
spec:
  provider:
    type: prometheus
    address: https://myorg.chronosphere.io:443
    secretRef:
      name: chronosphere
  query: |
    sum by (
      kafka_id, topic, consumer_group_id
    ) (
      confluent_kafka_server_consumer_lag_offsets{
        job="my-job",
        cluster="my-cluster",
        consumer_group_id="my-consumer-group"
      }
    )
This results in a failed canary:
NAME        STATUS   WEIGHT   LASTTRANSITIONTIME
my-canary   Failed   0        2024-06-27T15:13:22Z
My first guess would be that Chronosphere's API isn't exactly the same as Prometheus', but I'm not sure.
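One way to test that guess is to send the same query to the configured address outside of Flagger. The sketch below assumes the provider ends up calling the standard Prometheus instant-query endpoint, `/api/v1/query`, relative to `address`. Whether Chronosphere serves that path at this address is exactly the open question. `build_query_url` and `run_query` are hypothetical helper names, and any auth header passed in is an assumption about how the `chronosphere` secret is meant to be used.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Address and query copied from the MetricTemplate in this issue.
ADDRESS = "https://myorg.chronosphere.io:443"
QUERY = """sum by (
  kafka_id, topic, consumer_group_id
) (
  confluent_kafka_server_consumer_lag_offsets{
    job="my-job",
    cluster="my-cluster",
    consumer_group_id="my-consumer-group"
  }
)"""


def build_query_url(address, query):
    """Build a Prometheus HTTP API instant-query URL under the given address."""
    base = address.rstrip("/")
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def run_query(address, query, headers=None, timeout=10):
    """Send the query; return (status, body) for success and HTTP errors alike."""
    request = urllib.request.Request(
        build_query_url(address, query), headers=headers or {}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # Keep the raw body: it may not be Prometheus-style JSON.
        return err.code, err.read().decode("utf-8", errors="replace")


# Only prints the URL; call run_query(...) yourself to hit the network.
print(build_query_url(ADDRESS, QUERY))
```

If `run_query(ADDRESS, "vector(1)", headers=...)` comes back with the same `Not Implemented` body for a trivial expression, the problem is more likely the endpoint path or address than this particular query.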
To Reproduce
Use manifests above and attempt a rollout.
Expected behavior
I expect the metric query to succeed and the canary to be promoted.
Additional context
- Flagger version: 1.37.0
- Kubernetes version: 1.27.13
- Service Mesh provider: Istio v1.17.1
- Ingress provider: Istio v1.17.1