Issue with Canary Deployment: Metric Not Reporting

Question

I'm implementing a canary deployment using Flagger to monitor my application. The goal is to monitor the success rate of HTTP requests to a health endpoint (/ping). However, although I configured the request-success-rate metric, Flagger reports no values for it and does not appear to send any requests to the endpoint. I am using the Traefik provider. This is my Canary manifest:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: test-service
  namespace: test
spec:
  provider: traefik
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-service
  progressDeadlineSeconds: 300
  service:
    port: 3000
    targetPort: 3000
  analysis:
    interval: 10s
    threshold: 10
    maxWeight: 50
    stepWeight: 5
    metrics:
      - name: request-success-rate
        interval: 30s
        thresholdRange:
          min: 99
        failureThreshold: 5
        query: "http://test-service:3000/ping"
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 10s
        metadata:
          type: bash
          cmd: "curl -X GET http://test-service:3000/ping"
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 10s -q 10 -c 2 http://test-service:3000/ping"
          logCmdOutput: "true"

I tested the curl and hey commands from inside the load tester pod and they work fine. But when I check my canary, it goes into a failed status after initialization:

Events:
Type Reason Age From Message

Warning Synced 4m19s flagger test-service-primary.test not ready: waiting for rollout to finish: observed deployment generation less than desired generation
Warning Synced 3m29s (x5 over 4m9s) flagger test-service-primary.test not ready: waiting for rollout to finish: 0 of 1 (readyThreshold 100%) updated replicas are available
Normal Synced 3m19s (x7 over 4m19s) flagger all the metrics providers are available!
Normal Synced 3m19s flagger Initialization done! test-service.test
Normal Synced 2m49s flagger New revision detected! Scaling up test-service.test
Warning Synced 119s (x5 over 2m39s) flagger canary deployment test-service.test not ready: waiting for rollout to finish: 0 of 1 (readyThreshold 100%) updated replicas are available
Normal Synced 109s flagger Starting canary analysis for test-service.test
Normal Synced 109s flagger Pre-rollout check acceptance-test passed
Normal Synced 109s flagger Advance test-service.test canary weight 5
Warning Synced 89s (x2 over 99s) flagger Halt advancement no values found for traefik metric request-success-rate probably test-service.test is not receiving traffic: running query failed: no values found

Am I missing something in this configuration?

Answer 1 · 2024-10-26T14:14:00.000Z

Could you check whether the required metrics are showing up in your Prometheus server?

Answer 2 · 2024-10-27T15:46:50.000Z

@aryan9600 I do not have a Prometheus server; I am using metrics-server. After reading more about canaries, I think Prometheus is a requirement for this setup. However, I am also running the podinfo canary (https://github.com/stefanprodan/podinfo) in the same cluster, and it works fine even without Prometheus. I am not sure why that works and my custom service does not.

Answer 3 · 2024-10-28T08:09:07.000Z

The goal is to monitor the success rate of HTTP requests to a health endpoint (/ping).

The query field is for specifying a PromQL query, not a URL. See the docs here: https://docs.flagger.app/usage/metrics#prometheus
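For illustration, here is a rough sketch of how a PromQL-based success-rate check could be declared with a MetricTemplate and referenced from the Canary. The template name, the Prometheus address, and the exact Traefik label selector are assumptions made for this example, not values taken from the issue; check them against the metrics your Traefik instance actually exports.

```yaml
# Hypothetical MetricTemplate; name, address and label selector are assumptions.
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: ping-success-rate
  namespace: test
spec:
  provider:
    type: prometheus
    # Assumed in-cluster Prometheus address; replace with your own.
    address: http://prometheus.monitoring:9090
  # Percentage of non-5xx responses served by the canary over the interval.
  query: |
    sum(rate(traefik_service_requests_total{service=~"{{ namespace }}-{{ target }}-canary-.*", code!~"5.."}[{{ interval }}]))
    /
    sum(rate(traefik_service_requests_total{service=~"{{ namespace }}-{{ target }}-canary-.*"}[{{ interval }}]))
    * 100
---
# Fragment of the Canary spec that references the template above.
spec:
  analysis:
    metrics:
      - name: ping-success-rate
        templateRef:
          name: ping-success-rate
          namespace: test
        interval: 30s
        thresholdRange:
          min: 99
```

Either way, the check is evaluated against Prometheus data, so it only returns values if Traefik metrics are being scraped and the canary is receiving traffic through Traefik.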

If you don't use Prometheus, then delete the metrics field; the webhooks are enough to test the ping endpoint.
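A minimal sketch of what that could look like, reusing the analysis block from the question with the metrics list dropped and everything else left unchanged:

```yaml
# Fragment of the Canary spec from the question, with the metrics list removed.
spec:
  analysis:
    interval: 10s
    threshold: 10
    maxWeight: 50
    stepWeight: 5
    # No metrics list here: the analysis relies on the webhooks only.
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 10s
        metadata:
          type: bash
          cmd: "curl -X GET http://test-service:3000/ping"
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 10s -q 10 -c 2 http://test-service:3000/ping"
          logCmdOutput: "true"
```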

Answer 4 · 2024-10-28T13:17:40.000Z

@stefanprodan the podinfo canary that you created works fine with my setup (without Prometheus). I am just wondering how it works with the metrics field in place. And just to confirm, are you saying that I should remove the entire block below?

    metrics:
      - name: request-success-rate
        interval: 30s
        thresholdRange:
          min: 99
        failureThreshold: 5
        query: "http://test-service:3000/ping"