fluxcd/flagger

Canary deployment

Closed this issue · 2 comments

Canary deployment

I am having the same issues with Istio.

I see that Flagger is hitting Prometheus. I can see the query, but for some reason unknown to me it's just not getting any traffic to the new pod. The canary deployment has a value of 0 or 1 when I query this metric. Traffic to the old pod works and shows up in Prometheus.

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: request-duration
  namespace: flagger
spec:
  provider:
    type: prometheus
    address: http://mimir-distributed-gateway.observability:8080/prometheus
  query: |
    histogram_quantile(0.99,
      sum(
        irate(
          istio_request_duration_milliseconds_bucket{
            reporter="destination",
            destination_workload=~"{{ target }}",
            destination_workload_namespace=~"{{ namespace }}"
          }[{{ interval }}]
        )
      ) by (le)
    )

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: request-success-rate
  namespace: flagger
spec:
  provider:
    type: prometheus
    address: http://mimir-distributed-gateway.observability:8080/prometheus
  query: |
    sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace=~"{{ namespace }}",
              destination_workload=~"{{ target }}",
              response_code!~"5.*"
            }[{{ interval }}]
        )
    )
    /
    sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace=~"{{ namespace }}",
              destination_workload=~"{{ target }}"
            }[{{ interval }}]
        )
    )
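For anyone debugging a similar setup: Flagger substitutes `{{ target }}`, `{{ namespace }}`, and `{{ interval }}` into the template before sending the query to the provider address, so a quick sanity check is to render the query by hand and paste it into the Prometheus/Mimir HTTP API. A minimal sketch of that substitution (the workload name and interval below are illustrative placeholders, not values confirmed in this issue):

```python
from urllib.parse import quote

# The success-rate template from the MetricTemplate above, condensed.
TEMPLATE = (
    'sum(rate(istio_requests_total{reporter="destination",'
    'destination_workload_namespace=~"{{ namespace }}",'
    'destination_workload=~"{{ target }}",'
    'response_code!~"5.*"}[{{ interval }}]))'
    ' / '
    'sum(rate(istio_requests_total{reporter="destination",'
    'destination_workload_namespace=~"{{ namespace }}",'
    'destination_workload=~"{{ target }}"}[{{ interval }}]))'
)

def render(template: str, values: dict) -> str:
    # Substitute Flagger's "{{ name }}" placeholders with concrete values.
    for key, value in values.items():
        template = template.replace("{{ " + key + " }}", value)
    return template

query = render(TEMPLATE, {
    "target": "echo-server-primary",  # assumed workload name
    "namespace": "debug",
    "interval": "1m",
})

# URL-encode the rendered query for the HTTP API, e.g.:
# curl 'http://mimir-distributed-gateway.observability:8080/prometheus/api/v1/query?query=<encoded>'
print(quote(query, safe=""))
```

If the rendered query returns an empty result in the API, Flagger will see no data either, regardless of how the Canary is configured.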

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: echo-server-cannary
  namespace: debug
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: echo-server
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 600
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: echo-server
  service:
    # service port number
    port: 80
    # container port number or name (optional)
    targetPort: 80
    # Istio gateways (optional)
    gateways:
    - default/gw-dev-imba-com
    # Istio virtual service host names (optional)
    hosts:
    - imba.com
    match:
      - uri:
          prefix: /api/echo
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: ISTIO_MUTUAL
    # Istio retry policy (optional)
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: "gateway-error,connect-failure,refused-stream"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
      - name: request-success-rate
        templateRef:
          name: request-success-rate
          namespace: flagger
        thresholdRange:
          max: 500
        interval: 5m
      - name: request-duration
        templateRef:
          name: request-duration
          namespace: flagger
        thresholdRange:
          max: 500
        interval: 5m
    # testing (optional)
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: https://imba.com/api/echo
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' https://imba.com/api/echo | grep token"
      - name: load-test
        url: https://imba.com/api/echo
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://imba.com/api/echo"

I found that the problem is with the metrics. The load test is not generating enough traffic to produce any value for the given metric, which results in a failed rollout.
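If low request volume is the cause, one option is to raise the load test's rate and concurrency so the `rate()`/`irate()` windows always contain samples. A sketch of a heavier webhook (the numbers are illustrative, not tuned values):

```yaml
webhooks:
  - name: load-test
    url: https://imba.com/api/echo
    timeout: 5s
    metadata:
      # -q 50 -c 10 sends roughly 500 req/s vs ~20 req/s with -q 10 -c 2
      cmd: "hey -z 2m -q 50 -c 10 http://imba.com/api/echo"
```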

➜  ~ istioctl version
client version: 1.24.0
control plane version: 1.21.0
data plane version: 1.21.0 (61 proxies)
➜  ~