IBM/varnish-operator

Container restart loses connectivity to backends

Closed this issue · 5 comments

When Kubernetes restarts the container due to a liveness probe failure, the container comes back with zero backends.

varnishadm backend.list
Backend name   Admin      Probe    Health     Last change
boot.default   healthy    0/0      healthy    Sun, 08 May 2022 02:30:49 GMT

I confirmed that /etc/varnish/backends.vcl is still populated correctly and other pods can still connect to the backends without a problem. Deleting the pod "fixes" it.
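For anyone hitting the same symptom, one quick way to confirm the bad state is to count the backends other than the built-in `boot.default`. This is a sketch assuming the `backend.list` output format shown above; the `sample` variable stands in for live `varnishadm` output, which in a pod you would pipe in directly.

```shell
# Count backends other than the built-in default. The captured output above
# stands in for live output; inside an affected pod you would instead run:
#   varnishadm backend.list | tail -n +2 | grep -cv '^boot\.default'
sample='Backend name   Admin      Probe    Health     Last change
boot.default   healthy    0/0      healthy    Sun, 08 May 2022 02:30:49 GMT'

count=$(printf '%s\n' "$sample" | tail -n +2 | grep -cv '^boot\.default')
echo "non-default backends: $count"
```

A result of 0 here matches the broken state described in this issue, while a healthy pod should report one entry per matching backend pod plus the director.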

Here is our VarnishCluster manifest for context.

apiVersion: caching.ibm.com/v1alpha1
kind: VarnishCluster
metadata:
  labels:
    operator: varnish
  name: pcms-api
spec:
  backend:
    port: 80
    selector:
      app: pcms
      component: web
      purpose: api
  replicas: 3
  service:
    annotations:
      prometheus.io/path: /metrics
      prometheus.io/port: "9131"
      prometheus.io/scrape-only: "true"
    port: 80
  varnish:
    args:
    - -p
    - http_max_hdr=256
    - -p
    - http_resp_hdr_len=256k
    - -p
    - http_resp_size=1024k
    - -p
    - workspace_backend=256k
    - -s
    - malloc,756M
    resources:
      limits:
        cpu: 500m
        memory: 1028Mi
      requests:
        cpu: 500m
        memory: 1028Mi
  vcl:
    configMapName: pcms-varnishcluster
    entrypointFileName: default.vcl
cin commented

Thanks for reporting the issue. We'll take a look. FYI: @tomashibm

cin commented

Was able to reproduce it with /sbin/killall5, which killed all the processes and restarted the containers but still caused the condition mentioned above. Deleting the pod fixed it, but I'm going to try forcing an update instead.

cin commented

There's no vcl either...

varnish> vcl.list
200
active   auto    warm         0    boot

EDIT: It is possible to fix without deleting the pod by manually loading the VCL.

varnish> vcl.load test /etc/varnish/entrypoint.vcl
200
VCL compiled.

varnish> vcl.use test
200
VCL 'test' now active
varnish> vcl.list
200
available   auto    warm         0    boot
active      auto    warm         0    test

varnish> backend.list
200
Backend name                Admin    Probe  Health   Last change
test.nginx-7848d4b86f-fkw7w healthy  0/0    healthy  Mon, 09 May 2022 17:05:22 GMT
test.nginx-7848d4b86f-npc52 healthy  0/0    healthy  Mon, 09 May 2022 17:05:22 GMT
test.container_rr           probe    2/2    healthy  Mon, 09 May 2022 17:05:22 GMT
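The interactive steps above can also be run from outside the pod with non-interactive `varnishadm` calls. This is a sketch only: the pod name `pcms-api-0`, the container name `varnish`, and the VCL label `recovered` are assumptions based on the manifest above, not names the operator guarantees.

```shell
# Hypothetical one-shot recovery without deleting the pod; adjust the pod
# and container names to match your cluster.
kubectl exec pcms-api-0 -c varnish -- varnishadm vcl.load recovered /etc/varnish/entrypoint.vcl
kubectl exec pcms-api-0 -c varnish -- varnishadm vcl.use recovered
```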

@cin Thank you for the follow-up. If manual intervention is needed, it is easier for my team to just delete the pod.

I wonder if it is restarting and just falling back to the system default VCL, /etc/default/varnish.
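One way to check that theory from inside the container is `vcl.show`, which prints the source that a loaded configuration was compiled from; with `-v` it also prints included files, so it should reveal whether the `boot` config came from the entrypoint (and its backends include) or from some default VCL.

```shell
# Print the source of the boot configuration, including any files pulled in
# via include statements (e.g. backends.vcl).
varnishadm vcl.show -v boot
```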

The fix is released in the latest version, 0.31.0.