Container restart loses connectivity to backends
Closed this issue · 5 comments
phoolish commented
When Kubernetes restarts the container due to a liveness probe failure, the container comes back with 0 backends.
varnishadm backend.list
Backend name   Admin     Probe   Health    Last change
boot.default   healthy   0/0     healthy   Sun, 08 May 2022 02:30:49 GMT
I confirmed that /etc/varnish/backends.vcl is still populated correctly and that other pods can still connect to the backends without a problem. Deleting the pod "fixes" it.
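For anyone who wants to detect this state automatically, here is a minimal sketch that parses the text printed by `varnishadm backend.list` and reports whether only a `*.default` backend is listed. The function name `only_default_backend` is hypothetical, and treating "only `boot.default` present" as the broken state is an assumption based on the output above, not something the operator defines.

```python
def only_default_backend(backend_list_output: str) -> bool:
    """Return True when no backend other than a '<vcl>.default' entry is listed.

    Assumption: the broken state looks like the listing in this issue,
    i.e. the only row is 'boot.default'. An empty listing also counts.
    """
    names = []
    for line in backend_list_output.splitlines():
        line = line.strip()
        # Skip blank lines, the header row, and a bare CLI status code such as "200".
        if not line or line.startswith("Backend name") or line.isdigit():
            continue
        # The backend name is the first whitespace-separated column.
        names.append(line.split()[0])
    return all(name.endswith(".default") for name in names)
```

The output of `varnishadm backend.list` can be passed in as a string, for example after capturing it with `subprocess.run(..., capture_output=True, text=True)`.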
Here is our VarnishCluster manifest for context.
apiVersion: caching.ibm.com/v1alpha1
kind: VarnishCluster
metadata:
  labels:
    operator: varnish
  name: pcms-api
spec:
  backend:
    port: 80
    selector:
      app: pcms
      component: web
      purpose: api
  replicas: 3
  service:
    annotations:
      prometheus.io/path: /metrics
      prometheus.io/port: "9131"
      prometheus.io/scrape-only: "true"
    port: 80
  varnish:
    args:
    - -p
    - http_max_hdr=256
    - -p
    - http_resp_hdr_len=256k
    - -p
    - http_resp_size=1024k
    - -p
    - workspace_backend=256k
    - -s
    - malloc,756M
    resources:
      limits:
        cpu: 500m
        memory: 1028Mi
      requests:
        cpu: 500m
        memory: 1028Mi
  vcl:
    configMapName: pcms-varnishcluster
    entrypointFileName: default.vcl
cin commented
Thanks for reporting the issue. We'll take a look. FYI: @tomashibm
cin commented
Was able to reproduce by running /sbin/killall5, which killed all the processes and containers and still caused the condition mentioned above. Deleting the pod fixed it, but I'm going to try forcing an update.
cin commented
There's no vcl either...
varnish> vcl.list
200
active auto warm 0 boot
EDIT: It is possible to fix this without deleting the pod by manually loading the VCL.
varnish> vcl.load test /etc/varnish/entrypoint.vcl
200
VCL compiled.
varnish> vcl.use test
200
VCL 'test' now active
varnish> vcl.list
200
available auto warm 0 boot
active auto warm 0 test
varnish> backend.list
200
Backend name                  Admin     Probe   Health    Last change
test.nginx-7848d4b86f-fkw7w   healthy   0/0     healthy   Mon, 09 May 2022 17:05:22 GMT
test.nginx-7848d4b86f-npc52   healthy   0/0     healthy   Mon, 09 May 2022 17:05:22 GMT
test.container_rr             probe     2/2     healthy   Mon, 09 May 2022 17:05:22 GMT
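The manual workaround above can be scripted from outside the pod. This is a rough sketch that only builds the `kubectl exec` command lines for the same three steps (`vcl.load`, `vcl.use`, `backend.list`). The helper name `reload_commands`, the container name `varnish`, and the pod name in the usage line are assumptions for illustration. It also assumes `varnishadm` can reach the management interface without extra flags, as it did in the session above.

```python
import shlex


def reload_commands(pod, vcl_name="test",
                    vcl_path="/etc/varnish/entrypoint.vcl",
                    container="varnish"):
    """Build the kubectl commands that repeat the manual VCL reload.

    'container' is an assumed container name; adjust it to match the pod spec.
    """
    base = ["kubectl", "exec", pod, "-c", container, "--", "varnishadm"]
    return [
        base + ["vcl.load", vcl_name, vcl_path],  # compile the entrypoint VCL
        base + ["vcl.use", vcl_name],             # make it the active VCL
        base + ["backend.list"],                  # confirm the backends are back
    ]


if __name__ == "__main__":
    # "pcms-api-0" is a hypothetical pod name.
    for cmd in reload_commands("pcms-api-0"):
        print(shlex.join(cmd))
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` to execute the step. Building the commands separately keeps them easy to review before anything touches a live pod.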