"kong stop" in wait-for-db command prevents init container from exiting gracefully, suggest kong quit"
mitrousis opened this issue · 2 comments
Using "kong stop" in this command exits the container, but not gracefully. While the follow-on proxy container is still healthy, the wait-for-db container continues to restart, creating a lot of concerning-looking logs, such as:
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to [::]:8002 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8003 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8446 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8004 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8447 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8001 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to [::]:8001 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8100 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to [::]:8100 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8005 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to [::]:8005 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8006 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [notice] 1173795#0: try again to bind() after 500ms
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8002 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to [::]:8002 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8003 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8446 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8004 failed (98: Address already in use)
Mon, Feb 5 2024 11:17:31 pm | 2024/02/06 07:17:22 [emerg] 1173795#0: bind() to 0.0.0.0:8447 failed (98: Address already in use)
Suggest a change similar to:
until kong start; do echo 'waiting for db'; sleep 5; done; kong quit --wait=5
Additionally, there should be a timeout so the init container fails reasonably quickly, since a DB that never becomes available is not a state it can recover from on its own.
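For example, something along these lines (a rough sketch only; the retry cap of 60 is just an illustrative number, not an existing chart value):
# Hypothetical bounded version of the wait loop, ending with a graceful quit
attempts=0
until kong start; do
  attempts=$((attempts + 1))
  if [ "$attempts" -ge 60 ]; then echo 'timed out waiting for db'; exit 1; fi
  echo 'waiting for db'
  sleep 5
done
kong quit --wait=5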
Can you confirm your chart and Kong image versions? Do you have logs from the wait-for-db container prior to it attempting the conflicting binds, and Pod describe info? I'm not able to replicate this with a basic instance, so I suspect there's something more going on.
In any case, I'd recommend just setting waitImage.enabled=false if it's causing trouble. That init container is of pretty limited use, since the main app container will hit the same DB-unavailable failure state. wait-for-db is mostly in place to avoid the Kong Pods hitting CrashLoopBackOff during the initial install while waiting for initial migrations.
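For an existing release, that would be roughly the following (release name is a placeholder; --reuse-values keeps the rest of your configuration):
# Disable the wait-for-db init container via the chart value above
helm upgrade my-kong kong/kong --reuse-values --set waitImage.enabled=false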
With the current 2.37.0 release and the example Postgres-bundled config:
helm install ana kong/kong -f https://raw.githubusercontent.com/Kong/charts/main/charts/kong/example-values/minimal-kong-standalone.yaml
kubectl describe pod PODNAME
gives me:
wait-for-db:
Container ID: containerd://9f0842f70707888d2daf34115ac549d01acf3023489c1350f05fdf03a0d19d7c
Image: kong:3.5
Image ID: docker.io/library/kong@sha256:ba218fe5832d5fd2c2d915df9016d59e17af1ef58dcd7edbf4744cc480c55793
Port: <none>
Host Port: <none>
SeccompProfile: RuntimeDefault
Args:
/bin/bash
-c
export KONG_NGINX_DAEMON=on KONG_PREFIX=`mktemp -d` KONG_KEYRING_ENABLED=off; until kong start; do echo 'waiting for db'; sleep 1; done; kong stop
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 21 Feb 2024 15:00:25 -0800
Finished: Wed, 21 Feb 2024 15:00:49 -0800
Ready: True
The described behavior is also pretty odd. AFAIK you should never reach a state where both the application and init containers are running simultaneously: failed init containers block the start of application containers, and successful init containers do not respawn, since their restart policy is OnFailure when the Pod-level policy is Always.
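If you want to double-check that on your side, the Pod spec and init container status show it directly (PODNAME as a placeholder, as above):
# Pod-level restart policy; Deployment-managed Pods should report Always
kubectl get pod PODNAME -o jsonpath='{.spec.restartPolicy}'
# Once wait-for-db has succeeded it should stay Terminated/Completed and not restart
kubectl get pod PODNAME -o jsonpath='{.status.initContainerStatuses[0].state}'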
If it's actually failing (I edited the args to run false to force it), you should get an Init:Error eventually transitioning to Init:CrashLoopBackOff, with the app containers never starting at all:
$ kubectl get po
ana-kong-7464647b7b-szldd 0/1 Init:Error 1 (6s ago) 7s
$ sleep 80
$ kubectl get po
ana-kong-7464647b7b-szldd 0/1 Init:CrashLoopBackOff 5 (102s ago) 4m39s
wait-for-db:
Container ID: containerd://db988d99cad4f82efec21334304fcb7bf936fc9017be7fe1bc4d80dfcd9c60fe
Image: kong:3.5
Image ID: docker.io/library/kong@sha256:ba218fe5832d5fd2c2d915df9016d59e17af1ef58dcd7edbf4744cc480c55793
Port: <none>
Host Port: <none>
SeccompProfile: RuntimeDefault
Args:
false
State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 21 Feb 2024 15:46:59 -0800
Finished: Wed, 21 Feb 2024 15:46:59 -0800
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 21 Feb 2024 15:46:58 -0800
Finished: Wed, 21 Feb 2024 15:46:58 -0800
Ready: False
The basic rules around init container management stipulate that they should never be running concurrently with anything else, so if they are, that'd arguably be a Kubernetes-level bug. I was unable to find any reports elsewhere matching that description, however.
It's worth noting that the exit code is technically from the bash -c (and then https://github.com/Kong/docker-kong/blob/master/docker-entrypoint.sh below that), so this may be something else in those scripts returning the exit code.
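As a plain-bash illustration of that (nothing Kong-specific here), the exit status of bash -c is whatever the last command in the string returns:
bash -c 'false; true'; echo $?   # prints 0
bash -c 'true; false'; echo $?   # prints 1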
We're using the Kong Helm chart 2.19.0 and image kong-gateway:3.4.1.1-rhel. However, I no longer see the issue (now seeing success with the original chart version) and can't replicate it to get the original error logs. I only have this screenshot from Rancher:
Given there are a variety of teams working on our infra, I won't be able to narrow this down to any specific change/fix. Feel free to close this as unreproducible from here. Thanks for having a look.