Substra/substra-backend

502 when under load

Closed this issue · 5 comments

When under load, nginx sporadically returns 502 responses

Repro

(Linux, docker driver, minikube w/ ingress addon)

substra login

for i in $(seq 200); do
    substra get traintuple $i &
done

In the logs

2020/07/13 17:44:56 [error] 2217#2217: *672834 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 172.17.0.1, server: substra-backend.node-1.com, request: "GET /traintuple/184/ HTTP/1.1", upstream: "http://172.18.0.47:8000/traintuple/184/", host: "substra-backend.node-1.com"
[...]
172.17.0.1 - - [13/Jul/2020:17:44:56 +0000] "GET /traintuple/184/ HTTP/1.1" 400 26 "-" "python-requests/2.24.0" 260 0.027 [org-1-backend-org-1-substra-backend-server-http] [] 172.18.0.47:8000, 172.18.0.47:8000 0, 26 0.000, 0.024 502, 400 be8dae30c7f82749a8b130ddf459875f

Across these 200 concurrent requests, I consistently get 1-4 "502" responses.

Interestingly, the first time I run the test, I only get one 502. When I re-run the test, I get 3-4 502s. I then keep on getting 3-4 502s in subsequent tests. This might be related to the fact that we currently use the cheaper algorithm.

Nginx retry

Note that in the example above, nginx tries hitting the backend twice. See the end of the second log line:

502, 400 be8dae30c7f82749a8b130ddf459875f

Explanation:

  • nginx tries to hit the backend, but the connection gets interrupted => 502
  • nginx retries the request; this time the call succeeds => 400 (this is the correct, expected return code for this test)

I have no explanation as to why the request is retried. My understanding is that this shouldn't happen since there's only one backend server configured (confirmed with kubectl ingress-nginx backends -n kube-system)

I sometimes get "Bad Gateway" (502) errors when running the tests in substra-tests locally; I don't think it always happens on the same test.

I installed Substra with skaffold (see the installation instructions)

My setup is:

  • macOS Catalina 10.15.5
  • Docker Desktop Community 2.3.0.3, with Kubernetes 1.16.5

This might be related to the fact that we currently use the cheaper algorithm.

Not sure; I added the cheaper workers to solve an issue I was facing before adding them :(
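For context, "cheaper" here refers to uWSGI's cheaper subsystem, which dynamically scales the number of workers. A minimal sketch of such a configuration (the option names are real uWSGI settings, but the values are hypothetical, not the actual substra-backend ones):

```ini
[uwsgi]
workers = 8              ; hard maximum number of workers
cheaper = 2              ; never scale below 2 workers
cheaper-initial = 4      ; number of workers spawned at startup
cheaper-algo = busyness  ; scaling algorithm (requires the cheaper_busyness plugin)
```

When load drops, uWSGI reaps surplus workers; if nginx still has a connection open to a worker that gets stopped, that could plausibly surface as the "connection reset by peer" errors in the logs above.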

I have no explanation as to why the request is retried. My understanding is that this shouldn't happen since there's only one backend server configured (confirmed with kubectl ingress-nginx backends -n kube-system)

This is a feature of nginx: it retries after receiving certain 5xx error codes (doc: nginx#proxy_next_upstream), as long as the request is not a POST, PUT, or another non-idempotent method.
If you have only one pod and proxy-next-upstream-tries is set to 3 (the default), it will try the same server up to three times (this is what I understood from this thread: kubernetes/ingress-nginx#4944); this could explain why you see a retry even though you only have one server running.
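In nginx terms, the generated location block behaves roughly like this sketch (the exact directives emitted by ingress-nginx may differ):

```nginx
location / {
    proxy_pass http://upstream_balancer;
    # on a connection error or timeout, pass the request to the "next"
    # upstream, which with a single pod is the same server again
    proxy_next_upstream error timeout;
    proxy_next_upstream_tries 3;
}
```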
Here, in the default nginx-ingress code: https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/config/config.go#L784-L786, you can see it retries only on error and timeout.
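If the retries are unwanted, they can be tuned through the ingress-nginx ConfigMap; a sketch (the ConfigMap name and namespace depend on how the controller was installed):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration     # hypothetical; match your controller's ConfigMap
  namespace: kube-system
data:
  proxy-next-upstream: "off"    # never retry on another (or the same) upstream
  proxy-next-upstream-tries: "1"
```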

Closing stale issue.