buchgr/bazel-remote

Large number of TLS handshakes to the S3 proxy

gabrielrussoc opened this issue · 1 comment

Hi all,

I noticed high CPU usage and general slowness in Bazel Remote, and after some digging I was able to pin it down to a very large number of TLS handshakes to our S3 bucket. The problem goes away if I set the --s3.disable_ssl flag: on the same physical resources, our metrics show the p90 latency of FindMissingBlob requests dropping from 4s to 400ms.

It turns out this issue is not specific to Bazel Remote but rather comes from minio-go (the client used to talk to S3). I opened an issue there with reproduction details: minio/minio-go#1855. Unfortunately, the root cause might sit even lower, in Go's net/http library itself: golang/go#50984.

I'm exploring whether disabling SSL is feasible for our environment, but with SSL enabled this problem makes Bazel Remote basically unusable at our volume (we're testing with a peak of 100k requests / minute, but the real load is much higher).

I'm using Bazel Remote v2.4.1 on Kubernetes, with Docker as the container runtime.

Thanks for the detailed bug report.

Reading through the linked issues, it sounds like we're stuck waiting for a fix in a future version of Go, unless there is another Go S3 client that doesn't use net/http.

> I'm exploring whether disabling SSL is feasible for our environment, but it makes the Bazel Remote basically unusable for our volume (we're trying it with a peak of 100k requests / minute, but the real load is much higher).

You might be able to use this with a TLS termination proxy to offload the handshakes.