gojek/ziggurat

[Analysis] Benchmark HTTP threads on Ziggurat

mjayprateek opened this issue · 0 comments

This comes from a user's feedback about Ziggurat:

We should audit the number of HTTP threads of actors. If one of the downstream services is slow, all the threads get exhausted and the Consul health check fails, which deregisters the app from Consul.

The task here is to analyze the following:

  1. At present, what's the maximum number of concurrent HTTP threads a Ziggurat app can support?
  2. What are the reasons for exhaustion? Are there ways to make sure the app self-heals in such a case? For example, making async calls to a downstream service if its detected lag exceeds a certain threshold (2-5 seconds, for example). This would help release threads faster.
  3. Explore network-level tweaks to ensure the underlying server can operate under high load.
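To make the idea in point 2 concrete, here is a minimal Java sketch of one possible approach, not how Ziggurat works today. It assumes downstream calls are moved onto a separate bounded pool and given a deadline, so a slow downstream cannot hold an HTTP thread indefinitely. The class name `DownstreamGuard`, the pool size, the timeout, and the fallback value are all hypothetical and chosen only for illustration; it relies on `CompletableFuture.completeOnTimeout`, which requires Java 9 or later.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper (not part of Ziggurat): runs downstream calls on a
// dedicated pool and gives up waiting after a fixed timeout.
final class DownstreamGuard {
    // Separate bounded pool, so slow downstream calls occupy these threads
    // rather than the HTTP server's request threads.
    private final ExecutorService downstreamPool;
    private final Duration timeout;

    DownstreamGuard(int poolSize, Duration timeout) {
        this.downstreamPool = Executors.newFixedThreadPool(poolSize);
        this.timeout = timeout;
    }

    // Returns a future that completes with the downstream result, or with
    // `fallback` if the result is not available within the timeout.
    // Note: the timeout only completes the future; the slow task itself keeps
    // running on the pool until it finishes or the pool is shut down.
    <T> CompletableFuture<T> call(Supplier<T> downstream, T fallback) {
        return CompletableFuture
                .supplyAsync(downstream, downstreamPool)
                .completeOnTimeout(fallback, timeout.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Interrupts any in-flight downstream tasks and stops the pool.
    void shutdown() {
        downstreamPool.shutdownNow();
    }
}
```

A caller would wrap each downstream request, for example `guard.call(() -> fetchFromDownstream(), defaultValue)`, where `fetchFromDownstream` and `defaultValue` stand in for the actor's real call and an acceptable degraded response. This only caps how long a caller waits; whether the request thread is actually freed depends on the server supporting async responses, which is part of what this analysis should establish.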

Sharing some links I've found online.

https://www.wiliam.com.au/wiliam-blog/thread-exhaustion
https://wiki.eclipse.org/Jetty/Howto/High_Load