mdn/infra

Consider replacing Meinheld as the gunicorn worker

Closed this issue · 5 comments

Should we replace the meinheld workers with something more mature and stable like gevent?

We (well, @limed, with me back-seat-driving) moved developer-portal to gevent because meinheld was hanging on large-but-not-really-that-large POST payloads (like 300kb). It was a good move in terms of fixing that, but note that at the moment, Jenkins won't build the two most recent releases of gevent. See mdn/developer-portal#1376 as an example of the minor bump failing to build (and potentially why).

I too am deeply skeptical of meinheld. It might be fast on some extremely unrealistic "hello world" benchmark, but beyond that, I doubt it brings any performance or memory optimizations for Kuma.

I doubt we even need gevent. It might be useful for accepting more incoming requests, but it's not necessarily that simple. There is a risk of thread-safety issues, and it can use too much memory.

When I built symbols.mozilla.org I researched this a lot. It's a traditional stack: Docker on AWS with a Django backend. It took on a LOT of traffic, sometimes between 10 and 100 requests per second, plus it had to handle extremely long-running requests that would take tens of seconds.
I did encounter some strange and scary crashes due to threading when I used gevent. After a lot of benchmarking and experimentation I opted for plain gunicorn, and the command inside Docker that starts it was this:

${CMD_PREFIX} gunicorn tecken.wsgi:application -b 0.0.0.0:${PORT} --timeout ${GUNICORN_TIMEOUT} --workers ${GUNICORN_WORKERS} --access-logfile -

Where GUNICORN_WORKERS was set, in puppet, like this:

$gunicorn_workers = $processorcount * 2 + 1
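The same sizing rule, sketched in Python for comparison (using `multiprocessing.cpu_count()` as a stand-in for Puppet's `$processorcount`; the helper name is just illustrative):

```python
import multiprocessing

def gunicorn_worker_count(cpus=None):
    """The classic gunicorn sizing rule: 2 * CPUs + 1 sync workers."""
    if cpus is None:
        cpus = multiprocessing.cpu_count()
    return cpus * 2 + 1

# e.g. on a 4-core box:
print(gunicorn_worker_count(4))  # → 9
```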

No fancy --worker-class=gthread or --worker-class="egg:meinheld#gunicorn_worker" stuff.
And no --threads set either.

I don't think --preload was important either but I can't remember why it wasn't used in the final Docker CMD command. I suspect that --preload just prepares the worker for the very first request which is a fraction of their complete lifetime.

Thankfully, we'll soon know pretty well what our Django is going to be asked to do. 99.9% of the requests will be /api/v1/whoami and everything else will NOT be on the critical path (e.g. subscriptions, auth, BCD signals) so they won't matter.
So the ideal thing would be if we could stand up two different pods/nodes/whatever, one with gevent and one without, and then flood each with something like

hey -n 10000 -c 100 -h2 https://domain.to.django.cloud.moz.it/api/v1/whoami

(hey)
And we should probably run it repeatedly to see if we can make it smoke. And we should probably look at memory utilization on the pod/node so that it doesn't run too hot.

Another thing: we know that 99% of future traffic to Django will be /api/v1/whoami and, in (sadly) 90+% of cases, it'll be without a session cookie. So there's no hope of getting waffle flags specifically for the user. But if we can safely assume that all anonymous users get the same waffle flag results (i.e. no waffle flag is allowed that depends on percentage for anonymous users), then we can cache these in memory. I.e. something like this:

import time

_module_level_waffle_cache = {}

def whoami(request):
    if not request.user.is_authenticated:
        # No user; we can re-use an existing cached value from the module-level cache
        ttl = 60  # seconds
        cache_key = int(time.time() / ttl)
        if cache_key in _module_level_waffle_cache:
            # cache hit
            waffle_values = _module_level_waffle_cache[cache_key]
        else:
            # cache miss; drop stale buckets first
            _module_level_waffle_cache.clear()
            # the slow MySQL ORM based lookup
            waffle_values = get_waffle_values_anonymous()
            _module_level_waffle_cache[cache_key] = waffle_values
    ...

That'll be faster than anything under the sun in terms of requests per second because it's 0 I/O.
I would not dare to do anything like that with gevent or meinheld, since using globals like this is only safe when all you have is plain Python processes. So I guess this hinges on using only gunicorn workers and no threads.
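For what it's worth, the TTL trick works by bucketing wall-clock time: every `ttl` seconds, `int(time.time() / ttl)` yields a new integer, so the first anonymous request in each window pays for the slow lookup and every later one hits the in-process cache. A standalone sketch of just that mechanism (`cached_anonymous_values` and `load` are illustrative names, not from the thread; `now` is injectable only to make the behavior easy to demonstrate):

```python
import time

_cache = {}

def cached_anonymous_values(load, ttl=60, now=None):
    """Return load()'s result, recomputing at most once per `ttl` seconds."""
    if now is None:
        now = time.time()
    bucket = int(now / ttl)  # same integer for every call within one window
    if bucket not in _cache:
        _cache.clear()           # drop the stale bucket
        _cache[bucket] = load()  # the one slow call per window
    return _cache[bucket]

calls = []
def fake_slow_lookup():
    calls.append(1)
    return {"some_flag": True}

# Two calls inside the same 60-second window: only one slow lookup happens.
cached_anonymous_values(fake_slow_lookup, now=1200.0)
cached_anonymous_values(fake_slow_lookup, now=1230.0)
assert len(calls) == 1
# A call in the next window triggers one fresh lookup.
cached_anonymous_values(fake_slow_lookup, now=1260.0)
assert len(calls) == 2
```

Note the cache is per-process, which is exactly why this only works with plain sync gunicorn workers: with threads or greenlets, concurrent access to the shared dict would need locking.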

Thanks for your comments @stevejalim and @peterbe! For the record, I never wanted to use meinheld years ago, but just the simple sync worker, not because I knew of anything rotten with meinheld but just out of suspicion of something less proven. If I had to state my preference, it would still be for gunicorn's simple sync worker.

limed commented

Fixed in #430