devinus/poolboy

Improving the performance of Poolboy by 2 to 3 times

mnussbaumer opened this issue · 2 comments

Hi, I've written a pooler to test some things and saw that it outperformed poolboy in most use cases, including what I think is the most common one: a given timeout for the checkout and no waiting queue at all.

I've set up some tests spawning 150k processes, waiting 2 seconds, and repeating this 8 times (I've also tested different configurations, varying the repeats, waiting time and total processes).
Each process tries to check out a connection, with a timeout of 3s and no waiting queue.
If it is able to check out, it sends a message and sleeps for 250ms to "simulate" a real waiting time (I can't use random waits because then the tests are meaningless).
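
Roughly, each round of the harness looks like this (a simplified sketch, not the exact benchmark code; the module name and the message sent to the worker are just illustrative):

-module(pool_bench).
-export([run_round/2]).

%% Spawn NumProcs clients that each try a non-blocking checkout with a 3s
%% timeout, then count how many succeeded and how many failed.
run_round(Pool, NumProcs) ->
    Parent = self(),
    lists:foreach(
        fun(_) -> spawn(fun() -> Parent ! attempt(Pool) end) end,
        lists:seq(1, NumProcs)),
    collect(NumProcs, 0, 0).

%% One checkout attempt: on success, send a message to the worker and hold
%% it for 250ms to simulate real work, then check it back in.
attempt(Pool) ->
    try poolboy:checkout(Pool, false, 3000) of
        full ->
            failed;
        Worker ->
            Worker ! work,
            timer:sleep(250),
            poolboy:checkin(Pool, Worker),
            ok
    catch
        _:_ -> failed
    end.

collect(0, Ok, Failed) -> {Ok, Failed};
collect(N, Ok, Failed) ->
    receive
        ok -> collect(N - 1, Ok + 1, Failed);
        failed -> collect(N - 1, Ok, Failed + 1)
    end.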

What I arrived at, for instance, using 50 workers for both poolers (this was tested warm, in different orders, cold, and several times):

boy: OK: 796 Failed: 1199204 Time: 10.091
queuer: OK: 837 Failed: 1199163 Time: 3.8459999999999996

And they're always in those ranges: slightly more successful checkouts, in about 1/3 of the time.
Regarding CPU & memory utilisation, the gains also seem to be significant (I imagine because the VM doesn't have to set so many call timers and undo them on timeout, far fewer exceptions being raised, far fewer messages flowing, etc.).

So then I implemented the same preemptive check in poolboy, to see whether it was overloaded before making the attempt to check out, and with that check poolboy even became slightly faster than my pooler on average. That means this is something that can be implemented with no changes to client-side code, for 2 to 3 times faster checkouts when overloaded (it doesn't help with unbounded queues, because those can't get overloaded; in those cases it keeps the same profile as it has now).

There's only one slight decision to make, which is how to handle checkouts coming from remote nodes: either bypass the pre-check or do an RPC instead. It's also easily solvable.
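
For example, the bypass option could be a hypothetical helper along these lines (just a sketch; the name is made up):

%% Hypothetical helper: do the ETS pre-check only when the pool actually
%% runs on the local node; callers on other nodes (where the table doesn't
%% exist) would just take the normal checkout path.
local_pool(Pool) when is_atom(Pool) -> whereis(Pool) =/= undefined;
local_pool(Pool) when is_pid(Pool)  -> node(Pool) =:= node().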

Because poolboy is used in several places within Elixir packages, I would like to write a PR for this if there's interest? The changes are not complex at all.

@mnussbaumer would you mind making your code available?

@hkrutzer I don't have a PR'able version ready, but basically it's:
creating an ETS table when the supervisor goes up -> Ets = ets:new(?ETS, [public, named_table, {read_concurrency, true}]),

on checkout, when there are no available workers and the request is not blocking:
true = ets:insert(Ets, {self(), 1}),

Then on checkin, reset that entry by:
true = ets:insert(Ets, {self(), 0}),
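
Putting those pieces together, the mechanism is basically this (a standalone sketch just to show the ETS operations in one place; the table name and module are placeholders, and in poolboy itself the table would be created by the supervisor and the flag written from the pool process):

-module(full_flag).
-export([init/1, set_full/1, clear_full/1, is_full/1]).

-define(ETS, poolboy_full_flags).   %% placeholder table name

%% created once, e.g. when the supervisor goes up
init(PoolPid) ->
    ets:new(?ETS, [public, named_table, {read_concurrency, true}]),
    clear_full(PoolPid).

%% called by the pool process when a non-blocking checkout finds no free worker
set_full(PoolPid) ->
    true = ets:insert(?ETS, {PoolPid, 1}).

%% called on checkin, as soon as a worker is available again
clear_full(PoolPid) ->
    true = ets:insert(?ETS, {PoolPid, 0}).

%% the client-side pre-check done before attempting the gen_server call
is_full(PoolPid) ->
    case ets:lookup(?ETS, PoolPid) of
        [{_, 1}] -> true;
        _ -> false   %% 0, or no entry yet
    end.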

And adding a special checkout clause for when it's a non-blocking request:

%% Only attempt the call when the pool hasn't flagged itself as full.
%% Note: the lookup key has to match what the pool process inserted
%% (its pid), so a registered name would need resolving first.
checkout(Pool, false, Timeout) ->
    CRef = make_ref(),
    case ets:lookup(?ETS, Pool) of
        [{_, 0}] ->
            try
                %% Block is false in this clause
                gen_server:call(Pool, {checkout, CRef, false}, Timeout)
            catch
                ?EXCEPTION(Class, Reason, Stacktrace) ->
                    gen_server:cast(Pool, {cancel_waiting, CRef}),
                    erlang:raise(Class, Reason, ?GET_STACK(Stacktrace))
            end;
        _ ->
            %% flagged as overloaded (or no entry for this key): fail fast
            full
    end.

But this needs some more work: deciding on the transaction mode, and what to do if poolboy is being used across nodes, since that ETS table won't exist on nodes that didn't start the pool, etc. It was just a proof of concept I whipped up to test in a setting where I knew those wouldn't be issues.
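
For the transaction mode, one option would be something along these lines (just a sketch assuming the pre-check clause above is in place; not a decided design):

%% Sketch only: have transaction use the non-blocking, pre-checked checkout
%% and return an error instead of raising when the pool is full.
transaction(Pool, Fun, Timeout) ->
    case checkout(Pool, false, Timeout) of
        full ->
            {error, full};
        Worker ->
            try
                Fun(Worker)
            after
                ok = checkin(Pool, Worker)
            end
    end.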

You might be interested in https://github.com/mnussbaumer/workforce
It is quite a bit faster than poolboy; it has tests and benchmarking (there's a write-up in the benchmarking folder), and that benchmarking was done before the latest tweaks, which have made it even faster in general, though it hasn't seen real production workloads.
Even with this theoretical pre-check fix, poolboy was still slower across the board than workforce, but much faster than poolboy without it.
(Note that the version of workforce available on hex.pm is the first one, as I had problems cutting a new version with rebar3 and haven't gotten to it yet - so use the dependency from GitHub, and do some testing against poolboy?)