chaps-io/gush

Redis connection pool investigation

cabello opened this issue · 13 comments

We are trying to use Gush in production and we are constantly running out of Redis connections.

If we run 2 workflows that fire ~5 workers each, everything runs fine.
If we run 3 workflows, we hit the connection limit.

I wonder if there is an easy way to calculate how many connections are needed. If I try to run a few thousand workflows, do I need concurrency * constant factor connections (for example, 10 * 5), or do I need thousands of connections plus (concurrency * constant factor)?

Hi @cabello! Thanks for reporting the issue. I didn't encounter it myself, even though I was running hundreds of workflows at the same time.

This sounds like a bug, so I'll try to reproduce it. Can you share your Redis settings?

Alrighty, I think I have a solution. I modified the code to use the same ConnectionPool that Sidekiq uses, which should drastically reduce the number of Redis connections.
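For readers unfamiliar with the technique: a fixed-size connection pool bounds how many connections exist no matter how many threads need one. The sketch below is a minimal stdlib-only illustration of that idea, not Gush's actual code; `FakeRedis`, `TinyPool`, and the pool size of 5 are all hypothetical names chosen for the demo, so it runs without a Redis server.

```ruby
# Hypothetical stand-in for a Redis connection so the sketch runs
# without a real server. Each instance represents one open connection,
# and the class tracks the peak number of simultaneously open ones.
class FakeRedis
  @@open = 0
  @@peak = 0
  @@lock = Mutex.new

  def initialize
    @@lock.synchronize do
      @@open += 1
      @@peak = [@@peak, @@open].max
    end
  end

  def self.peak
    @@peak
  end
end

# Minimal fixed-size pool: at most `size` connections ever exist,
# regardless of how many threads check one out concurrently.
class TinyPool
  def initialize(size)
    @queue = Queue.new
    size.times { @queue << FakeRedis.new }
  end

  def with
    conn = @queue.pop   # blocks until a connection is free
    yield conn
  ensure
    @queue << conn
  end
end

POOL = TinyPool.new(5)

# 50 concurrent workers share the same 5 connections.
threads = 50.times.map do
  Thread.new { POOL.with { |conn| sleep 0.01 } }  # pretend to talk to Redis
end
threads.each(&:join)

puts FakeRedis.peak  # peak connections equals the pool size, 5
```

In the real gem the same role is played by the connection_pool gem that Sidekiq also uses, roughly `ConnectionPool.new(size: 5) { Redis.new }` with access going through `pool.with { |redis| ... }`.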

I released version 0.3.3, can you try it and report back?

Hi @pokonski, thanks for the quick fix! I am trying to use the gem, but there is already a version 0.4.0; shouldn't 0.4.1 be released instead? https://rubygems.org/gems/gush/versions/0.4

You are right, I'm not sure how I missed the numbering. I'll release 0.4.1 :)

@cabello 0.4.1 released, have a go!

@pokonski Hey! Thanks for this change.

I noticed that the new version isn't on RubyGems yet: https://rubygems.org/gems/gush

Duh, my bad. It is now 💨

It's much better now, but we are still running into Redis connection limits. I plan to put together an example soon so we can investigate together.

Great, I'd love to see a snippet I can reproduce and base our fixes on :)

I think I have a reasonable example; here it goes.

First, stop your Redis server and restart it with a low client limit: redis-server --maxclients 50. Then start Sidekiq and Gush.

Then build an example workflow like this one:

class FooWorkflow < Gush::Workflow
  def configure(client_id)
    client = Client.find_by(id: client_id)

    jobs = client.accounts.map do |account|
      egg_job = run EggJob, params: { account_id: account.id }
      run HamJob, params: { account_id: account.id }, after: egg_job

      egg_job
    end

    run BarJob, params: { client_id: client_id }, after: jobs
  end
end

Now with lots (a few thousand) of clients & accounts in the database, open a console and run:

Client.find_each do |client|
  FooWorkflow.new(client.id).start!
end

Gush will hit the connection limit very quickly. When I ran with no limit, the most connections I saw was ~75. So my first impression is that the count doesn't grow out of control, but it's currently hard to predict how many connections are necessary.
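If pooling is working, the total should be bounded by pool size per process times the number of processes, plus a small constant, rather than growing with the number of workflows. A rough sizing sketch (the helper name, parameters, and the constant overhead are illustrative assumptions, not values taken from Gush):

```ruby
# Hypothetical back-of-envelope estimate: with pooled access, the total
# Redis connection count is bounded by (processes * pool_size) plus a
# small fixed overhead (e.g. pub/sub or monitoring connections).
def max_redis_connections(processes:, pool_size:, overhead: 2)
  processes * pool_size + overhead
end

# e.g. 3 Sidekiq processes, each sharing a pool of 10 connections:
puts max_redis_connections(processes: 3, pool_size: 10)  # => 32
```

Under that model, running thousands of workflows would not change the bound; only adding processes or enlarging the pool would.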

Hope this helps!

Thanks for the detailed analysis! I'll have a deeper look into that 👍

I rechecked this case after the recent changes and the maximum number of clients now stops at around 33. Internally it uses more connection pooling than before for every Redis action. If you still can, could you recheck with the activejob branch?

The remaining problem is that running a lot of jobs spawns separate connection pools independently, so that is the biggest issue I see now.

Version 1.0.0 decreases the number of Redis operations during workflow processing, so it should improve things even more. Please open a new ticket if the issue still exists.