Leaking database connection issue

Question

Leaking database connection issue

Galathius opened this issue 3 years ago · 13 comments

Thank you so much for this gem. It is a really great thing to have in our code base.

After we added this gem to our project we started getting a lot of these kinds of errors from the Sidekiq:

  could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use

I may assume that this is somehow relates to usage of ActiveRecord::Base.connection instead of ActiveRecord::Base.connection_pool.with_connection { |connection| ... }.

So far we have fixed that issue by wrapping all after_commit { do_stuff } instructions with:

ActiveRecord::Base.connection_pool.with_connection do |conn|
  after_commit(connection: conn) { ...do_stuff... }
end

Probably, it would be nice to have some guard here to check if we have any active connections, if not – we can run the block immediately (absolutely the same check as you have for an opened transaction).

Useful links:

https://gitlab.com/gitlab-org/gitlab-foss/-/issues/27949

Answer 1 · 2022-06-09T12:51:25.000Z

Hmm, that's interesting. I believe that ActiveRecord::Base.connection re-uses already checked out connection and doesn't checkout new one (unless it wasn't checked out before in the same job/request/whatever, but usually you always do some queries before calling after_commit).

Can you tell a little more about your usage: ActiveRecord version, your database setup (multiple databases?), any custom database-related initializers?

Answer 2 · 2022-06-09T14:46:00.000Z

Gem versions:

activerecord (= 6.1.5.1)
sidekiq (6.4.0)
sidekiq-scheduler (3.1.0)

You're right, ActiveRecord::Base.connection reuses existing db-connection. But there may be a case when connection is not created yet for the current thread. (e.g. you have bunch of sidekiq jobs that are not using db at all, just do some requests to 3rd-pary service and schedules new jobs).
Probably, for more safety would be cool to use ActiveRecord::Base.connection_pool.connected?, just to not have any side-effects.

More over, guys from Sidekiq added support of after_commit_everywhere out of the box, this change may be relevant for them.

BTW, I'm still not 100% sure that ActiveRecord::Base.connection is a reason for leaking database connections in my project. I'm going to wait for few more days to be 100% sure (the issue is very flaky, but we faced this right after we released integration with after_commit_everywhere).

Answer 3 · 2022-06-09T14:48:23.000Z

And a bit more details about how we use this gem. (Almost the same as Sidekiq guys use it)

lib/sidekiq/postpone_scheduling_until_transaction_commit.rb:

module Sidekiq
  module PostponeSchedulingUntilTransactionCommit
    include AfterCommitEverywhere

    def raw_push(payloads)
      if payloads.first['postpone_scheduling_until_transaction_commit'] == false
        super
      else
        ActiveRecord::Base.connection_pool.with_connection do |conn| 
          after_commit(connection: conn) { super }
        end
      end
    end
  end
end

Before the fix which (probably) solved our problem it was:

    def raw_push(payloads)
      if payloads.first['postpone_scheduling_until_transaction_commit'] == false
        super
      else
        after_commit { super }
      end
    end

config/initializers/sidekiq.rb

require 'sidekiq/postpone_scheduling_until_transaction_commit'
Sidekiq::Client.public_send(:prepend, Sidekiq::PostponeSchedulingUntilTransactionCommit)

Answer 4 · 2022-06-09T15:02:42.000Z

Got it, thanks.

I think that it is better not to checkout new connection with connection_pool.with_connection (which maybe pretty expensive in case when there are no available existing connections and new connection need to be established and configured), but just execute block in-place.

I'm probably going to add connection_pool.connected? check proposed by you (but need to think on tests for this).

Answer 5 · 2022-06-09T15:19:18.000Z

Great! Thank you so much, I'll keep you posted if the issue will be reproduced again (even with my fix).

Answer 6 · 2022-06-09T16:55:31.000Z

Ok, I think the reason why this gem worked well for years and you're literally the first person to complain is that usually all its invocations (whether it happens during serving HTTP request in Rails controller or performing job in Sidekiq worker process) are always done inside Rails executor which checks in any connections back to the connection pool.
See https://guides.rubyonrails.org/threading_and_code_execution.html for details (thanks @palkan for the link)

It seems that sometimes your app enqueues Sidekiq jobs outside of executor context (not within controller action or another Sidekiq job) and in that case connection is being checked out and never checked back in.

So after_commit_everywhere should be fixed for such usages and non-Rails apps using ActiveRecord.

Answer 7 · 2022-06-09T20:56:06.000Z

Hm, I'm pretty sure that all jobs are queued eventually from controllers and Sidekiq jobs.
It does not seem like our monkey patching of sidekiq Sidekiq::PostponeSchedulingUntilTransactionCommit is executed outside of Rails executor.

But probably this issue may be somehow connected with a combination of monkey patching and usage of sidekiq-scheduler gem. (this line).

If this is true, then Sidekiq will have the same bug in the nearest feature. Need to investigate this deeper.

Thanks so much for your help!

Answer 8 · 2022-06-10T09:23:06.000Z

Here is the pull request, feel free to comment: #21

Answer 9 · 2022-06-10T16:13:53.000Z

Fix has been released in version 1.2.1.

Thank you very much for reporting!

Answer 10 · 2022-06-13T12:15:48.000Z

As soon as I give up with my workaround and updated gem version – the problem started to appear again :(
Seems ActiveRecord::Base.connection_pool.connected? is not a proper way to detect that we're currently in the "active" connection. connected? return true if the current process has at least one open connection (that may not be used right now).

For example, you can run this code from the rails console:

3.0.3 (main):0 > ActiveRecord::Base.connection_pool.connections.size
=> 0
3.0.3 (main):0 > 3.times.map { Thread.new { AfterCommitEverywhere.after_commit {puts 1} } }.map(&:join)
=> ....
3.0.3 (main):0 > ActiveRecord::Base.connection_pool.connections.size
=> 0
3.0.3 (main):0 > ActiveRecord::Base.connection
=> ...
3.0.3 (main):0 > ActiveRecord::Base.connection_pool.connections.size
=> 1
3.0.3 (main):0 > 3.times.map { Thread.new { AfterCommitEverywhere.after_commit {puts 1} } }.map(&:join)
=> ...
3.0.3 (main):0 > ActiveRecord::Base.connection_pool.connections.size
=> 4

Ideally, we need to find a way to check if there is any open connection without opening connection (what ActiveRecord::Base.connection does. MAYBE we need to try ::Rails.application.executor.active? method to identify should we close connection manually or not.

Answer 11 · 2022-06-13T12:44:28.000Z

Seems I found a way to reproduce it from Rails console. Just run each example in separate rails consoles:

5.times.map do |i|
  Thread.new do
    sleep i
    ActiveRecord::Base.connection_pool.with_connection do |conn|
      AfterCommitEverywhere.after_commit(connection: conn) { 'do stuff' }
    end
  end
end.map(&:join)

ActiveRecord::Base.connection_pool.connections.count # => 1

Here everything works good, the single connection is reused for multiple thread.

5.times.map do |i|
  Thread.new do
    sleep i
    AfterCommitEverywhere.after_commit { 'do_stuff' }
  end
end.map(&:join)

ActiveRecord::Base.connection_pool.connections.count # => 0

The process's connections pool is not connected to the database, no new connections were created, OK.

ActiveRecord::Base.connection
5.times.map do |i|
  Thread.new do
    sleep i
    AfterCommitEverywhere.after_commit { 'do_stuff' }
  end
end.map(&:join)

ActiveRecord::Base.connection_pool.connections.count # => 6

after_commit_everywhere spawns new connections that are not retrieved back to the pool.

Answer 12 · 2022-06-15T18:57:17.000Z

Thanks for reproduction steps! I will take another look into it.

Answer 13 · 2022-06-20T14:23:37.000Z

One more fix has been released in 1.2.2 (using active_connection? instead of connected?)