Leaking database connection issue
Galathius opened this issue · 13 comments
Thank you so much for this gem. It is a really great thing to have in our code base.
After we added this gem to our project we started getting a lot of these kinds of errors from the Sidekiq:
could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use
I may assume that this is somehow relates to usage of ActiveRecord::Base.connection
instead of ActiveRecord::Base.connection_pool.with_connection { |connection| ... }
.
So far we have fixed that issue by wrapping all after_commit { do_stuff }
instructions with:
ActiveRecord::Base.connection_pool.with_connection do |conn|
after_commit(connection: conn) { ...do_stuff... }
end
Probably, it would be nice to have some guard here to check if we have any active connections, if not – we can run the block immediately (absolutely the same check as you have for an opened transaction).
Useful links:
Hmm, that's interesting. I believe that ActiveRecord::Base.connection
re-uses already checked out connection and doesn't checkout new one (unless it wasn't checked out before in the same job/request/whatever, but usually you always do some queries before calling after_commit
).
Can you tell a little more about your usage: ActiveRecord version, your database setup (multiple databases?), any custom database-related initializers?
Gem versions:
- activerecord (= 6.1.5.1)
- sidekiq (6.4.0)
- sidekiq-scheduler (3.1.0)
You're right, ActiveRecord::Base.connection
reuses existing db-connection. But there may be a case when connection is not created yet for the current thread. (e.g. you have bunch of sidekiq jobs that are not using db at all, just do some requests to 3rd-pary service and schedules new jobs).
Probably, for more safety would be cool to use ActiveRecord::Base.connection_pool.connected?
, just to not have any side-effects.
More over, guys from Sidekiq added support of after_commit_everywhere
out of the box, this change may be relevant for them.
BTW, I'm still not 100% sure that ActiveRecord::Base.connection
is a reason for leaking database connections in my project. I'm going to wait for few more days to be 100% sure (the issue is very flaky, but we faced this right after we released integration with after_commit_everywhere
).
And a bit more details about how we use this gem. (Almost the same as Sidekiq guys use it)
lib/sidekiq/postpone_scheduling_until_transaction_commit.rb
:
module Sidekiq
module PostponeSchedulingUntilTransactionCommit
include AfterCommitEverywhere
def raw_push(payloads)
if payloads.first['postpone_scheduling_until_transaction_commit'] == false
super
else
ActiveRecord::Base.connection_pool.with_connection do |conn|
after_commit(connection: conn) { super }
end
end
end
end
end
Before the fix which (probably) solved our problem it was:
def raw_push(payloads)
if payloads.first['postpone_scheduling_until_transaction_commit'] == false
super
else
after_commit { super }
end
end
config/initializers/sidekiq.rb
require 'sidekiq/postpone_scheduling_until_transaction_commit'
Sidekiq::Client.public_send(:prepend, Sidekiq::PostponeSchedulingUntilTransactionCommit)
Got it, thanks.
I think that it is better not to checkout new connection with connection_pool.with_connection
(which maybe pretty expensive in case when there are no available existing connections and new connection need to be established and configured), but just execute block in-place.
I'm probably going to add connection_pool.connected?
check proposed by you (but need to think on tests for this).
Great! Thank you so much, I'll keep you posted if the issue will be reproduced again (even with my fix).
Ok, I think the reason why this gem worked well for years and you're literally the first person to complain is that usually all its invocations (whether it happens during serving HTTP request in Rails controller or performing job in Sidekiq worker process) are always done inside Rails executor which checks in any connections back to the connection pool.
See https://guides.rubyonrails.org/threading_and_code_execution.html for details (thanks @palkan for the link)
It seems that sometimes your app enqueues Sidekiq jobs outside of executor context (not within controller action or another Sidekiq job) and in that case connection is being checked out and never checked back in.
So after_commit_everywhere should be fixed for such usages and non-Rails apps using ActiveRecord.
Hm, I'm pretty sure that all jobs are queued eventually from controllers and Sidekiq jobs.
It does not seem like our monkey patching of sidekiq Sidekiq::PostponeSchedulingUntilTransactionCommit
is executed outside of Rails executor.
But probably this issue may be somehow connected with a combination of monkey patching and usage of sidekiq-scheduler
gem. (this line).
If this is true, then Sidekiq will have the same bug in the nearest feature. Need to investigate this deeper.
Thanks so much for your help!
As soon as I give up with my workaround and updated gem version – the problem started to appear again :(
Seems ActiveRecord::Base.connection_pool.connected?
is not a proper way to detect that we're currently in the "active" connection. connected?
return true if the current process has at least one open connection (that may not be used right now).
For example, you can run this code from the rails console:
3.0.3 (main):0 > ActiveRecord::Base.connection_pool.connections.size
=> 0
3.0.3 (main):0 > 3.times.map { Thread.new { AfterCommitEverywhere.after_commit {puts 1} } }.map(&:join)
=> ....
3.0.3 (main):0 > ActiveRecord::Base.connection_pool.connections.size
=> 0
3.0.3 (main):0 > ActiveRecord::Base.connection
=> ...
3.0.3 (main):0 > ActiveRecord::Base.connection_pool.connections.size
=> 1
3.0.3 (main):0 > 3.times.map { Thread.new { AfterCommitEverywhere.after_commit {puts 1} } }.map(&:join)
=> ...
3.0.3 (main):0 > ActiveRecord::Base.connection_pool.connections.size
=> 4
Ideally, we need to find a way to check if there is any open connection
without opening connection (what ActiveRecord::Base.connection
does. MAYBE we need to try ::Rails.application.executor.active?
method to identify should we close connection manually or not.
Seems I found a way to reproduce it from Rails console. Just run each example in separate rails consoles:
5.times.map do |i|
Thread.new do
sleep i
ActiveRecord::Base.connection_pool.with_connection do |conn|
AfterCommitEverywhere.after_commit(connection: conn) { 'do stuff' }
end
end
end.map(&:join)
ActiveRecord::Base.connection_pool.connections.count # => 1
Here everything works good, the single connection is reused for multiple thread.
5.times.map do |i|
Thread.new do
sleep i
AfterCommitEverywhere.after_commit { 'do_stuff' }
end
end.map(&:join)
ActiveRecord::Base.connection_pool.connections.count # => 0
The process's connections pool is not connected to the database, no new connections were created, OK.
ActiveRecord::Base.connection
5.times.map do |i|
Thread.new do
sleep i
AfterCommitEverywhere.after_commit { 'do_stuff' }
end
end.map(&:join)
ActiveRecord::Base.connection_pool.connections.count # => 6
after_commit_everywhere
spawns new connections that are not retrieved back to the pool.
Thanks for reproduction steps! I will take another look into it.