Connection pool hangs if authentication fails
levi-nz opened this issue · 6 comments
When using a connection pool with `tokio_postgres`, an incorrect password causes the connection attempt to hang until a `TimedOut` error is returned. This behaviour is undesirable; it should simply result in an authentication failure.
To reproduce, try to connect to any database using an incorrect password: the connection will hang until it times out.
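For illustration, here is a minimal reproduction sketch against bb8/bb8-postgres; the connection string, timeout, and use of `build_unchecked` are my own choices, not details from the report:

```rust
use std::time::Duration;

use bb8::Pool;
use bb8_postgres::PostgresConnectionManager;
use tokio_postgres::NoTls;

#[tokio::main]
async fn main() {
    // Deliberately wrong password (placeholder connection string).
    let manager = PostgresConnectionManager::new_from_stringlike(
        "host=localhost user=postgres password=definitely-wrong dbname=postgres",
        NoTls,
    )
    .expect("invalid connection string");

    // build_unchecked() returns immediately without opening any connections.
    let pool = Pool::builder()
        .connection_timeout(Duration::from_secs(10))
        .build_unchecked(manager);

    // Observed behaviour from the report: instead of an authentication error,
    // this call hangs and eventually returns a timeout error from the pool.
    match pool.get().await {
        Ok(_) => println!("connected (unexpected with a bad password)"),
        Err(e) => println!("pool.get() failed after ~10s: {e}"),
    }
}
```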
I'd be happy to review a PR for this, but I don't think I'll be able to work on this myself in the near future.
I've spent a bit of time looking into this while debugging some TLS connection issues. It is decidedly not a simple fix, unfortunately. There are a few things that I've been doing to make life a little bit easier, however:
- If your deployment allows it, set `Builder::min_idle` to require at least one open connection before considering the pool ready. Consider testing the connection parameters on startup with a short-lived pool and a very low timeout.
- Be sure to register an `ErrorSink` via `Builder::error_sink`, which gives you an opportunity to log errors that would otherwise fall off the table (see the sketch after this list).
- I'll also be taking a look at using the `bb8_failsafe` crate here for more proactive connection error management.
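A rough sketch of the first two suggestions; the sink type, connection string, and timeout are illustrative, not from the comment above:

```rust
use std::time::Duration;

use bb8::{ErrorSink, Pool};
use bb8_postgres::PostgresConnectionManager;
use tokio_postgres::NoTls;

// A trivial sink that logs connection errors that would otherwise be dropped.
#[derive(Debug, Clone, Copy)]
struct LoggingErrorSink;

impl ErrorSink<tokio_postgres::Error> for LoggingErrorSink {
    fn sink(&self, error: tokio_postgres::Error) {
        eprintln!("pool connection error: {error}");
    }

    fn boxed_clone(&self) -> Box<dyn ErrorSink<tokio_postgres::Error>> {
        Box::new(*self)
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let manager = PostgresConnectionManager::new_from_stringlike(
        "host=localhost user=app password=secret dbname=app", // placeholder
        NoTls,
    )?;

    let pool = Pool::builder()
        .min_idle(Some(1)) // require one idle connection for the pool to be "ready"
        .connection_timeout(Duration::from_secs(5)) // keep the startup check short
        .error_sink(Box::new(LoggingErrorSink))
        // build() (unlike build_unchecked()) tries to establish the idle
        // connections before returning, so bad credentials should show up
        // here rather than on the first get().
        .build(manager)
        .await?;

    let conn = pool.get().await?;
    conn.simple_query("SELECT 1").await?;
    Ok(())
}
```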
The touchpoint for the issue is `PoolInner::add_connection` (lines 199 to 225 in 5737736). That method uses the `ManageConnection` impl to make the connection, and (in the case of an error) keeps trying until it either gets one successfully or the pool's connection timeout expires.
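For readers following along, here is a paraphrased, self-contained sketch of what that loop does (not the actual bb8 code; the names, backoff, and generic signature are made up for illustration):

```rust
use std::future::Future;
use std::time::{Duration, Instant};

// Keep calling `connect` until it succeeds or the pool's connection timeout
// expires. Each concrete error is discarded along the way, so a caller
// waiting on the pool only ever sees a generic timeout.
async fn add_connection_sketch<C, E, F, Fut>(
    mut connect: F,
    connection_timeout: Duration,
) -> Result<C, &'static str>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<C, E>>,
{
    let deadline = Instant::now() + connection_timeout;
    let mut delay = Duration::from_millis(100);
    loop {
        match connect().await {
            Ok(conn) => return Ok(conn),
            Err(_discarded) => {
                if Instant::now() >= deadline {
                    return Err("TimedOut");
                }
                tokio::time::sleep(delay).await;
                delay = (delay * 2).min(Duration::from_secs(5));
            }
        }
    }
}
```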
The right behavior in the general case is to delay returning any error until the timeout, but it would be great if there were a way to identify particular classes of errors that shouldn't wait for the timeout. It seems like that would require cooperation from the `ManageConnection` impl, which means it would require a new trait method, say something like `fn error_is_fatal(&self, _e: &Self::Error) -> bool { false }`, which could then be used to exit early. Then we could override the default impl of that trait method where appropriate. Unfortunately for my use case (`bb8-postgres`), the `tokio-postgres` crate does not expose error details in a way that could reasonably be used for this purpose. Long story short, doing this would be a lot of work.
As a side note for @djc or anyone else looking more deeply: this call stack, `Pool::get` => `PoolInner::get` => `PoolInner::make_pooled` => `PoolInner::spawn_replenishing_approvals`, raises my eyebrows. In particular, it seems like the waiter ought to be a `Result` channel so that a root cause can bubble back up. As it is, we lose the original error even in cases where we are actively waiting for the connection, which does not seem to be the intent of `spawn_replenishing_approvals` (lines 138 to 139 in 5737736).
> As a side note for @djc or anyone else looking more deeply: this call stack, `Pool::get` => `PoolInner::get` => `PoolInner::make_pooled` => `PoolInner::spawn_replenishing_approvals`, raises my eyebrows. In particular, it seems like the waiter ought to be a `Result` channel so that a root cause can bubble back up. As it is, we lose the original error even in cases where we are actively waiting for the connection, which does not seem to be the intent of `spawn_replenishing_approvals`.
That sounds sensible. Want to propose a PR?
> Unfortunately for my use case (`bb8-postgres`), the `tokio-postgres` crate does not expose error details in a way that could reasonably be used for this purpose. Long story short, doing this would be a lot of work.

Might be worth an issue against tokio-postgres? I would definitely be open to adding an `error_is_fatal()`-like hook to the `ManageConnection` trait.
> That sounds sensible. Want to propose a PR?
Happy to work something up. Since the waiter is stored in `internals::PoolInternals` and the error pops up in `inner::PoolInner::spawn_start_connections`, I think the move is to replace the line that throws to the `error_sink` (line 67 in 5737736) with a call to a new method on `SharedPool` that looks roughly like:
```rust
impl<M: ManageConnection> SharedPool<M> {
    fn forward_error(&self, mut err: M::Error) {
        let mut locked = self.internals.lock();
        while let Some(waiter) = locked.waiters.pop_front() {
            // Hand the error to the first waiter that is still listening; if the
            // receiver has gone away, take the error back and try the next one.
            match waiter.send(Err(err)) {
                Ok(()) => return,
                Err(Err(e)) => err = e,
                // We only ever send `Err(..)`, so we can't get an `Ok` back.
                Err(Ok(_)) => unreachable!(),
            }
        }
        // No live waiters: fall back to the configured error sink.
        self.statics.error_sink.sink(err);
    }
}
```
> Might be worth an issue against tokio-postgres? I would definitely be open to adding an `error_is_fatal()`-like hook to the `ManageConnection` trait.
It looks like it has been discussed a few times (sfackler/rust-postgres#571, sfackler/rust-postgres#583, sfackler/rust-postgres#790), though usually in a different context. There's even another project that is trying to do exactly the same thing: https://github.com/palfrey/wait-for-db/blob/c3fb9dbad94f06f91f83d6ef9e4af0b93ef64083/src/pg.rs#L34-L49
In at least one case a PR was merged that exposed a limited interface to the error kind (sfackler/rust-postgres#739), so it's reasonable to think a similar PR would be well received.
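For reference, the check those projects do boils down to something like this; the function name and the exact set of SQLSTATE codes are my own choices, but `Error::code()` and the `SqlState` constants are what tokio-postgres exposes today:

```rust
use tokio_postgres::error::SqlState;

/// Classify an error as an authentication failure via its SQLSTATE code,
/// one of the few details `tokio_postgres::Error` makes public.
fn is_auth_failure(err: &tokio_postgres::Error) -> bool {
    match err.code() {
        Some(code) => {
            *code == SqlState::INVALID_PASSWORD
                || *code == SqlState::INVALID_AUTHORIZATION_SPECIFICATION
        }
        None => false,
    }
}
```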
Yeah, the `forward_error()` approach seems reasonable. Should probably verify all `sink()` callers (at least in `PoolInner`) to see if they should use this.
Hmm, it's not so easy, I guess. The above strategy works, but only if the user has configured `retry_connection(false)`. If not, in practice we hit the "outer" timeout on the waiter channel here: line 142 in 03f95c9 and line 216 in 03f95c9.

After a quick review I don't immediately see a great way to get these to line up that doesn't feel like a hack (i.e. making the outer one "a little bit longer", or some such). The outer timeout can't really be removed, since we're relying on it in the face of pool contention. The inner timeout in the context of the retry loop looks locally correct, and it's hard to picture what would really need to change.

I'll go ahead and submit a PR to bubble up the error for the case of a pool configured with `retry_connection(false)`, since that at least makes a little bit of progress here.
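For completeness, a pool opting out of the retry loop would be configured roughly like this (a sketch; the connection string and timeout are placeholders, and the connect error only reaches the caller once the change above lands):

```rust
use std::time::Duration;

use bb8::Pool;
use bb8_postgres::PostgresConnectionManager;
use tokio_postgres::NoTls;

// With retry_connection(false), a failed connect attempt is not retried, so
// the error can be forwarded to waiters instead of being swallowed until the
// connection timeout fires.
async fn build_pool() -> Result<Pool<PostgresConnectionManager<NoTls>>, tokio_postgres::Error> {
    let manager = PostgresConnectionManager::new_from_stringlike(
        "host=localhost user=app password=wrong dbname=app", // placeholder
        NoTls,
    )?;

    Pool::builder()
        .retry_connection(false)
        .connection_timeout(Duration::from_secs(5))
        .build(manager)
        .await
}
```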