djc/bb8

bb8 deadlocks in rust-pool-benchmark

bikeshedder opened this issue · 6 comments

I just stumbled across the rust-pool-benchmark repository by @Astro36 and found bb8 deadlocking. Both 0.7 and 0.8 are affected:

As the maintainer of deadpool I'm not using bb8 myself so I haven't investigated any further. I just wanted to let you know about it.

djc commented

Thanks for the report! Don't have time to look into it this week, but will try to investigate soon.

Hello.

We're using bb8-postgres. Our app have been sometimes faced stall. And I'm thinking that these stall possibly a little related by this issue.

Guess

In my guess, this fn causes RunError::TimedOut when not in luck, in benchmarking with rust-pool-benchmark.

bb8/bb8/src/inner.rs

Lines 102 to 146 in 1246142

pub(crate) async fn make_pooled<'a, 'b, F>(
&'a self,
make_pooled_conn: F,
) -> Result<PooledConnection<'b, M>, RunError<M::Error>>
where
F: Fn(&'a Self, Conn<M::Connection>) -> PooledConnection<'b, M>,
{
loop {
let mut conn = {
let mut locked = self.inner.internals.lock();
match locked.pop(&self.inner.statics) {
Some((conn, approvals)) => {
self.spawn_replenishing_approvals(approvals);
make_pooled_conn(self, conn)
}
None => break,
}
};
if !self.inner.statics.test_on_check_out {
return Ok(conn);
}
match self.inner.manager.is_valid(&mut conn).await {
Ok(()) => return Ok(conn),
Err(e) => {
self.inner.statics.error_sink.sink(e);
conn.drop_invalid();
continue;
}
}
}
let (tx, rx) = oneshot::channel();
{
let mut locked = self.inner.internals.lock();
let approvals = locked.push_waiter(tx, &self.inner.statics);
self.spawn_replenishing_approvals(approvals);
};
match timeout(self.inner.statics.connection_timeout, rx).await {
Ok(Ok(mut guard)) => Ok(make_pooled_conn(self, guard.extract())),
_ => Err(RunError::TimedOut),
}
}

This fn consists of three parts.

  • Part1. Tries to get an idle connection. End this fn with a connection if exists.(L109-L133)
  • Part2. Stands at the end of the waiting queue (L135-L140)
  • Part3. Waits to be notified of a connection that became available (L142-L145)

"Not in luck" that I wrote above is when PooledConnection::drop() runs between Part1 and Part2.

Prerequisite

  • Connections are in used out to the max_size of the Pool.
  • Waiting queue is empty.

Flow

  1. Client of bb8 calls Pool::get().
  2. Pool::get() calls PoolInner::make_pooled().

(In this situation, expects to return a old connection, that will drop during runs PoolInner::make_pooled())

  1. Executes Part1 and idle connection is not exist, so not return from this fn yet.
    4. Starts to PooledConnection::drop() of old connection. It attempt to put back it to the pool.
    5. Waiting queue is empty, so notification don't fire, and connection saved into list of idle.
    6. Complete PooledConnection::drop().

7. Execute Part2. Enqueue to waiting queue.
8. Execute Part3. Wait notification. But it's no notification will come because PooledConnection::drop() have already done.

How to modify

Between Part2 and Part3, it might be a good idea to check for the existence of idle connections again.

I'm not familiar with multi task programming. The guess may be wrong. But rust-pool-benchmark fails also in my PC.

djc commented

Thanks for the analysis! Can you try to write a test case for this? If you submit that as a PR I can probably fix this pretty soon.

Hi, @djc
I am researching whether to use bb8 and/or deadpool for my production application, which handles many connection requests to different types of databases. I would like to know what the fixes plans are for this issue (deadlocks) and understand if there are limitations in this case for bb8 to be used in production.

djc commented

There's a potential fix in #154, reviewing that is on my list.