sfackler/r2d2

Error: free(): invalid pointer

VictorKoenders opened this issue · 4 comments

I'm using r2d2 in a (closed-source) project for a customer, and the process aborts because something tries to free an invalid pointer.

The error output is: *** Error in `/home/vincent/coolerbot/release/bot': free(): invalid pointer: 0x00007f1e2c076c60 ***

Debian 8 (jessie), kernel 3.16.59-1
rustc 1.32.0
r2d2 0.8.3
pq-sys 0.4.6
diesel 1.4.1

PostgreSQL packages:

postgresql-10/jessie-pgdg,now 10.6-1.pgdg80+1 amd64 [installed]
postgresql-client-10/jessie-pgdg,now 10.6-1.pgdg80+1 amd64 [installed,automatic]
postgresql-client-common/jessie-pgdg,now 199.pgdg80+1 all [installed,automatic]
postgresql-common/jessie-pgdg,now 199.pgdg80+1 all [installed,automatic]

The r2d2 pool is created with:

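use diesel::pg::PgConnection;
use diesel::r2d2::ConnectionManager;
use r2d2::Pool;

#[derive(Clone)]
pub struct Connection {
    pub(crate) conn: Pool<ConnectionManager<PgConnection>>,
}

impl Connection {
    /// Builds an r2d2 pool of diesel `PgConnection`s for the given database URL.
    pub fn connect(url: &str) -> Result<Connection, r2d2::Error> {
        let conn = Pool::new(ConnectionManager::new(url))?;
        Ok(Connection { conn })
    }
}

fn main() {
    // The project reads the URL from its own config (`config.database_url`);
    // an environment variable stands in for it here.
    let database_url = std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");
    let conn = Connection::connect(&database_url).expect("Could not connect to database");
}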

gdb backtrace:

#0  0x00007f1e34cdb067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f1e34cdc448 in __GI_abort () at abort.c:89
#2  0x00007f1e34d191b4 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7f1e34e0e210 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f1e34d1e98e in malloc_printerr (action=1, str=0x7f1e34e0a326 "free(): invalid pointer", ptr=<optimized out>) at malloc.c:4996
#4  0x00007f1e34d1f696 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5  0x00007f1e3589c4a0 in ?? () from /usr/lib/x86_64-linux-gnu/libpq.so.5
#6  0x000055b45221eeb0 in r2d2::CustomizeConnection::on_release::hf8217cbd7c8ed8c5 ()
#7  0x000055b45221e2de in r2d2::drop_conns::h3df3ef3e77ab5c7f ()
#8  0x000055b45221ecf5 in r2d2::reap_connections::h84ed5f0047adfb7f ()
#9  0x000055b452281c31 in scheduled_thread_pool::Worker::run_job::h3b008cb21134a03b ()
#10 0x000055b452282311 in std::panicking::try::do_call::hef918f7ba5b2651f ()
#11 0x000055b4522a609a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:102
#12 0x000055b4522819a9 in scheduled_thread_pool::Worker::run::h1861b2dabd2cd6ed ()
#13 0x000055b4522839f4 in std::sys_common::backtrace::__rust_begin_short_backtrace::h0ebd79a6a13e4f1f ()
#14 0x000055b4522a609a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:102
#15 0x000055b45228362c in _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::h8e8ed42246ded358 ()
#16 0x000055b452299cfe in call_once<(),()> () at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b/src/liballoc/boxed.rs:683
#17 start_thread () at src/libstd/sys_common/thread.rs:24
#18 std::sys::unix::thread::Thread::new::thread_start::hca8e72c41fa9d291 () at src/libstd/sys/unix/thread.rs:90
#19 0x00007f1e3526f064 in start_thread (arg=0x7f1e31f23700) at pthread_create.c:309
#20 0x00007f1e34d8e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I'm not sure what other information to collect, so if anything is missing, feel free to ask.

I don't think this is an issue in r2d2 itself, since it's entirely safe code. It might not even be a problem in the connection library wrapping libpq. Heap-corruption errors like this one can be hard to track down, because the write that corrupted the heap may have happened long before, and far away from, the point where the allocator finally detects it.

I'd try tools that catch the corruption at the moment it happens. One option is running the binary under Valgrind; another is building the Rust code (and ideally the C code it links against) with AddressSanitizer.
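A minimal sketch of a reproduction harness to run under Valgrind or an AddressSanitizer build, assuming the crash is easier to trigger when the suspect code path runs many times from several threads (the backtrace above shows libpq's `free` being called from r2d2's background reaper thread). `stress` is a hypothetical helper, not part of r2d2, and it needs Rust 1.63+ for `std::thread::scope`, which is newer than the rustc 1.32.0 in the report.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stress helper: runs `op` `iterations` times on each of
/// `threads` threads concurrently and returns the total number of calls.
/// Running the suspect path many times gives Valgrind or ASan more chances
/// to report the first invalid heap access instead of a later abort.
fn stress<F>(threads: usize, iterations: usize, op: F) -> usize
where
    F: Fn() + Sync,
{
    let calls = AtomicUsize::new(0);
    // Scoped threads may borrow `op` and `calls`; all are joined at the end.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iterations {
                    op();
                    calls.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    calls.into_inner()
}

fn main() {
    // In the real project `op` would exercise the pool, e.g. checking a
    // connection out and dropping it: `|| { let _c = pool.get().unwrap(); }`.
    // A plain heap allocation stands in for it here.
    let total = stress(4, 1_000, || {
        let buf: Vec<u8> = vec![0; 64];
        drop(buf);
    });
    println!("{} calls", total);
}
```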

I don't think it's an issue in r2d2 either, but I didn't know where else to post it.

For this specific case, wrapping a PgConnection in a Mutex fixed the issue (just in time for the demo in 30 minutes!).
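A minimal sketch of that workaround, assuming a single connection is shared between threads behind `Arc<Mutex<_>>` instead of being handed out by the pool. `SharedConnection` is a hypothetical name, and it is generic over the connection type so the sketch compiles with std only; in the project, `C` would be diesel's `PgConnection`. Every query then holds the lock, so database access is fully serialized: this sidesteps the crash rather than identifying its cause.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Hypothetical wrapper: one connection shared by all threads behind a mutex.
pub struct SharedConnection<C> {
    inner: Arc<Mutex<C>>,
}

// Manual impl so cloning only bumps the Arc count and does not require `C: Clone`.
impl<C> Clone for SharedConnection<C> {
    fn clone(&self) -> Self {
        SharedConnection { inner: Arc::clone(&self.inner) }
    }
}

impl<C> SharedConnection<C> {
    pub fn new(conn: C) -> Self {
        SharedConnection { inner: Arc::new(Mutex::new(conn)) }
    }

    /// Locks the connection so only one thread uses it at a time.
    /// Panics if a previous holder panicked while holding the lock.
    pub fn lock(&self) -> MutexGuard<'_, C> {
        self.inner.lock().expect("connection mutex poisoned")
    }
}

fn main() {
    // Stand-in "connection": a counter of executed queries.
    let shared = SharedConnection::new(0u32);
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let conn = shared.clone();
            thread::spawn(move || {
                for _ in 0..100 {
                    // With diesel 1.x this would be a query run against `&*conn.lock()`.
                    *conn.lock() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("{}", *shared.lock());
}
```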

I'll try to reproduce it this weekend and see if I can find what exactly is going on.

Update: switching from Debian 8 to Ubuntu 14 fixes the issue. I suspect something in Debian's libpq-dev package.