diesel-rs/diesel

double free or corruption issue when establishing connection

LolaLollipop opened this issue · 18 comments

Setup

Versions

  • Rust: 1.75.0
  • Diesel: master
  • Database: Postgres
  • Operating System: Ubuntu

Feature Flags

  • diesel: postgres, r2d2

Problem description / What are you trying to accomplish?

i'm simply trying to establish a connection to postgres on ubuntu using diesel. everything works fine when i'm on windows, but the moment i switch to ubuntu (either through wsl or my vps) it errors on trying to establish this connection. since it's a c issue i guess it could be a postgres issue but i'm not entirely sure - either way this ONLY occurs when i'm doing it through ubuntu

i should mention that although i have the r2d2 feature flag, it doesn't work whether or not i'm using r2d2 (so it's not bc of r2d2)

What is the expected output?

Getting connection...
Connection gotten
(...)

What is the actual output?

Getting connection...
double free or corruption (out)
core dumped

Are you seeing any additional errors?

no

Steps to reproduce

this is my code, obv snipped but this is the gist

static DB_URL: Lazy<String> = Lazy::new(|| 
    std::env::var("DATABASE_URL").expect("Couldn't get database url")
);
// later
println!("Getting connection...");
let conn = &mut PgConnection::establish(&DB_URL).unwrap(); // core dumped here
println!("Connection gotten");

Checklist

  • I have already looked over the issue tracker and the discussion forum
  • This issue can be reproduced on Rust's stable channel
  • This issue can be reproduced without requiring a third party crate

Thanks for opening this bug report. This kind of issue is highly dependent on your environment, so I need to ask for a few more details:

  • Which Ubuntu version is affected exactly
  • Which libpq and which OpenSSL versions are linked
  • Are libpq and OpenSSL linked statically or dynamically
  • If they are linked dynamically are you using the same libraries for compilation and running the application

Additionally it would be very helpful to have a stack trace of the crash recorded by gdb or a similar debugger.

It would be even more helpful to have a self contained example based on a docker image that reproduces the issue.
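For the libpq version question: libpq exposes a C function `PQlibVersion()` that returns the version of the library actually loaded at runtime as a single integer (for libpq 10 and newer, major * 10000 + minor). The sketch below only decodes such an integer. How you obtain it from Rust (for example through the `pq-sys` bindings) is an assumption and is left out here, and the function name is made up for illustration.

```rust
/// Decodes the integer returned by libpq's `PQlibVersion()` into
/// `(major, minor)`.
///
/// Only the two-part scheme used since PostgreSQL 10 is handled:
/// `major * 10000 + minor`, so 16.2 is reported as 160002.
/// Older three-part versions (below 100000) and negative values
/// yield `None`.
fn decode_libpq_version(raw: i32) -> Option<(u32, u32)> {
    if raw < 100_000 {
        return None;
    }
    let raw = raw as u32;
    Some((raw / 10_000, raw % 10_000))
}
```

Calling `decode_libpq_version(160002)` gives `Some((16, 2))`. Comparing that against the version of the libpq package you compiled against shows whether compile-time and runtime libraries differ.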

  1. ubuntu 22.04 on both wsl and my vps
  2. i have openssl 3.0.2 and im not entirely sure what version of libpq i have
  3. i think its statically linked
Getting connection...
[New Thread 0x7ffff6709640 (LWP 1505981)]
double free or corruption (out)

Thread 5 "r2d2-worker-1" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff690d640 (LWP 1505980)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737330075200) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.

it says r2d2 worker but i'm not pooling the connections through r2d2, it might just be the feature flag doing that or something im not sure

Thanks for adding this information. Can you use the bt command in gdb after the program has crashed to record a stack trace and share the output? That should tell us exactly where the crash occurs.

woops sorry, yeah here

__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737330075200) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737330075200)
   at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737330075200)
   at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737330075200, signo=signo@entry=6)
   at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c75476 in __GI_raise (sig=sig@entry=6)
   at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c5b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7cbc676 in __libc_message (action=action@entry=do_abort,
   fmt=fmt@entry=0x7ffff7e0eb77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7cd3cfc in malloc_printerr (
   str=str@entry=0x7ffff7e11790 "double free or corruption (out)")
   at ./malloc/malloc.c:5664
#7  0x00007ffff7cd5e70 in _int_free (av=0x7ffff7e4dc80 <main_arena>,
   p=0x7fffec000480, have_lock=<optimized out>) at ./malloc/malloc.c:4588
#8  0x00007ffff7cd8453 in __GI___libc_free (mem=<optimized out>)
   at ./malloc/malloc.c:3391
#9  0x00007ffff7f72151 in ?? () from /lib/x86_64-linux-gnu/libpq.so.5
#10 0x000055555625a15e in diesel::pg::connection::raw::RawConnection::establish
   ()
#11 0x0000555556258981 in <diesel::pg::connection::PgConnection as diesel::connection::Connection>::establish ()

i guess its definitely caused by libpq then, not entirely sure how to fix it tho

Thanks for providing this backtrace. That's helpful. I'm able to reproduce this issue locally by using #3910 in an Ubuntu 22.04 docker container. I'm currently looking into what is causing this.

I found Homebrew/homebrew-core#155651 and the related https://www.postgresql.org/message-id/flat/4036729.1701130863%40sss.pgh.pa.us#7a1940670114e88862049b03fc61df4f. That is also what is causing #3910 to fail. Pulling in the relevant postgres commit seems to fix that issue for me. Can you therefore check whether you link openssl 3.2.0 statically (e.g. via the openssl-src crate)?

Just pitching in here. I have the exact same issue on a Debian docker-image.

Diesel config: diesel = { version = "2.1.4", features = ["r2d2", "postgres", "chrono"], optional = true }
Openssl version: OpenSSL 3.0.11 19 Sep 2023 (Library: OpenSSL 3.0.11 19 Sep 2023)
Libpq version: 16.2-1 (according to debian package )

I'm not statically linking anything myself. I'm just using the libraries installed on the machine.

I don't really understand how you fixed it @weiznich - Could you provide some more details on how you did it?

@madser123 Can you provide a self-contained reproducing example as a Dockerfile? That would be really helpful.

The fix above essentially just pulls in version 16.2 of libpq as that fixes a potential segfault around their interaction with openssl.

I will try, and return to you. Thank you for helping :)

@weiznich Just FYI - In my attempts to create a self contained reproduction of this, it somehow works as expected.

I will experiment further and reply back here, if i find something.

Thank you for the interest :)

@madser123 I just remembered that at some point we got similar reports, and they were solved back then by removing other dependencies. So it might be that the minimal example needs not just diesel but other crates as well. You likely want to start with all your dependencies, disable them one by one, and see which are relevant.

@weiznich

I have a weird finding (at least to me). It seems to be a trait from one of my own libraries... and I don't really get why. I will try to explain here.

Here is the binary project itself:
Cargo.toml

[package]
name = "pq-test"
version = "0.1.0"
edition = "2021"

[dependencies]
diesel          = { version = "2.1.4", features = ["r2d2", "postgres", "chrono"] }
rocket_sync_db_pools = { version = "0.1.0", features = ["diesel_postgres_pool"] }

# Local libs
bolt-rs     = { path = "../../lib/bolt-rs" }

# Web server
rocket = { version = "0.5.0", features = ["json"] }
rocket_dyn_templates = { version = "0.1.0", features = ["tera"] }

main.rs:

#[macro_use]
extern crate rocket;

use diesel::{Connection, PgConnection};
use rocket_sync_db_pools::database;

//use bolt_rs::payload::Interaction; //<-- Uncommenting this results in "double free or corruption (out)"

/// A connection pool for the Hipster database
#[database("hipster")]
pub struct Pool(PgConnection);

pub fn establish_db_connection(database_url: &str) -> PgConnection {
    PgConnection::establish(database_url).expect("Error connecting to database")
}

/// AWS Healthcheck endpoint
#[get("/healthz")]
const fn healthz() -> &'static str {
    "OK"
}

#[rocket::main]
async fn main() {
    {
        let _ =
            establish_db_connection("postgres://hipster:devpassword@hipster-db.hiper.dk/hipster");
        println!("Connected to database");
    }

    let _ = rocket::build()
        .mount("/", routes![healthz])
        .attach(Pool::fairing())
        .launch()
        .await;
}

As seen in the above code, when importing the Interaction trait from my "bolt-rs" crate, it results in the double free or corruption (out) error.

However, this is all the trait does (And it's not even relevant for any of the types in the project):

pub trait Interaction: Serialize + DeserializeOwned {
    fn identifier(&self) -> String;
    fn identifier_name() -> String;
    fn get_user(&self) -> &ResponseUser;
}

And then the trait is implemented for a bunch of internal types to the bolt-rs crate.

The cargo.toml for the bolt-rs project looks like so:

[package]
name = "bolt-rs"
version = "0.3.0"
edition = "2021"

[dependencies]
# General dependencies (Blocks, Slack-types, etc.)
reqwest     = { version = "0.11.0", default_features = false, features = ["rustls-tls", "json", "multipart"] }
serde       = { version = "1.0.145" }
serde_json  = { version = "1.0.85"  }
serde_with  = { version = "3.4.0"   }
url         = { version = "2.3.1"   }

headhunter  = { path = "../headhunter" }

Now. I understand that the trait will compile some other code, but... Appending the imports used in the Cargo.toml for the bolt-rs project (And adding a use statement for the serde::{de::DeserializeOwned, Serialize} traits, which is used by the Interaction trait) doesn't make a difference - It still works. Only when the Interaction trait is pulled in, does it fail.

I fail to see what the difference in this is, and how it can cause the error.. Diesel and rocket-sync-db-pools are the only "new" crates to all of this - Bolt-rs and everything else has been running fine until now (Not to say that these can't be the error).

Do you have any idea on how i can proceed from here or maybe some clarity as to how/why this can happen?

Okay - Figured out it originates in the library where i'm using diesel. I'm able to replicate it there. I will try pulling dependencies again...

@weiznich it was openssl that was imported in the library using the Vendored feature - It was some legacy stuff from when we tried to get the binary working on MUSL images - Sorry for the inconvenience, but thank you for the help!

Thanks for coming back and leaving that comment here 👍 It's always helpful to know that the issue is resolved.

I'm closing this issue now as an upstream libpq issue with recent openssl versions (>= 3.2). It's fixed in libpq 16.2.

For any future person that encounters this issue again, please double check:

  • Whether you depend on openssl 3.2 or newer
  • Whether you depend on libpq 16.1 or older

Please carefully check where these dependencies are loaded from, as your rust dependency tree might link one or the other dependency statically, or your runtime system might provide different versions than those used at compile time. If you answer one of these questions above with "Yes", you need to adjust one of these dependencies to a compatible version.
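As a rough illustration of the two checks above, here is a minimal sketch that flags a version pair as problematic. It assumes the crash needs both conditions at once (openssl 3.2 or newer together with libpq 16.1 or older), which is how the closing note reads. The function names are made up for this example, and the check does not account for any fixes shipped in later minor releases of older libpq branches.

```rust
/// Extracts the leading `major.minor` pair from a version string such as
/// "3.0.11" or "16.2-1". Returns `None` if fewer than two numeric
/// components are present.
fn parse_major_minor(version: &str) -> Option<(u32, u32)> {
    let mut parts = version
        .split(|c: char| !c.is_ascii_digit())
        .filter(|part| !part.is_empty())
        .map(|part| part.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    Some((major, minor))
}

/// Returns `Some(true)` when the pair matches the combination described in
/// this issue: openssl >= 3.2 AND libpq <= 16.1 (assumption: both must hold).
/// Returns `None` if either version string cannot be parsed.
fn is_affected_combination(openssl: &str, libpq: &str) -> Option<bool> {
    let openssl = parse_major_minor(openssl)?;
    let libpq = parse_major_minor(libpq)?;
    // Tuples compare lexicographically: major first, then minor.
    Some(openssl >= (3, 2) && libpq <= (16, 1))
}
```

For example, `is_affected_combination("3.2.0", "16.1")` is `Some(true)`, while `is_affected_combination("3.0.11", "16.2-1")` is `Some(false)`. Make sure the versions you pass in are the ones actually linked, including a vendored or statically linked openssl, and not just the ones installed on the system.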