Homebrew/homebrew-core

openssl@3.2.0 update makes psql crash when connecting with TLS

kozlek opened this issue · 29 comments

brew gist-logs <formula> link OR brew config AND brew doctor output

Error: No logs.

Please note that these warnings are just used to help the Homebrew maintainers
with debugging if you file an issue. If everything you use Homebrew for is
working fine: please don't worry or file an issue; just ignore this. Thanks!

Warning: Some installed formulae are deprecated or disabled.
You should find replacements for the following formulae:
  openssl@1.1

Verification

  • My "brew doctor" output says "Your system is ready to brew." and I am still able to reproduce my issue.
  • I ran brew update and am still able to reproduce my issue.
  • I have resolved all warnings from brew doctor and that did not fix my problem.
  • I searched for recent similar issues at https://github.com/Homebrew/homebrew-core/issues?q=is%3Aissue and found no duplicates.

What were you trying to do (and why)?

I'm trying to use psql from postgresql@16 to connect to a server that requires TLS.

What happened (include all command output)?

psql is crashing with a pointer error.

What did you expect to happen?

psql should connect successfully to a PostgreSQL server over TLS.

Step-by-step reproduction instructions (by running brew commands)

My issue was fixed by downgrading `openssl@3` to version 3.1.4.

What happened (include all command output)?

Can you post the full output log you get? psql has been known in the past to print pointer errors that are actually a consequence of earlier errors.

psql -h xxx -p 5432 -U xxx -d xxx

psql: error: connection to server at "xxx" (x.x.x.x), port 5432 failed: FATAL: no PostgreSQL user name specified in startup packet
connection to server at "xxx" (x.x.x.x), port 5432 failed: FATAL: no PostgreSQL user name specified in startup packet
psql(6636,0x10f1de600) malloc: *** error for object 0x7f916b00bc00: pointer being freed was not allocated
psql(6636,0x10f1de600) malloc: *** set a breakpoint in malloc_error_break to debug

I am also experiencing this issue

I was able to connect to my local postgresql instance (without SSL), but unable to connect to any remote server (all using TLS).

First I tried:

  • reboot
  • downgrade from postgresql@16 to postgresql@14
  • reinstall postgresql

Given the recent Homebrew updates, I suspected OpenSSL, and downgrading to 3.1.4 fixed it immediately.

I don't know if the problem is related to the way the psql binary is linked or if the issue affects other projects relying on OpenSSL.

I can reproduce this with an asdf-built postgresql linked against last night's openssl@3, which should rule out the Homebrew postgresql formula as the cause:

broz@REDACTED:~/src/REDACTED$ type psql
psql is hashed (/Users/broz/.asdf/shims/psql)
broz@REDACTED:~/src/REDACTED$ otool -L /Users/broz/.asdf/installs/postgres/16.1/bin/psql
/Users/broz/.asdf/installs/postgres/16.1/bin/psql:
	/Users/broz/.asdf/installs/postgres/16.1/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.16.0)
	/opt/homebrew/opt/openssl@3/lib/libssl.3.dylib (compatibility version 3.0.0, current version 3.0.0)
	/opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib (compatibility version 3.0.0, current version 3.0.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.12)
	/usr/lib/libedit.3.dylib (compatibility version 2.0.0, current version 3.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)
broz@REDACTED:~/src/REDACTED$ psql ${PROD_DATABASE_URL}
psql: error: connection to server at "REDACTED" (REDACTED), port 5432 failed: FATAL:  no PostgreSQL user name specified in startup packet
connection to server at "REDACTED" (REDACTED), port 5432 failed: FATAL:  no PostgreSQL user name specified in startup packet
psql(36909,0x1dfdc9ec0) malloc: double free for ptr 0x14a809200
psql(36909,0x1dfdc9ec0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6
broz@REDACTED:~/src/REDACTED$ 

Does someone have the steps to downgrade to 3.1.4?

no PostgreSQL user name specified in startup packet

Thanks, this is useful. It looks like it's doing the SSL handshake but then failing to send data properly for some reason. Will take a look.

Does someone have the steps to downgrade to 3.1.4?

curl -L https://raw.githubusercontent.com/Homebrew/homebrew-core/e68186ba5a05a6ea9a30d6c7744de9a46bd3aadd/Formula/o/openssl@3.rb > openssl@3.rb && brew install openssl@3.rb

That's the commit that upgraded the formula from 3.1.4 to 3.2. Feel free to confirm for yourself though.

I should add that psycopg (a Python package linked against libpq) suffers from the same issue when OpenSSL 3.2.0 is used.

The problem might be an incompatibility between openssl 3.2.0 and libpq OR a packaging issue related to Homebrew.

To be sure, we need to test another distribution of OpenSSL + libpq, but neither Arch Linux nor Alpine has upgraded to OpenSSL 3.2.0 yet.

I am currently debugging this issue from the Postgres side. Here is a backtrace when something seems to go wrong. openssl is overwriting memory in the PGconn struct, which we later free in freePGconn because we think we allocated the memory (which we originally did).

gdb --args psql 'postgresql://$DB?sslmode=require'
GNU gdb (Fedora Linux) 13.2-10.fc39
Reading symbols from psql...
(gdb) b fe-connect.c:683
No source file named fe-connect.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (fe-connect.c:683) pending.
(gdb) r
Starting program: /home/tristan957/.opt/postgresql/bin/psql $DB\?sslmode=require
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, PQconnectdbParams (keywords=0x4bf8c0, values=0x4bf910, expand_dbname=1) at ../src/interfaces/libpq/fe-connect.c:683
683                     (void) connectDBComplete(conn);
(gdb) watch conn->pghost
Hardware watchpoint 2: conn->pghost
(gdb) c
Continuing.

Hardware watchpoint 2: conn->pghost

Old value = 0x4c8e40 "<redacted>"
New value = 0x0
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:245
Downloading source file /usr/src/debug/glibc-2.38-11.fc39.x86_64/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
245             VMOVU   %VMM(0), (VEC_SIZE * 1)(%rdi)
(gdb) bt
#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:245
#1  0x00007ffff750d997 in sock_ctrl (b=0x531150, cmd=104, num=0, ptr=0x7fffffffbc2c) at crypto/bio/bss_sock.c:197
#2  0x00007ffff74fcd40 in BIO_ctrl (b=0x531150, cmd=104, larg=0, parg=0x7fffffffbc2c) at crypto/bio/bio_lib.c:677
#3  0x00007ffff74fcbef in BIO_int_ctrl (b=0x531150, cmd=104, larg=0, iarg=3) at crypto/bio/bio_lib.c:647
#4  0x00007ffff7f8ef03 in my_SSL_set_fd (conn=0x4bf960, fd=3) at ../src/interfaces/libpq/fe-secure-openssl.c:1974
#5  0x00007ffff7f8d996 in initialize_SSL (conn=0x4bf960) at ../src/interfaces/libpq/fe-secure-openssl.c:1208
#6  0x00007ffff7f8c3f5 in pgtls_open_client (conn=0x4bf960) at ../src/interfaces/libpq/fe-secure-openssl.c:132
#7  0x00007ffff7f88cc8 in pqsecure_open_client (conn=0x4bf960) at ../src/interfaces/libpq/fe-secure.c:156
#8  0x00007ffff7f73b1a in PQconnectPoll (conn=0x4bf960) at ../src/interfaces/libpq/fe-connect.c:3411
#9  0x00007ffff7f72397 in connectDBComplete (conn=0x4bf960) at ../src/interfaces/libpq/fe-connect.c:2509
#10 0x00007ffff7f6f4cc in PQconnectdbParams (keywords=0x4bf8c0, values=0x4bf910, expand_dbname=1) at ../src/interfaces/libpq/fe-connect.c:683
#11 0x0000000000439517 in main (argc=2, argv=0x7fffffffd0a8) at ../src/bin/psql/startup.c:272

Additional information:

(gdb) p data
$1 = (struct bss_sock_st *) 0x4bf960
(gdb) up 9
#10 0x00007ffff7f6f4cc in PQconnectdbParams (keywords=0x4bf8c0, values=0x4bf910, expand_dbname=1) at ../src/interfaces/libpq/fe-connect.c:683
683                     (void) connectDBComplete(conn);
(gdb) p conn
$2 = (PGconn *) 0x4bf960
(gdb)

Our PGconn is being reinterpreted as a bss_sock_st: `data` (a `struct bss_sock_st *`) and our `conn` point to the same address, 0x4bf960.

Yes, this is a misuse of BIO_set_data from the Postgres side. This fixes it:

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 4aeaf08312..e669bdbf1d 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1815,11 +1815,6 @@ PQsslAttribute(PGconn *conn, const char *attribute_name)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
-#ifndef HAVE_BIO_GET_DATA
-#define BIO_get_data(bio) (bio->ptr)
-#define BIO_set_data(bio, data) (bio->ptr = data)
-#endif
-
 /* protected by ssl_config_mutex */
 static BIO_METHOD *my_bio_methods;
 
@@ -1828,7 +1823,7 @@ my_sock_read(BIO *h, char *buf, int size)
 {
 	int			res;
 
-	res = pqsecure_raw_read((PGconn *) BIO_get_data(h), buf, size);
+	res = pqsecure_raw_read((PGconn *) BIO_get_app_data(h), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res < 0)
 	{
@@ -1858,7 +1853,7 @@ my_sock_write(BIO *h, const char *buf, int size)
 {
 	int			res;
 
-	res = pqsecure_raw_write((PGconn *) BIO_get_data(h), buf, size);
+	res = pqsecure_raw_write((PGconn *) BIO_get_app_data(h), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res < 0)
 	{
@@ -1968,7 +1963,7 @@ my_SSL_set_fd(PGconn *conn, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_data(bio, conn);
+	BIO_set_app_data(bio, conn);
 
 	SSL_set_bio(conn->ssl, bio, bio);
 	BIO_set_fd(bio, fd, BIO_NOCLOSE);

(+ could also remove configure checks for BIO_get_data)
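The failure mode can be modelled without OpenSSL at all. Below is a hypothetical, self-contained C sketch: `fake_bio`, `fake_sock_state`, `fake_conn`, `fake_sock_new`, `fake_sock_set_fd`, `connect_buggy` and `connect_fixed` are illustrative names, not OpenSSL or libpq APIs. It assumes, consistent with the backtrace above (`my_SSL_set_fd` → `BIO_int_ctrl` → `sock_ctrl` with `cmd=104`, i.e. `BIO_C_SET_FD`), that the socket BIO's ctrl code treats the BIO's data slot as its own private state, while the app-data slot is left for the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical BIO model: one slot private to the BIO implementation
 * (the role BIO_set_data/BIO_get_data play) and one slot for the
 * application (the role BIO_set_app_data/BIO_get_app_data play). */
struct fake_bio {
    void *impl_data; /* owned by the socket implementation */
    void *app_data;  /* owned by the caller, e.g. libpq */
};

/* Hypothetical stand-in for the socket BIO's private state. */
struct fake_sock_state {
    void *peer;
    int first;
};

/* Hypothetical stand-in for PGconn; pghost comes first so that clearing
 * the socket state through the wrong pointer lands on it. */
struct fake_conn {
    char *pghost;
    char *pguser;
    int port;
};

/* Keep the simulated overwrite inside fake_conn so the model itself
 * never writes out of bounds. */
_Static_assert(sizeof(struct fake_sock_state) <= sizeof(struct fake_conn),
               "model overwrite must stay within fake_conn");

/* Models the socket BIO's create callback: allocate private state. */
static int fake_sock_new(struct fake_bio *b)
{
    b->impl_data = calloc(1, sizeof(struct fake_sock_state));
    b->app_data = NULL;
    return b->impl_data != NULL;
}

/* Models the BIO_C_SET_FD path seen in the backtrace: it trusts
 * impl_data to be its own state and resets it with memset. */
static void fake_sock_set_fd(struct fake_bio *b)
{
    struct fake_sock_state *data = b->impl_data;
    memset(data, 0, sizeof(*data));
}

/* Pre-fix pattern: the caller overwrites the implementation's slot.
 * Returns 1 if conn->pghost survived, 0 if it was clobbered, -1 on OOM. */
static int connect_buggy(struct fake_conn *conn)
{
    struct fake_bio b;
    if (!fake_sock_new(&b))
        return -1;
    void *orphaned = b.impl_data; /* the real state is now unreachable */
    b.impl_data = conn;           /* like BIO_set_data(bio, conn) */
    fake_sock_set_fd(&b);         /* zeroes the first bytes of *conn */
    free(orphaned);               /* the model frees it to avoid a leak */
    return conn->pghost != NULL;
}

/* Post-fix pattern: the caller uses its own slot. Same return values. */
static int connect_fixed(struct fake_conn *conn)
{
    struct fake_bio b;
    if (!fake_sock_new(&b))
        return -1;
    b.app_data = conn;                   /* like BIO_set_app_data(bio, conn) */
    fake_sock_set_fd(&b);                /* only touches its own state */
    struct fake_conn *back = b.app_data; /* like BIO_get_app_data(bio) */
    int ok = back == conn && conn->pghost != NULL;
    free(b.impl_data);
    return ok;
}
```

In `connect_buggy`, the set-fd step zeroes `pghost`, mirroring the watchpoint above where `conn->pghost` went from the host string to `0x0`; in `connect_fixed`, the caller's pointer sits in a slot the socket code never touches. The model stops at the overwrite and does not try to reproduce the later double free.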

Can I get your name and email to credit you as co-author (or whatever you want me to put in the Co-authored-by trailer)? I think your patch may require changes to support older versions of openssl.

Or I can review your patch when you send it to the pgsql-hackers mailing list.

Actually, OpenSSL 1.1.1 has BIO_{get,set}_app_data(), so all good on your patch.

Yeah, it should have been around since the SSLeay days, though I haven't actually tested every older version.

Here's a full commit with my name & email attached as the author: Bo98/postgres@93f5791. Name & email should also be visible on my profile: https://github.com/Bo98. Please forward and modify as necessary - I've not fully looked into the patching process upstream and will be out for a few hours.

@Bo98 thanks for your work on this! I will CC you on the email that I send to the list.

Does someone have the steps to downgrade to 3.1.4?

curl -L https://raw.githubusercontent.com/Homebrew/homebrew-core/e68186ba5a05a6ea9a30d6c7744de9a46bd3aadd/Formula/o/openssl@3.rb > openssl@3.rb && brew install openssl@3.rb

That's the commit that upgraded the formula from 3.1.4 to 3.2. Feel free to confirm for yourself though.

Thanks so much for this.

FYI to anyone who runs this: the first time I ran this, it didn't work, giving the error:

Error: openssl@3 3.2.0 is already installed
To install 3.1.4, first run:
  brew unlink openssl@3

Following these instructions and running brew unlink openssl@3 before the above command worked for me and got postgres up and running again.

I'm getting the same thing. One other note: pg_dump can dump my local database, but when connecting remotely I get:

pg_dump: error: connection to server at "....rds.amazonaws.com" (x.x.x.x), port 5432 failed: FATAL: no PostgreSQL user name specified in startup packet
connection to server at "....rds.amazonaws.com" (35.153.111.90), port 5432 failed: FATAL: no PostgreSQL user name specified in startup packet
pg_dump(39910,0x1de1d9ec0) malloc: double free for ptr 0x13280c200
pg_dump(39910,0x1de1d9ec0) malloc: *** set a breakpoint in malloc_error_break to debug
Docker/Scripts/sshDumpProd.sh: line 3: 39910 Abort trap: 6 pg_dump -Fd $PGDATABASE -f ./Docker/Dump/FromProduction.dump

Local connections don't go through OpenSSL, so you don't hit the same problem there. Apply the patch I posted to the mailing list if you want OpenSSL 3.2 support.

Homebrew's PostgreSQL formulae are now compatible with OpenSSL 3.2. Please run `brew upgrade postgresql@<YOUR POSTGRES VERSION>` or `brew upgrade libpq` to get the fixed version.

This has helped me fix my error. Thank you so much for this post <3

Do we know where between php, pgsql, OpenSSL and Brew this bug arises?

@thomas-shirley it was in pgsql. OpenSSL just exposed an incorrect API usage within pgsql

Bo98 commented

Homebrew's build of PostgreSQL has this bug fixed. For builds from other vendors, you'll need to wait for a new release or ask the builder to incorporate the patch.

Hello! I am still facing this issue. I have done everything you did, but I still get the error when using an engine with postgres + psycopg2:
Engine(postgresql+psycopg2://***:***@host:5432/dbname)
I get the malloc double free error when trying to launch the db. Has anyone found another fix? I am working on a Mac M3 with postgresql@14 14.10_1.

-> with engine.connect() as conn:
(Pdb) n
Python(23416,0x20325f240) malloc: double free for ptr 0x7f8818008200
Python(23416,0x20325f240) malloc: *** set a breakpoint in malloc_error_break to debug

Thanks

If you installed psycopg2 through the psycopg2-binary package, you most likely have a bundled, older libpq that doesn't contain the fix. In that case, reinstall psycopg2 using the psycopg2 package, which is linked against the system libraries.

it fixed the issue, thanks a lot 😄

FWIW, it seems possible that the Postgres fix for this issue will be in the yet-to-be-stamped 14.11:

postgres/postgres@c82207a