heroku/base-images

Breaking changes in PostgreSQL 12's libpq

Closed this issue · 10 comments

The stack images currently install the PostgreSQL client, library and headers from the upstream PostgreSQL APT repository (apt.postgresql.org) rather than the less frequently updated packages in Ubuntu's APT repository.

This means that as new major Postgres versions are released, the next stack image release will include updated major versions of libpq5 and libpq-dev (the clients themselves use versioned package names such as postgresql-client-11 so have to be explicitly updated).

New major versions of libpq are generally backwards compatible; however, libpq v12 includes the following breaking change, which makes the validation of connection parameters stricter:
postgres/postgres@e7a2217

Unfortunately this breaking change wasn't listed in the PostgreSQL 12 release notes, so we weren't able to factor it into the stack release review process.

This meant the latest stack image update caused database connection errors at runtime for a couple of customers, since their Rails apps were using invalid values for connection parameters (such as connect_timeout having trailing non-numeric characters). (Previously with libpq v11 these invalid values were being silently ignored.)
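
As a rough analogy for the change in behaviour (this is not libpq's actual C code, and it makes no claim about exactly how v11 treated each value), Ruby offers both a lenient and a strict way to read an integer from a string. `String#to_i` silently drops trailing characters, whereas `Kernel#Integer` rejects them. The helper names below are hypothetical.

```ruby
# Hypothetical helpers, for illustration only; libpq itself is written in C.

# Lenient: trailing non-numeric characters are silently dropped.
def lenient_timeout(raw)
  raw.to_s.to_i
end

# Strict: a value with trailing non-numeric characters is rejected.
def strict_timeout(raw)
  Integer(raw.to_s, 10)
rescue ArgumentError
  raise ArgumentError, "invalid integer value #{raw.inspect} for connect_timeout"
end
```

With these helpers, `lenient_timeout("15s")` returns `15`, while `strict_timeout("15s")` raises `ArgumentError`. An app that only ever went through the lenient path would never have surfaced the misconfiguration.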

As such we:

  • rolled back that stack image release
  • reported the missing release notes breaking change entry to the Postgres project
  • filed ged/ruby-pg/issues/302 to have the pg gem report the more helpful validation-related error message instead of PG::UnableToSend: no connection to the server

Determining how to best roll out the PG 12 client libraries will be more involved, so in the short term we should pin to v11 so that stack image releases are unblocked. After that someone will need to investigate further as to how widespread invalid app configurations are and work on a rollout plan.
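
One way such an investigation could flag affected configurations is sketched below. This is a hypothetical check, not something the stack images or the pg gem provide. Only `connect_timeout` is named in this issue; the other entries in the parameter list are assumed, for illustration, to be integer-valued libpq options.

```ruby
# Parameters assumed to take integer values. Only connect_timeout is named
# in this issue; the rest of the list is an assumption for illustration.
INTEGER_PARAMS = %w[
  connect_timeout keepalives_idle keepalives_interval keepalives_count
].freeze

# Returns the entries of a connection-config hash whose values are not
# plain base-10 integers (optionally signed, surrounding whitespace allowed).
def invalid_integer_params(config)
  config.select do |key, value|
    INTEGER_PARAMS.include?(key.to_s) && value.to_s !~ /\A\s*[+-]?\d+\s*\z/
  end
end

config = {
  "host" => "db.example.com",
  "connect_timeout" => "15s",
  "keepalives_idle" => 60
}
offenders = invalid_integer_params(config)
```

Here `offenders` contains only the `connect_timeout` entry, since `"15s"` is not a plain integer while `60` is.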

CC @heroku/dod-infra since they will need to be aware of this if wanting to upgrade to PG12 in the stack image in the future. For more info see the investigation notes here.

The libpq5 and libpq-dev packages have now been pinned to the PostgreSQL 11 release in #148.

Leaving this issue open to track resolving the PG12 compatibility issues at some point in the future.

@edmorley so these are clearly integer values, what do customers put there?:)

There were two variations seen (both for Rails apps, using database.yml), one like:

    connect_timeout: <%= ENV['SOME_VAR'] || 5 %># some comment

...the lack of whitespace before the comment means connect_timeout ends up having a value like '5# some comment' (if a space is added, the interpolation works fine).

And for the other:

    connect_timeout: 15s
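
Both variations can be reproduced outside Rails with a minimal sketch that renders the snippet through ERB and then parses it as YAML. This simplifies how Rails loads database.yml, and `load_config` is a hypothetical helper. It assumes `SOME_VAR` is unset, so the fallback of 5 is used.

```ruby
require "erb"
require "yaml"

# Renders a database.yml-style snippet through ERB, then parses the result
# as YAML (a simplified stand-in for how Rails loads the file).
def load_config(template)
  YAML.safe_load(ERB.new(template).result)
end

no_space   = "connect_timeout: <%= ENV['SOME_VAR'] || 5 %># some comment\n"
with_space = "connect_timeout: <%= ENV['SOME_VAR'] || 5 %> # some comment\n"
suffix     = "connect_timeout: 15s\n"

# YAML only treats '#' as starting a comment when whitespace precedes it,
# so the first form keeps the comment text as part of the value.
[no_space, with_space, suffix].each do |template|
  puts load_config(template).fetch("connect_timeout").inspect
end
```

With `SOME_VAR` unset, the three values are the string `"5# some comment"`, the integer `5`, and the string `"15s"`. Only the second is a valid integer for `connect_timeout`.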

What made this hard for the customer to debug was the Ruby pg gem swallowing the real error. I've filed ged/ruby-pg/issues/302 for this, but even if it's fixed it's going to be some time before customers update to the new version.

So there's some good news about the confusing Ruby pg gem error message (that makes debugging harder) - it turns out it's a bug in libpq, for which the author of the Ruby pg gem has kindly created a patch. Most importantly this means we can pick up the fix ourselves (once released), rather than needing to encourage/wait for customers to update their app's pg gem.

Whilst this doesn't make the libpq 12 changes any less breaking (for customers with invalid configs), it at least gives customers an actionable reason that they can self-serve without having to wait for support ticket assistance.

The libpq fix that results in an improved error message in ruby-pg (and likely others) landed in:
postgres/postgres@ed5109a

There hasn't yet been a new libpq release since then, from what I can tell.

@edmorley Thanks for the update here. We're walking through other breaking changes in 12.0 and we'll continue to track this. It will likely be in the next patch release, but we'll stay tuned as well.

@edmorley can we have separate stack images for pg12 specifically until we clean this up? It can take a while, and we would like customers to start using them.

@edmorley thanks, have not!