postgresml/pgcat

Servers need to be available for configuration reloading to succeed


levkk commented

Describe the bug
ConnectionPool requires all servers to be reachable in order to reload the config. Specifically, it fetches the server information up front and passes it to clients statically, instead of fetching it dynamically when a client connects. This ensures that the servers are correctly configured before a configuration goes live, but it also means that a single unreachable server blocks the configuration reload for every server.

To Reproduce
Steps to reproduce the behavior:

  1. Place an incorrect IP/port into the server config.
  2. Try to reload the config with kill -HUP $(pgrep pgcat).
  3. The reload is blocked, although the pooler remains online with the existing config.

Expected behavior
Good question. I think it should skip the broken server and log an error.
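A minimal sketch of that expected behavior, in Rust since pgcat is written in Rust. Instead of aborting the whole reload when one server fails validation, each server is validated independently, failures are logged, and the remaining servers go live. ServerConfig, ServerInfo, reload_skipping_broken, and the validate callback are hypothetical names for illustration, not pgcat's actual types or functions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the server parameters a pool caches
/// (e.g. server_version, client_encoding); not pgcat's real type.
#[derive(Clone, Debug, PartialEq)]
pub struct ServerInfo {
    pub params: HashMap<String, String>,
}

/// Hypothetical per-server config entry.
#[derive(Clone, Debug)]
pub struct ServerConfig {
    pub host: String,
    pub port: u16,
}

/// Validates each server on its own and keeps only the ones that pass.
/// `validate` is an assumed callback standing in for "connect to the
/// server and fetch its parameters". Returns the healthy servers and the
/// error messages that were logged for the skipped ones.
pub fn reload_skipping_broken<F>(
    servers: &[ServerConfig],
    mut validate: F,
) -> (Vec<(ServerConfig, ServerInfo)>, Vec<String>)
where
    F: FnMut(&ServerConfig) -> Result<ServerInfo, String>,
{
    let mut healthy = Vec::new();
    let mut errors = Vec::new();
    for server in servers {
        match validate(server) {
            Ok(info) => healthy.push((server.clone(), info)),
            Err(e) => {
                // Log and move on: one bad server no longer blocks the reload.
                let msg = format!("skipping {}:{}: {}", server.host, server.port, e);
                eprintln!("{}", msg);
                errors.push(msg);
            }
        }
    }
    (healthy, errors)
}
```

Passing validation in as a closure keeps the sketch free of networking code; in a real pooler that step would open a connection to the server and read its parameters.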

Desktop:

  • OS: Ubuntu
  • Version: latest main

Additional context
This can be thought of as both a feature and a bug. It's conceivable that unreachable servers should not be allowed into the system, but intermittent issues happen, and ideally the pooler shouldn't depend on every server being available to do its job. This matters especially in multi-tenant or heavily sharded/replicated clusters, where the failure of one shard or replica shouldn't affect the pooler's ability to keep serving the rest.

I believe this is also true for starting pgcat. I noticed that if PostgreSQL is offline, pgcat will not boot because it cannot validate the configuration. Imagine a PostgreSQL database just blew up, and while someone is working on fixing it, you push a configuration change to pgcat that requires a reload or a restart (depending on how you deploy it). Neither can happen, because your teammate is still working on getting the database server to boot. I don't think it makes sense to make pgcat wait for Postgres to be up before starting.

This issue should solve both of those problems.

As an impl idea: the connection pool's cached server info params should be read-through. If they are not cached when a client connects to pgcat, we should attempt to fetch them on demand. If we can't, reject the client connection. If we can, accept the connection and cache the params for the next client that connects. We should also attempt to refresh this cache when we validate a new config on reload or boot. Thoughts?
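The read-through idea could look roughly like the following sketch. It assumes a single cached parameter set per pool, guarded by an RwLock. ServerParams, ServerParamsCache, get_or_fetch, try_refresh, and the fetch closures are hypothetical names, and each closure stands in for actually connecting to Postgres.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical server parameter set (name/value pairs) that the pooler
/// would forward to clients; pgcat's real type may differ.
pub type ServerParams = Vec<(String, String)>;

/// Read-through cache: empty until the first successful fetch.
#[derive(Default, Clone)]
pub struct ServerParamsCache {
    inner: Arc<RwLock<Option<ServerParams>>>,
}

impl ServerParamsCache {
    /// Called when a client connects. Returns the cached params if present;
    /// otherwise calls `fetch`, caches the result on success, and returns
    /// the error on failure so the caller can reject the client.
    pub fn get_or_fetch<F>(&self, fetch: F) -> Result<ServerParams, String>
    where
        F: FnOnce() -> Result<ServerParams, String>,
    {
        // The read guard is dropped at the end of this statement.
        let cached = self.inner.read().unwrap().clone();
        if let Some(params) = cached {
            return Ok(params);
        }
        let params = fetch()?;
        *self.inner.write().unwrap() = Some(params.clone());
        Ok(params)
    }

    /// Called on reload or boot: attempt a refresh but never fail.
    /// On error the previous value (possibly none) is kept and logged.
    pub fn try_refresh<F>(&self, fetch: F)
    where
        F: FnOnce() -> Result<ServerParams, String>,
    {
        match fetch() {
            Ok(params) => *self.inner.write().unwrap() = Some(params),
            Err(e) => eprintln!("could not refresh server params: {}", e),
        }
    }
}
```

In this sketch a failed refresh on reload leaves the previous value in place rather than failing the reload. Two clients that arrive at the same time with an empty cache may both fetch; that only costs an extra round trip, since the later write overwrites the earlier one.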

levkk commented

That would work!

levkk commented

Should be fixed. Feel free to reopen if any issues arise.