postgresml/pgcat

Pgcat core dumps after retrying server connection a few times

Opened this issue · 18 comments

Pgcat disconnects from the server when idle, then core dumps after retrying the server connection a few times.

git clone the main branch on FreeBSD 13.2-STABLE
cargo build --release
cargo test (passes)
run pgcat and connect it to a local install of PostgreSQL 14

Expected behavior
No connection drops or core dumps

  • OS: FreeBSD 13.2-STABLE stable/13-c79831b38 GENERIC amd64
  • Version: commit 0898461 (HEAD -> main, origin/main, origin/HEAD)
  • Rust: rustc 1.69.0 (84c898d65 2023-04-16) (built from a source tarball)
  • PostgreSQL: 14.7

Error log:

[root@staging-db-01 /tmp/pgcat]# RUST_LOG=info cargo run --release
    Finished release [optimized] target(s) in 0.37s
     Running `target/release/pgcat`
[2023-05-12T21:07:54.755899Z INFO  pgcat] Welcome to PgCat! Meow. (Version 1.0.2-alpha3)
[2023-05-12T21:07:54.758512Z INFO  pgcat] Running on 0.0.0.0:6432
[2023-05-12T21:07:54.758518Z INFO  pgcat::config] Ban time: 60s
[2023-05-12T21:07:54.758521Z INFO  pgcat::config] Idle client in transaction timeout: 0ms
[2023-05-12T21:07:54.758522Z INFO  pgcat::config] Worker threads: 5
[2023-05-12T21:07:54.758523Z INFO  pgcat::config] Healthcheck timeout: 1000ms
[2023-05-12T21:07:54.758524Z INFO  pgcat::config] Connection timeout: 5000ms
[2023-05-12T21:07:54.758526Z INFO  pgcat::config] Idle timeout: 30000ms
[2023-05-12T21:07:54.758527Z INFO  pgcat::config] Log client connections: false
[2023-05-12T21:07:54.758528Z INFO  pgcat::config] Log client disconnections: false
[2023-05-12T21:07:54.758529Z INFO  pgcat::config] Shutdown timeout: 60000ms
[2023-05-12T21:07:54.758530Z INFO  pgcat::config] Healthcheck delay: 30000ms
[2023-05-12T21:07:54.758531Z INFO  pgcat::config] Default max server lifetime: 86400000ms
[2023-05-12T21:07:54.758532Z INFO  pgcat::config] TLS support is disabled
[2023-05-12T21:07:54.758533Z INFO  pgcat::config] Server TLS enabled: false
[2023-05-12T21:07:54.758534Z INFO  pgcat::config] Server TLS certificate verification: false
[2023-05-12T21:07:54.758537Z INFO  pgcat::config] Plugins: interceptor: true, table_access: true, query_logger: true, prewarmer: true
[2023-05-12T21:07:54.758539Z INFO  pgcat::config] [pool: sharded_db] Maximum user connections: 30
[2023-05-12T21:07:54.758540Z INFO  pgcat::config] [pool: sharded_db] Default pool mode: transaction
[2023-05-12T21:07:54.758542Z INFO  pgcat::config] [pool: sharded_db] Load Balancing mode: Random
[2023-05-12T21:07:54.758543Z INFO  pgcat::config] [pool: sharded_db] Connection timeout: 3000ms
[2023-05-12T21:07:54.758544Z INFO  pgcat::config] [pool: sharded_db] Idle timeout: 40000ms
[2023-05-12T21:07:54.758545Z INFO  pgcat::config] [pool: sharded_db] Sharding function: pg_bigint_hash
[2023-05-12T21:07:54.758546Z INFO  pgcat::config] [pool: sharded_db] Primary reads: true
[2023-05-12T21:07:54.758548Z INFO  pgcat::config] [pool: sharded_db] Query router: true
[2023-05-12T21:07:54.758550Z INFO  pgcat::config] [pool: sharded_db] Number of shards: 3
[2023-05-12T21:07:54.758552Z INFO  pgcat::config] [pool: sharded_db] Number of users: 2
[2023-05-12T21:07:54.758554Z INFO  pgcat::config] [pool: sharded_db] Max server lifetime: default
[2023-05-12T21:07:54.758557Z INFO  pgcat::config] [pool: sharded_db] Plugins: interceptor: true, table_access: true, query_logger: true, prewarmer: true
[2023-05-12T21:07:54.758559Z INFO  pgcat::config] [pool: sharded_db][user: postgres] Pool size: 9
[2023-05-12T21:07:54.758563Z INFO  pgcat::config] [pool: sharded_db][user: postgres] Minimum pool size: 0
[2023-05-12T21:07:54.758565Z INFO  pgcat::config] [pool: sharded_db][user: postgres] Statement timeout: 0
[2023-05-12T21:07:54.758567Z INFO  pgcat::config] [pool: sharded_db][user: postgres] Pool mode: session
[2023-05-12T21:07:54.758569Z INFO  pgcat::config] [pool: sharded_db][user: postgres] Max server lifetime: default
[2023-05-12T21:07:54.758571Z INFO  pgcat::config] [pool: sharded_db][user: other_user] Pool size: 21
[2023-05-12T21:07:54.758574Z INFO  pgcat::config] [pool: sharded_db][user: other_user] Minimum pool size: 0
[2023-05-12T21:07:54.758575Z INFO  pgcat::config] [pool: sharded_db][user: other_user] Statement timeout: 15000
[2023-05-12T21:07:54.758576Z INFO  pgcat::config] [pool: sharded_db][user: other_user] Pool mode: transaction
[2023-05-12T21:07:54.758578Z INFO  pgcat::config] [pool: sharded_db][user: other_user] Max server lifetime: default
[2023-05-12T21:07:54.758579Z INFO  pgcat::config] [pool: simple_db] Maximum user connections: 5
[2023-05-12T21:07:54.758580Z INFO  pgcat::config] [pool: simple_db] Default pool mode: session
[2023-05-12T21:07:54.758582Z INFO  pgcat::config] [pool: simple_db] Load Balancing mode: Random
[2023-05-12T21:07:54.758583Z INFO  pgcat::config] [pool: simple_db] Connection timeout: 5000ms
[2023-05-12T21:07:54.758584Z INFO  pgcat::config] [pool: simple_db] Idle timeout: 30000ms
[2023-05-12T21:07:54.758585Z INFO  pgcat::config] [pool: simple_db] Sharding function: pg_bigint_hash
[2023-05-12T21:07:54.758587Z INFO  pgcat::config] [pool: simple_db] Primary reads: true
[2023-05-12T21:07:54.758588Z INFO  pgcat::config] [pool: simple_db] Query router: true
[2023-05-12T21:07:54.758589Z INFO  pgcat::config] [pool: simple_db] Number of shards: 1
[2023-05-12T21:07:54.758590Z INFO  pgcat::config] [pool: simple_db] Number of users: 1
[2023-05-12T21:07:54.758591Z INFO  pgcat::config] [pool: simple_db] Max server lifetime: default
[2023-05-12T21:07:54.758593Z INFO  pgcat::config] [pool: simple_db] Plugins: not configured
[2023-05-12T21:07:54.758594Z INFO  pgcat::config] [pool: simple_db][user: simple_user] Pool size: 5
[2023-05-12T21:07:54.758595Z INFO  pgcat::config] [pool: simple_db][user: simple_user] Minimum pool size: 3
[2023-05-12T21:07:54.758597Z INFO  pgcat::config] [pool: simple_db][user: simple_user] Statement timeout: 0
[2023-05-12T21:07:54.758598Z INFO  pgcat::config] [pool: simple_db][user: simple_user] Pool mode: session
[2023-05-12T21:07:54.758601Z INFO  pgcat::config] [pool: simple_db][user: simple_user] Max server lifetime: 60000ms
[2023-05-12T21:07:54.758614Z INFO  pgcat::prometheus] Exposing prometheus metrics on http://0.0.0.0:9930/metrics.
[2023-05-12T21:07:54.758714Z INFO  pgcat::pool] [pool: sharded_db][user: postgres] creating new pool
[2023-05-12T21:07:54.758776Z INFO  pgcat::pool] [pool: sharded_db][user: other_user] creating new pool
[2023-05-12T21:07:54.758841Z INFO  pgcat::pool] [pool: simple_db][user: simple_user] creating new pool
[2023-05-12T21:07:54.758885Z INFO  pgcat::pool] Creating a new server connection Address { id: 12, host: "127.0.0.1", port: 5432, shard: 0, database: "some_db", role: Primary, replica_number: 0, address_index: 0, username: "simple_user", pool_name: "simple_db", mirrors: [], stats: AddressStats { total_xact_count: 0, total_query_count: 0, total_received: 0, total_sent: 0, total_xact_time: 0, total_query_time: 0, total_wait_time: 0, total_errors: 0, old_total_xact_count: 0, old_total_query_count: 0, old_total_received: 0, old_total_sent: 0, old_total_xact_time: 0, old_total_query_time: 0, old_total_wait_time: 0, old_total_errors: 0, avg_query_count: 0, avg_query_time: 0, avg_recv: 0, avg_sent: 0, avg_errors: 0, avg_xact_time: 0, avg_xact_count: 0, avg_wait_time: 0, averages_updated: false } }

[2023-05-12T21:07:54.761810Z INFO  pgcat::plugins::prewarmer] [address: 127.0.0.1:5432][database: shard0][user: postgres] Prewarning with query: `SELECT pg_prewarm('pgbench_accounts')`

[2023-05-12T21:12:24.806849Z INFO  pgcat::server] Server connection closed Address { id: 13, host: "localhost", port: 5432, shard: 0, database: "some_db", role: Replica, replica_number: 0, address_index: 1, username: "simple_user", pool_name: "simple_db", mirrors: [], stats: AddressStats { total_xact_count: 0, total_query_count: 0, total_received: 688, total_sent: 576, total_xact_time: 0, total_query_time: 0, total_wait_time: 0, total_errors: 0, old_total_xact_count: 0, old_total_query_count: 0, old_total_received: 688, old_total_sent: 576, old_total_xact_time: 0, old_total_query_time: 0, old_total_wait_time: 0, old_total_errors: 0, avg_query_count: 0, avg_query_time: 0, avg_recv: 0, avg_sent: 0, avg_errors: 0, avg_xact_time: 0, avg_xact_count: 0, avg_wait_time: 0, averages_updated: false } }, session duration: 0d 00:01:00.010

[2023-05-12T21:12:24.807140Z INFO  pgcat::pool] Creating a new server connection Address { id: 12, host: "127.0.0.1", port: 5432, shard: 0, database: "some_db", role: Primary, replica_number: 0, address_index: 0, username: "simple_user", pool_name: "simple_db", mirrors: [], stats: AddressStats { total_xact_count: 0, total_query_count: 0, total_received: 731, total_sent: 612, total_xact_time: 0, total_query_time: 0, total_wait_time: 0, total_errors: 0, old_total_xact_count: 0, old_total_query_count: 0, old_total_received: 731, old_total_sent: 612, old_total_xact_time: 0, old_total_query_time: 0, old_total_wait_time: 0, old_total_errors: 0, avg_query_count: 0, avg_query_time: 0, avg_recv: 0, avg_sent: 0, avg_errors: 0, avg_xact_time: 0, avg_xact_count: 0, avg_wait_time: 0, averages_updated: false } }
Segmentation fault (core dumped)
levkk commented

Very interesting. Could you copy/paste your config? I've never tried it on FreeBSD. FreeBSD is a Tier 2 platform for Rust, so bugs are possible, but a coredump should never happen. We are setting keep-alives on the socket (see configure_socket); maybe try disabling that?

I've generally had good luck with Rust apps on FreeBSD: as long as they compile (and they usually do), they work just fine. Note that this only happens when nothing is connected to pgcat; when something is connected, it just closes connections and re-opens them. Below is my config.

[root@staging-db-01 /tmp/pgcat]# cat pgcat.toml
#
# PgCat config example.
#

#
# General pooler settings
[general]
# What IP to run on, 0.0.0.0 means accessible from everywhere.
host = "0.0.0.0"

# Port to run on, same as PgBouncer used in this example.
port = 6432

# Whether to enable prometheus exporter or not.
enable_prometheus_exporter = true

# Port at which prometheus exporter listens on.
prometheus_exporter_port = 9930

# How long to wait before aborting a server connection (ms).
connect_timeout = 5000 # milliseconds

# How long an idle connection with a server is left open (ms).
idle_timeout = 30000 # milliseconds

# Max connection lifetime before it's closed, even if actively used.
server_lifetime = 86400000 # 24 hours

# How long a client is allowed to be idle while in a transaction (ms).
idle_client_in_transaction_timeout = 0 # milliseconds

# How much time to give the health check query to return with a result (ms).
healthcheck_timeout = 1000 # milliseconds

# How long to keep connection available for immediate re-use, without running a healthcheck query on it
healthcheck_delay = 30000 # milliseconds

# How much time to give clients during shutdown before forcibly killing client connections (ms).
shutdown_timeout = 60000 # milliseconds

# How long to ban a server if it fails a health check (seconds).
ban_time = 60 # seconds

# If we should log client connections
log_client_connections = false

# If we should log client disconnections
log_client_disconnections = false

# If set, PgCat checks the config file for changes every `autoreload` milliseconds and reloads it.
autoreload = 15000

# Number of worker threads the Runtime will use (4 by default).
worker_threads = 5

# Number of seconds of connection idleness to wait before sending a keepalive packet to the server.
tcp_keepalives_idle = 5
# Number of unacknowledged keepalive packets allowed before giving up and closing the connection.
tcp_keepalives_count = 5
# Number of seconds between keepalive packets.
tcp_keepalives_interval = 5

# Path to TLS Certificate file to use for TLS connections
# tls_certificate = ".circleci/server.cert"
# Path to TLS private key file to use for TLS connections
# tls_private_key = ".circleci/server.key"

# Enable/disable server TLS
server_tls = false

# Verify server certificate is completely authentic.
verify_server_certificate = false

# User name to access the virtual administrative database (pgbouncer or pgcat)
# Connecting to that database allows running commands like `SHOW POOLS`, `SHOW DATABASES`, etc.
admin_username = "admin_user"
# Password to access the virtual administrative database
admin_password = "admin_pass"

# Default plugins that are configured on all pools.
[plugins]

# Prewarmer plugin that runs queries on server startup, before giving the connection
# to the client.
[plugins.prewarmer]
enabled = false
queries = [
  "SELECT pg_prewarm('pgbench_accounts')",
]

# Log all queries to stdout.
[plugins.query_logger]
enabled = false

# Block access to tables that Postgres does not allow us to control.
[plugins.table_access]
enabled = false
tables = [
  "pg_user",
  "pg_roles",
  "pg_database",
]

# Intercept user queries and give a fake reply.
[plugins.intercept]
enabled = true

[plugins.intercept.queries.0]

query = "select current_database() as a, current_schemas(false) as b"
schema = [
  ["a", "text"],
  ["b", "text"],
]
result = [
  ["${DATABASE}", "{public}"],
]

[plugins.intercept.queries.1]

query = "select current_database(), current_schema(), current_user"
schema = [
  ["current_database", "text"],
  ["current_schema", "text"],
  ["current_user", "text"],
]
result = [
  ["${DATABASE}", "public", "${USER}"],
]


# pool configs are structured as pools.<pool_name>
# the pool_name is what clients use as database name when connecting.
# For a pool named `sharded_db`, clients access that pool using connection string like
# `postgres://sharding_user:sharding_user@pgcat_host:pgcat_port/sharded_db`
[pools.sharded_db]
# Pool mode (see PgBouncer docs for more).
# `session` one server connection per connected client
# `transaction` one server connection per client transaction
pool_mode = "transaction"

# Load balancing mode
# `random` selects the server at random
# `loc` selects the server with the least outstanding busy connections
load_balancing_mode = "random"

# If the client doesn't specify, PgCat routes traffic to this role by default.
# `any` round-robin between primary and replicas,
# `replica` round-robin between replicas only without touching the primary,
# `primary` all queries go to the primary unless otherwise specified.
default_role = "any"

# If Query Parser is enabled, we'll attempt to parse
# every incoming query to determine if it's a read or a write.
# If it's a read query, we'll direct it to a replica. Otherwise, if it's a write,
# we'll direct it to the primary.
query_parser_enabled = true

# If the query parser is enabled and this setting is enabled, the primary will be part of the pool of databases used for
# load balancing of read queries. Otherwise, the primary will only be used for write
# queries. The primary can always be explicitly selected with our custom protocol.
primary_reads_enabled = true

# Allow sharding commands to be passed as statement comments instead of
# separate commands. If these are unset this functionality is disabled.
# sharding_key_regex = '/\* sharding_key: (\d+) \*/'
# shard_id_regex = '/\* shard_id: (\d+) \*/'
# regex_search_limit = 1000 # only look at the first 1000 characters of SQL statements

# So what if you wanted to implement a different hashing function,
# or you've already built one and you want this pooler to use it?
# Current options:
# `pg_bigint_hash`: PARTITION BY HASH (Postgres hashing function)
# `sha1`: A hashing function based on SHA1
sharding_function = "pg_bigint_hash"

# Query to be sent to servers to obtain the hash used for md5 authentication. The connection will be
# established using the database configured in the pool. This parameter is inherited by every pool
# and can be redefined in pool configuration.
# auth_query = "SELECT $1"

# User to be used for connecting to servers to obtain the hash used for md5 authentication by sending the query
# specified in `auth_query`. The connection will be established using the database configured in the pool.
# This parameter is inherited by every pool and can be redefined in pool configuration.
# auth_query_user = "sharding_user"

# Password to be used for connecting to servers to obtain the hash used for md5 authentication by sending the query
# specified in `auth_query`. The connection will be established using the database configured in the pool.
# This parameter is inherited by every pool and can be redefined in pool configuration.
# auth_query_password = "sharding_user"

# Automatically parse this from queries and route queries to the right shard!
# automatic_sharding_key = "data.id"

# Idle timeout can be overwritten in the pool
idle_timeout = 40000

# Connect timeout can be overwritten in the pool
connect_timeout = 3000

# When enabled, ip resolutions for server connections specified using hostnames will be cached
# and checked for changes every `dns_max_ttl` seconds. If a change in the host resolution is found
# old ip connections are closed (gracefully) and new connections will start using new ip.
# dns_cache_enabled = false

# Specifies how often (in seconds) cached ip addresses for servers are rechecked (see `dns_cache_enabled`).
# dns_max_ttl = 30

# Plugins can be configured on a pool-per-pool basis. This overrides the global plugins setting,
# so all plugins have to be configured here again.
[pools.sharded_db.plugins]

[pools.sharded_db.plugins.prewarmer]
enabled = true
queries = [
  "SELECT pg_prewarm('pgbench_accounts')",
]

[pools.sharded_db.plugins.query_logger]
enabled = false

[pools.sharded_db.plugins.table_access]
enabled = false
tables = [
  "pg_user",
  "pg_roles",
  "pg_database",
]

[pools.sharded_db.plugins.intercept]
enabled = true

[pools.sharded_db.plugins.intercept.queries.0]

query = "select current_database() as a, current_schemas(false) as b"
schema = [
  ["a", "text"],
  ["b", "text"],
]
result = [
  ["${DATABASE}", "{public}"],
]

[pools.sharded_db.plugins.intercept.queries.1]

query = "select current_database(), current_schema(), current_user"
schema = [
  ["current_database", "text"],
  ["current_schema", "text"],
  ["current_user", "text"],
]
result = [
  ["${DATABASE}", "public", "${USER}"],
]

# User configs are structured as pools.<pool_name>.users.<user_index>
# This section holds the credentials for users that may connect to this cluster
[pools.sharded_db.users.0]
# PostgreSQL username used to authenticate the user and connect to the server
# if `server_username` is not set.
username = "postgres"

# PostgreSQL password used to authenticate the user and connect to the server
# if `server_password` is not set.
password = "postgres"

pool_mode = "session"

# PostgreSQL username used to connect to the server.
# server_username = "another_user"

# PostgreSQL password used to connect to the server.
# server_password = "another_password"

# Maximum number of server connections that can be established for this user
# The maximum number of connections from a single Pgcat process to any database in the cluster
# is the sum of pool_size across all users.
pool_size = 9


# Maximum query duration. Dangerous, but protects against DBs that died in a non-obvious way.
# 0 means it is disabled.
statement_timeout = 0

[pools.sharded_db.users.1]
username = "other_user"
password = "other_user"
pool_size = 21
statement_timeout = 15000

# Shard configs are structured as pools.<pool_name>.shards.<shard_id>
# Each shard config contains a list of servers that make up the shard
# and the database name to use.
[pools.sharded_db.shards.0]
# Array of servers in the shard, each server entry is an array of `[host, port, role]`
servers = [["127.0.0.1", 5432, "primary"]]

# Array of mirrors for the shard, each mirror entry is an array of `[host, port, index of server in servers array]`
# Traffic hitting the server identified by the index will be sent to the mirror.
# mirrors = [["1.2.3.4", 5432, 0], ["1.2.3.4", 5432, 1]]

# Database name (e.g. "postgres")
database = "shard0"

[pools.sharded_db.shards.1]
servers = [["127.0.0.1", 5432, "primary"]]
database = "shard1"

[pools.sharded_db.shards.2]
servers = [["127.0.0.1", 5432, "primary" ]]
database = "shard2"


[pools.simple_db]
pool_mode = "session"
default_role = "primary"
query_parser_enabled = true
primary_reads_enabled = true
sharding_function = "pg_bigint_hash"

[pools.simple_db.users.0]
username = "simple_user"
password = "simple_user"
pool_size = 5
min_pool_size = 3
server_lifetime = 60000
statement_timeout = 0

[pools.simple_db.shards.0]
servers = [
    [ "127.0.0.1", 5432, "primary" ]
]
database = "some_db"
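
The config comments for idle_timeout and server_lifetime describe two separate reasons a server connection gets closed. The minimal Rust sketch below models that decision, plus an assumed min_pool_size top-up, to illustrate how an otherwise idle pool can still churn: simple_user has server_lifetime = 60000 and min_pool_size = 3, and the closed connection in the crash log above reports a session duration of about 60 s. ServerConn, close_reason and connections_to_open are hypothetical names for illustration, not PgCat's actual code.

```rust
use std::time::Duration;

/// Hypothetical bookkeeping for one pooled server connection
/// (illustrative names, not PgCat's actual types).
#[derive(Debug, Clone, Copy)]
struct ServerConn {
    /// Time since the connection to Postgres was opened.
    age: Duration,
    /// Time since a client last used the connection.
    idle_for: Duration,
}

#[derive(Debug, PartialEq)]
enum CloseReason {
    /// Exceeded `server_lifetime` ("even if actively used").
    Lifetime,
    /// Sat unused for longer than `idle_timeout`.
    Idle,
}

/// Decide whether to close a connection, checking the lifetime cap first.
fn close_reason(c: ServerConn, idle_timeout: Duration, server_lifetime: Duration) -> Option<CloseReason> {
    if c.age >= server_lifetime {
        Some(CloseReason::Lifetime)
    } else if c.idle_for >= idle_timeout {
        Some(CloseReason::Idle)
    } else {
        None
    }
}

/// Assumed `min_pool_size` behaviour: top the pool back up after closures.
/// Saturating, so a pool already at or above the minimum opens nothing.
fn connections_to_open(open_now: usize, min_pool_size: usize) -> usize {
    min_pool_size.saturating_sub(open_now)
}

fn main() {
    // Values for simple_db / simple_user from the config above.
    let idle_timeout = Duration::from_millis(30_000);
    let server_lifetime = Duration::from_millis(60_000);
    let min_pool_size = 3;

    // A connection that is 60.01 s old and was last used 10 s ago.
    let conn = ServerConn {
        age: Duration::from_millis(60_010),
        idle_for: Duration::from_secs(10),
    };
    println!("{:?}", close_reason(conn, idle_timeout, server_lifetime)); // Some(Lifetime)

    // After closing it, 2 of the 3 minimum connections remain, so open 1 more.
    println!("{}", connections_to_open(2, min_pool_size)); // 1
}
```

In this sketch the lifetime cap is checked before idleness so that a busy connection is still recycled, matching the "even if actively used" wording in the config comment.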

Odd: after establishing a connection to PostgreSQL via pgcat, it no longer core dumps (even when nothing is connected), but it still closes and re-opens connections (I also tried tcp_keepalives = 0: same thing, different timing). It's as if that initial connection triggered something that fixed the issue.
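
For reference, with the keepalive values in the config above (tcp_keepalives_idle = 5, tcp_keepalives_interval = 5, tcp_keepalives_count = 5), standard TCP keepalive semantics drop an unresponsive peer after roughly idle + interval * count = 5 + 5 * 5 = 30 seconds. Keepalives only detect dead peers; they should not by themselves close a connection to a server that answers the probes. A small Rust sketch of that arithmetic (the Keepalive struct is illustrative, named after the config keys):

```rust
use std::time::Duration;

/// TCP keepalive parameters, named after the PgCat config keys
/// `tcp_keepalives_idle`, `tcp_keepalives_interval`, `tcp_keepalives_count`.
struct Keepalive {
    idle_secs: u64,
    interval_secs: u64,
    count: u64,
}

/// Approximate time before the kernel gives up on a silent peer:
/// wait `idle`, then send `count` probes spaced `interval` apart.
fn dead_peer_detection(k: &Keepalive) -> Duration {
    Duration::from_secs(k.idle_secs + k.interval_secs * k.count)
}

fn main() {
    let k = Keepalive { idle_secs: 5, interval_secs: 5, count: 5 };
    println!("{:?}", dead_peer_detection(&k)); // 30s
}
```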

levkk commented

A coredump though is not good. That points to a bug in the standard library and/or the compiler. Happy to debug with you further, although so far we don't have much to go on. Do you want to try to compile v1.0.0 [1] instead of main? We haven't really added anything since that would cause something like this, but still worth a shot.

[1] https://github.com/postgresml/pgcat/releases/tag/v1.0.0

Also, maybe we could take a look at the actual coredump file? It may contain some information about what's going on. I assume you compiled it with debug symbols? (cargo build, not cargo build --release). It's also worth a shot trying both cargo build (debug build) and cargo build --release to compare; the compiler does a lot of optimizations that could lead to a problem.

Great. I compiled with --release; however, according to the file command, the binary itself has some debug symbols in it. Here is a link where you can obtain the core. Hopefully it's of some use.

Thanks.

pgcat.core.gz

I got a new core dump with a debug build this time. It just took a little bit longer.

pgcat.debug.core.gz

FYI, I tried version 1.0.0 and it behaves differently: it disconnects from the pools once and never reports reconnecting. However, it works fine and does not segfault.

[2023-05-13T01:21:10.366320Z INFO  pgcat::pool] Creating a new server connection Address { id: 11, host: "localhost", port: 5432, shard: 2, database: "shard2", role: Replica, replica_number: 0, address_index: 1, username: "other_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.366393Z INFO  pgcat::pool] Creating a new server connection Address { id: 5, host: "localhost", port: 5432, shard: 2, database: "shard2", role: Replica, replica_number: 0, address_index: 1, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.366573Z INFO  pgcat::pool] Creating a new server connection Address { id: 6, host: "127.0.0.1", port: 5432, shard: 0, database: "shard0", role: Primary, replica_number: 0, address_index: 0, username: "other_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.366543Z INFO  pgcat::stats] Events reporter started
[2023-05-13T01:21:10.366669Z INFO  pgcat::pool] Creating a new server connection Address { id: 7, host: "localhost", port: 5432, shard: 0, database: "shard0", role: Replica, replica_number: 0, address_index: 1, username: "other_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.366725Z INFO  pgcat::pool] Creating a new server connection Address { id: 3, host: "localhost", port: 5432, shard: 1, database: "shard1", role: Replica, replica_number: 0, address_index: 1, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.366497Z INFO  pgcat::pool] Creating a new server connection Address { id: 2, host: "127.0.0.1", port: 5432, shard: 1, database: "shard1", role: Primary, replica_number: 0, address_index: 0, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.366787Z INFO  pgcat::pool] Creating a new server connection Address { id: 8, host: "127.0.0.1", port: 5432, shard: 1, database: "shard1", role: Primary, replica_number: 0, address_index: 0, username: "other_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.366848Z INFO  pgcat::pool] Creating a new server connection Address { id: 4, host: "127.0.0.1", port: 5432, shard: 2, database: "shard2", role: Primary, replica_number: 0, address_index: 0, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.366880Z INFO  pgcat::pool] Creating a new server connection Address { id: 9, host: "localhost", port: 5432, shard: 1, database: "shard1", role: Replica, replica_number: 0, address_index: 1, username: "other_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.366648Z INFO  pgcat::pool] Creating a new server connection Address { id: 13, host: "localhost", port: 5432, shard: 0, database: "some_db", role: Replica, replica_number: 0, address_index: 1, username: "simple_user", pool_name: "simple_db", mirrors: [] }
[2023-05-13T01:21:10.366976Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "127.0.0.1", port: 5432, shard: 0, database: "shard0", role: Primary, replica_number: 0, address_index: 0, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.367021Z INFO  pgcat::pool] Creating a new server connection Address { id: 10, host: "127.0.0.1", port: 5432, shard: 2, database: "shard2", role: Primary, replica_number: 0, address_index: 0, username: "other_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:21:10.367045Z INFO  pgcat::pool] Creating a new server connection Address { id: 12, host: "127.0.0.1", port: 5432, shard: 0, database: "some_db", role: Primary, replica_number: 0, address_index: 0, username: "simple_user", pool_name: "simple_db", mirrors: [] }
[2023-05-13T01:21:10.367072Z INFO  pgcat::pool] Creating a new server connection Address { id: 1, host: "localhost", port: 5432, shard: 0, database: "shard0", role: Replica, replica_number: 0, address_index: 1, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }
[2023-05-13T01:22:10.368892Z INFO  pgcat::server] Server connection closed Address { id: 9, host: "localhost", port: 5432, shard: 1, database: "shard1", role: Replica, replica_number: 0, address_index: 1, username: "other_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:00:59.998
[2023-05-13T01:22:10.368965Z INFO  pgcat::server] Server connection closed Address { id: 10, host: "127.0.0.1", port: 5432, shard: 2, database: "shard2", role: Primary, replica_number: 0, address_index: 0, username: "other_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:00:59.999
[2023-05-13T01:22:10.369048Z INFO  pgcat::server] Server connection closed Address { id: 2, host: "127.0.0.1", port: 5432, shard: 1, database: "shard1", role: Primary, replica_number: 0, address_index: 0, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:00:59.999
[2023-05-13T01:22:10.369156Z INFO  pgcat::server] Server connection closed Address { id: 11, host: "localhost", port: 5432, shard: 2, database: "shard2", role: Replica, replica_number: 0, address_index: 1, username: "other_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:00:59.999
[2023-05-13T01:22:10.369165Z INFO  pgcat::server] Server connection closed Address { id: 0, host: "127.0.0.1", port: 5432, shard: 0, database: "shard0", role: Primary, replica_number: 0, address_index: 0, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:00:59.999
[2023-05-13T01:22:10.369241Z INFO  pgcat::server] Server connection closed Address { id: 1, host: "localhost", port: 5432, shard: 0, database: "shard0", role: Replica, replica_number: 0, address_index: 1, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:00:59.998
[2023-05-13T01:22:10.369274Z INFO  pgcat::server] Server connection closed Address { id: 13, host: "localhost", port: 5432, shard: 0, database: "some_db", role: Replica, replica_number: 0, address_index: 1, username: "simple_user", pool_name: "simple_db", mirrors: [] }, session duration: 0d 00:00:59.998
[2023-05-13T01:22:10.369353Z INFO  pgcat::server] Server connection closed Address { id: 7, host: "localhost", port: 5432, shard: 0, database: "shard0", role: Replica, replica_number: 0, address_index: 1, username: "other_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:00:59.998
[2023-05-13T01:22:10.369354Z INFO  pgcat::server] Server connection closed Address { id: 6, host: "127.0.0.1", port: 5432, shard: 0, database: "shard0", role: Primary, replica_number: 0, address_index: 0, username: "other_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:01:00.000
[2023-05-13T01:22:10.369371Z INFO  pgcat::server] Server connection closed Address { id: 8, host: "127.0.0.1", port: 5432, shard: 1, database: "shard1", role: Primary, replica_number: 0, address_index: 0, username: "other_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:01:00.000
[2023-05-13T01:22:10.369379Z INFO  pgcat::server] Server connection closed Address { id: 12, host: "127.0.0.1", port: 5432, shard: 0, database: "some_db", role: Primary, replica_number: 0, address_index: 0, username: "simple_user", pool_name: "simple_db", mirrors: [] }, session duration: 0d 00:00:59.999
[2023-05-13T01:22:10.369481Z INFO  pgcat::server] Server connection closed Address { id: 4, host: "127.0.0.1", port: 5432, shard: 2, database: "shard2", role: Primary, replica_number: 0, address_index: 0, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:01:00.000
[2023-05-13T01:22:10.369459Z INFO  pgcat::server] Server connection closed Address { id: 3, host: "localhost", port: 5432, shard: 1, database: "shard1", role: Replica, replica_number: 0, address_index: 1, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:00:59.998
[2023-05-13T01:22:10.369512Z INFO  pgcat::server] Server connection closed Address { id: 5, host: "localhost", port: 5432, shard: 2, database: "shard2", role: Replica, replica_number: 0, address_index: 1, username: "sharding_user", pool_name: "sharded_db", mirrors: [] }, session duration: 0d 00:00:59.999

Any chance of looking into this? We are considering using PgCat but need a stable product. It looks like something changed after version 1.0. Since FreeBSD remains the reference platform for TCP/IP, everything should be OK there. Ty.

levkk commented

Hey Mike,

I unfortunately don't run FreeBSD, so I can't immediately help. We are stable on Linux, so it has to be something we're doing that may be Linux-specific and is breaking FreeBSD. I know we are also stable on Mac (last time I checked, anyway; it would be good to check again).

Looking at the changelog between v1.0 and now, there is really nothing obvious that pops up that can mess with the TCP/IP stack. Maybe you could do a bisect and help us figure out which commit broke this on FreeBSD? Starting from v1.0, this may only take 2-3 compile & run jobs.

Thanks for your help! We would love to support FreeBSD officially as well, I've always been a fan of the demon.

Understood; it sounds like some sort of Linuxism. Are there any particular commits you would like me to try?

levkk commented

I looked at them and I genuinely don't know. I would try git bisect (docs), it's a binary search on the commits history, so you should find the offending commit somewhere between v1.0 and main in O(log n) time.
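
In practice git bisect drives this for you: git bisect start, then git bisect good v1.0.0 and git bisect bad main, and it checks out midpoints for you to build and test (git bisect run can automate the test step). The Rust sketch below only shows the underlying binary search, assuming a linear history where the bug, once introduced, stays present; first_bad is a hypothetical helper, not part of git.

```rust
/// Binary search for the first "bad" commit in `0..n`.
/// Assumes commit 0 is known good, commit n-1 is known bad, and that
/// badness is monotonic (once broken, every later commit is broken).
/// Returns (index of the first bad commit, number of test runs).
fn first_bad<F: FnMut(usize) -> bool>(n: usize, mut is_bad: F) -> (usize, u32) {
    assert!(n >= 2, "need a known-good and a known-bad commit");
    let (mut good, mut bad) = (0usize, n - 1);
    let mut runs = 0;
    // Invariant: `good` is good and `bad` is bad; stop when they are adjacent.
    while bad - good > 1 {
        let mid = good + (bad - good) / 2;
        runs += 1; // one build-and-test cycle
        if is_bad(mid) {
            bad = mid;
        } else {
            good = mid;
        }
    }
    (bad, runs)
}

fn main() {
    // 100 commits between the good tag and main; the bug lands at commit 37.
    let (idx, runs) = first_bad(100, |i| i >= 37);
    println!("first bad commit: {}, test runs: {}", idx, runs); // first bad commit: 37, test runs: 6
}
```

Each run halves the remaining range, so the number of build-and-test cycles grows with log2 of the number of commits rather than linearly.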

Well, I just did a fresh git pull and rebuild, and I no longer get the core dump; no idea why, since nothing changed on my end. I did notice, though, that the option autoreload = false no longer works (it wants an integer instead of a boolean), so I just commented it out.
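
Since autoreload now takes an integer, here is a minimal Rust sketch of a polling reload loop, assuming the integer is a polling interval in milliseconds and that a "change" means the file contents differ from the last read. config_changed and watch are hypothetical names for illustration, not PgCat's implementation.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;
use std::time::Duration;

/// One polling tick: re-read the config file and report whether its
/// contents differ from what we saw last time (updating `last` if so).
fn config_changed(path: &Path, last: &mut Vec<u8>) -> io::Result<bool> {
    let current = fs::read(path)?;
    if current != *last {
        *last = current;
        Ok(true)
    } else {
        Ok(false)
    }
}

/// Poll `path` every `interval` for `ticks` rounds and return how many
/// reloads would have been triggered. A real watcher would loop forever.
fn watch(path: &Path, interval: Duration, ticks: usize) -> io::Result<usize> {
    let mut last = fs::read(path)?;
    let mut reloads = 0;
    for _ in 0..ticks {
        thread::sleep(interval);
        if config_changed(path, &mut last)? {
            // Here the pooler would parse and apply the new config.
            reloads += 1;
        }
    }
    Ok(reloads)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("pgcat_autoreload_sketch.toml");
    fs::write(&path, "autoreload = 15000\n")?;

    // autoreload = 15000 would mean Duration::from_millis(15_000);
    // a short interval keeps this demo quick.
    let reloads = watch(&path, Duration::from_millis(10), 2)?;
    println!("reloads: {}", reloads); // reloads: 0 (file untouched)

    fs::remove_file(&path)?;
    Ok(())
}
```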

levkk commented

Yeah, this one is going to be tough to chase down. You have the core dump from the last crash, try opening it up in gdb to see where it actually crashed:

gdb target/debug/pgcat <coredump file>

I would do it on my end, but Linux binaries are different from FreeBSD ones, so gdb won't work with a coredump from FreeBSD for me; I believe it has to be done on the same system.

Sorry, there's no GNU gdb in FreeBSD and I'm far from a debugging expert :P I'll report back if I see this happen again, I guess...

BTW, I tried this with LLDB (FreeBSD uses Clang/LLVM, not the GNU toolset), but I have no idea what to feed it. Is there something you can have me try?

(lldb) target create "target/debug/pgcat"
Current executable set to '/tmp/pgcat/target/debug/pgcat' (x86_64).
(lldb) settings set -- target.run-args  "/tmp/pgcat_main/pgcat.debug.core"
levkk commented

If it works like GDB, I'm guessing you can open it the same way:

lldb target/debug/pgcat <the core dump file that you have for the debug build>

Yes, that works, but I have no idea how to use it.

levkk commented

That makes two of us :) Maybe try the instructions for GDB: https://stackoverflow.com/questions/5115613/core-dump-file-analysis, I'm guessing they are going to be identical for LLDB.