duckdb/duckdb-postgres

too many clients connected to PostgreSQL when using aggregate function


What happens?

When I use an aggregate function (such as count() or sum()) on a PostgreSQL table, the postgres scanner opens a large number of PostgreSQL client connections. Other SQL statements work fine.
The exception is: duckdb.IOException: IO Error: Unable to connect to Postgres at user=xxx dbname=xxx: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already.

To Reproduce

I compiled the whole project with BUILD_PYTHON=1 make -j debug, loaded the extension from the debug build directory, and installed DuckDB from duckdb/tools/pythonpkg/setup.py.
(The max_connections parameter is set to 100 in $PGDATA/postgresql.conf.)
Here's my Python code to reproduce the bug (sorry for the absolute path):

import duckdb
print(duckdb.__version__) # '0.7.2-dev327'
con = duckdb.connect(":memory:", config={"allow_unsigned_extensions" : True})
con.execute("LOAD '/home/xinning/duckdb_scanner_candebug/postgres_scanner/build/debug/extension/postgres_scanner/postgres_scanner.duckdb_extension' ")
con.execute("SELECT count(*) FROM POSTGRES_SCAN('user=xinning dbname=tpch_sf1', 'public', 'lineitem') as l where l.l_orderkey > 6666;") # IO Error
con.execute("SELECT count(l.l_orderkey) FROM POSTGRES_SCAN('user=xinning dbname=tpch_sf1', 'public', 'lineitem') as l where l.l_orderkey > 6666;")  # IO Error
con.execute("SELECT sum(l.l_orderkey) FROM POSTGRES_SCAN('user=xinning dbname=tpch_sf1', 'public', 'lineitem') as l where l.l_orderkey > 6666;") # IO Error
con.execute("SELECT l.l_orderkey FROM POSTGRES_SCAN('user=xinning dbname=tpch_sf1', 'public', 'lineitem') as l where l.l_orderkey > 6666;") # OK

OS:

CentOS7

PostgreSQL Version:

15.2

DuckDB Version:

0.7.2-dev327

DuckDB Client:

Python

Full Name:

Xinning Zhang

Affiliation:

Tsinghua University

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

I've increased max_connections to 10000, which works around the issue. But I wonder whether the parallel PostgreSQL scan really needs that many connections. Can I specify the number of client connections used for parallel scans? I'm concerned that when I test TPC-H SF100 in the future, aggregate queries will open too many connections.

Thanks for the report! This should be fixed now in #111. There is also the pg_connection_limit parameter that can be used to control how many connections to Postgres are opened.
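For illustration, setting that parameter from the Python client might look like the sketch below. The SET statement uses the pg_connection_limit name mentioned above; the value 16 and the LOAD line are just example assumptions, not a prescribed configuration.

import duckdb

con = duckdb.connect(":memory:")
con.execute("LOAD postgres_scanner;")
# Cap how many Postgres connections the scanner may open for parallel scans
# (16 is an arbitrary example value; pick something below Postgres' max_connections).
con.execute("SET pg_connection_limit=16;")
con.execute("SELECT count(*) FROM POSTGRES_SCAN('user=xinning dbname=tpch_sf1', 'public', 'lineitem');")
print(con.fetchall())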