too many clients connected to PostgreSQL when using aggregate function
What happens?
When I use an aggregate function (like count() or sum()) on a PostgreSQL table, the postgres scanner opens a large number of PostgreSQL client connections. Other SQL statements are fine.
The IOException is: duckdb.IOException: IO Error: Unable to connect to Postgres at user=xxx dbname=xxx: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already.
To Reproduce
I compiled the whole project with BUILD_PYTHON=1 make -j debug, loaded the extension from the debug build directory, and installed DuckDB from duckdb/tools/pythonpkg/setup.py.
(The max_connections parameter is set to 100 in $PGDATA/postgresql.conf.)
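To see the server-side limit and how many backends are actually open while the query runs, something like the following can be used from a second process (a minimal sketch, assuming psycopg2 is installed and the same connection string works; it is not part of the original reproduction):
import psycopg2
# Sketch: check the configured limit and the current number of open backends.
conn = psycopg2.connect("user=xinning dbname=tpch_sf1")
cur = conn.cursor()
cur.execute("SHOW max_connections;")  # server-side limit, 100 in this setup
print("max_connections:", cur.fetchone()[0])
cur.execute("SELECT count(*) FROM pg_stat_activity;")  # currently open backends
print("open connections:", cur.fetchone()[0])
cur.close()
conn.close()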
Here's my python code to reproduce the bug (sorry for the absolute path):
import duckdb
print(duckdb.__version__) # '0.7.2-dev327'
con = duckdb.connect(":memory:", config={"allow_unsigned_extensions" : True})
con.execute("LOAD '/home/xinning/duckdb_scanner_candebug/postgres_scanner/build/debug/extension/postgres_scanner/postgres_scanner.duckdb_extension' ")
con.execute("SELECT count(*) FROM POSTGRES_SCAN('user=xinning dbname=tpch_sf1', 'public', 'lineitem') as l where l.l_orderkey > 6666;") # IO Error
con.execute("SELECT count(l.l_orderkey) FROM POSTGRES_SCAN('user=xinning dbname=tpch_sf1', 'public', 'lineitem') as l where l.l_orderkey > 6666;") # IO Error
con.execute("SELECT sum(l.l_orderkey) FROM POSTGRES_SCAN('user=xinning dbname=tpch_sf1', 'public', 'lineitem') as l where l.l_orderkey > 6666;") # IO Error
con.execute("SELECT l.l_orderkey FROM POSTGRES_SCAN('user=xinning dbname=tpch_sf1', 'public', 'lineitem') as l where l.l_orderkey > 6666;") # OKOS:
CentOS7
PostgreSQL Version:
15.2
DuckDB Version:
0.7.2-dev327
DuckDB Client:
Python
Full Name:
Xinning Zhang
Affiliation:
Tsinghua University
Have you tried this on the latest master branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree
I've increased max_connections to 10000, which works around the issue. But I wonder whether the PostgreSQL parallel scan really needs so many connections. Can I specify the number of client connections used for parallel scans? I'm concerned that when I test TPC-H SF100 in the future, aggregate queries will open even more connections.
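One thing that might help in the meantime (an assumption on my side, not a documented fix for the scanner): lowering DuckDB's thread count with SET threads could reduce how many scan tasks run concurrently, although it may not cap the total number of connections if the scanner opens one connection per task regardless.
import duckdb
# Hedged workaround sketch: limit DuckDB's parallelism before the aggregate query.
# This may or may not bound the scanner's connection count.
con = duckdb.connect(":memory:", config={"allow_unsigned_extensions": True})
con.execute("LOAD '/home/xinning/duckdb_scanner_candebug/postgres_scanner/build/debug/extension/postgres_scanner/postgres_scanner.duckdb_extension'")
con.execute("SET threads TO 4;")  # cap DuckDB worker threads
con.execute("SELECT count(*) FROM POSTGRES_SCAN('user=xinning dbname=tpch_sf1', 'public', 'lineitem') as l where l.l_orderkey > 6666;")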