sfu-db/connector-x

cannot send arbitrary number of queries in a single call to `read_sql`

ypsah opened this issue · 1 comments

What language are you using?

Python

What version are you using?

0.3.2

What database are you using?

PostgreSQL (timescale) + pgbouncer

What dataframe are you using?

Arrow2 (should not matter)

Can you describe your bug?

My pgbouncer instance is configured to use a connection pool of 50 slots and to allow 500 client connections per user/database.

When using the session pooling mode of pgbouncer (1-1 client connection/server connection for the duration of a session). I cannot send more than 50 queries per call to cx.read_sql.

With the other pooling modes (transaction/statement, which only are an option since pgbouncer 1.21), that number climbs to 500.

Sending even one more query above these thresholds will result in a python exception being raised (pretty reliable except for session pooling which sometimes appears to just hang).

What are the steps to reproduce the behavior?

Run timescale + pgbouncer with either:

  • pool_mode set to session, and default_pool_size set to N (e.g. 50);
  • pgbouncer version 1.21+, max_prepared_statement set to some strictly positive value (e.g. 100), pool_mode set to transaction or statement, and max_client_conn set to N (e.g. 500).

Run the following snippet (replacing uri with the connection string for your pgbouncer instance and N with the value you configured pgbouncer with):

import connector as cx

cx.read_sql(uri, ["SELECT 1"] * N)  # OK
cx.read_sql(uri, ["SELECT 1"] * (N + 1))  # KO
Database setup if the error only happens on specific data or data type

N/A

Example query / code
import connector as cx

cx.read_sql(uri, ["SELECT 1"] * (N + 1))

What is the error?

In session pooling mode, either a hang, or an exception that mentions having hit query_wait_timeout (pgbouncer setting).

In the other pooling modes, an exception that mentions r2d2 failed to acquire enough connection slots.

It's quite easy to work around this limitation by using smaller query batches, so it's fine by me if connectorx keeps its current behaviour.

I'm merely surprised that connectorx uses r2d2 for connection pooling but actually requests one connection per partition. And maybe even more importantly, connectorx does not seem to release connections as queries complete (within the scope of a single call to read_sql).

Could you please comment on this?