Allow support for either multiple queries or more direct control of the agent to ensure queries are sent via the same socket session

Question

Allow support for either multiple queries or more direct control of the agent to ensure queries are sent via the same socket session

brian-mann opened this issue 6 months ago · 4 comments

Use case

We need the ability to issue multiple requests using the same socket connection in order to utilize creating temporary tables and querying them.

The clickhouse cli binary allows for the --multiquery flag but there is no way to issue multiple queries (to my knowledge) with the nodejs client.

For instance it does not seem easily possible to do this:

create temporary table foo as select 1;
select * from foo;

The only way we've found is to make 2 round trips, but to do so, we have to ensure we're using the same socket connection. An alternative approach would be to more directly control the underlying node.js agent.

There was a PR opened here (#275) enabling passing in your own agent instance, but the PR was closed.

Describe the solution you'd like

We're also aware of the feature request to materialize CTE's (PR opened here: ClickHouse/ClickHouse#61086) - which would also solve for our use case, but until that is implemented - either having multiquery support OR making it easier to control the request agent is the only solution.

More direct control of the agent would at least give us the ability to use the same connection pool, but ensure that the socket connection is not free'd up until the dependent queries have finished. Ideally though, multiquery or materialized CTE's would be a superior solution.

Describe the alternatives you've considered

As a workaround, we can effectively recreate this by instantiating a new client and setting max_open_connections: 1 so that all queries run over the same socket, but that is not only very inefficient because the socket has to be reconnected and closed each time, but it bypasses the natural connection pool.

Additional context

Answer 1 · 2024-06-17T12:43:59.000Z

but ensure that the socket connection is not free'd up until the dependent queries have finished.

I don't think it's safe to rely on this assumption in distributed enviroment with network hiccups.

More direct control of the agent would at least give us the ability to use the same connection pool, but ensure that the socket connection is not free'd up until the dependent queries have finished.

How do you deploy ClickHouse? Is there an option of using a session mechanism? from the temporary table docs

primary use case where temporary tables can be useful is for querying or joining small external datasets during a single session.

Answer 2 · 2024-06-17T16:09:16.000Z

I don't think it's safe to rely on this assumption in distributed enviroment with network hiccups.

This would be only used before issuing one specific type of query, not as a general pattern. It's really no different than writing multiple queries via the CLI or any visual IDE-based SQL editor (like DataGrip, etc).

How do you deploy ClickHouse?

clickhouse cloud

Answer 3 · 2024-06-18T15:33:04.000Z

@brian-mann, it will be possible to override the HTTP(s) agent after #284.

Regarding the multi-query support, it entirely depends on ClickHouse/ClickHouse#61608. There are many nuances with HTTP and load balancers, so it can be tricky to have this working reliably for all scenarios.

Answer 4 · 2024-06-19T13:47:15.000Z

@brian-mann, custom HTTP(s) agent is now supported as of 1.2.0, see also the relevant docs with examples.

Multi-query support won't be implemented until ClickHouse adds this to the HTTP protocol (see ClickHouse/ClickHouse#61608).