ClickHouse/metabase-clickhouse-driver

`SET ROLE` statement problem on clustered instances

mhkarimi1383 opened this issue · 11 comments

Describe the bug

SET statement is not working with clustered instances

See ClickHouse/ClickHouse#40924

Steps to reproduce

  1. Install the latest version of driver
  2. Bring up a clickhouse instance (clustered)
  3. Add the instance and try to run query on that

Expected behaviour

Use HTTP Params or disable impersonation on clustered instances

Error log

Configuration

Environment

  • metabase-clickhouse-driver version: 1.2.1
  • metabase-clickhouse-driver configuration: Default
  • Metabase version: v1.47
  • OS: Docker with K8s

ClickHouse server

  • ClickHouse Server version: 22.8
  • ClickHouse Server non-default settings, if any: Not Needed
  • CREATE TABLE statements for tables involved: Not Needed
  • Sample data for all these tables, use clickhouse-obfuscator if necessary

Any updates on this?

I was into fix that but I have a problem where I don't know what are the props of the first argument of (driver) in set-role-statement

I think adding a property in driver and user select that if ClickHouse instance is clustered or not, If yes this method can return empty string

https://github.com/ClickHouse/metabase-clickhouse-driver/blob/master/src/metabase/driver/clickhouse.clj#L203

@mhkarimi1383, this is clearly an overlook from me. I think this feature (:connection-impersonation) should be turned off for now, as SET ROLE and other session-level commands are currently limited to a particular node even with session usage, which essentially renders them useless with pooled connections as they are executed as separate statements and there is no guarantee that the consecutive requests will use the same connection and inherit what was previously set there.

Stay tuned.

@slvrtrn Thanks, Also adding clustered ClickHouse to the current docker compose setup for testing will be great 😃

@mhkarimi1383, I think it will introduce some complications as datasets are ENGINE Memory, and many of them are created in place, and we need to use wait_end_of_query and other parameters to ensure that the data is synchronized across the nodes, etc, etc...

But I agree that adding both an on-premise cluster of two nodes and ClickHouse Cloud to the mix is highly beneficial (that's how clickhouse-js CI is set up)

@slvrtrn If it's okay I will add clustered ClickHouse docker compose setup

@mhkarimi1383, you could try (we already have a sample setup available, for example, here with all the necessary configs).

Don't expect that it will work out of the box, though. Some additional changes will be needed depending on cluster/non-cluster setup. E.g., how dataset DDLs look like, what additional options to pass (at least wait_end_of_query=1 + insert_quorum=$NODE_COUNT for cluster), and many other things.

See how different table creation is for single-node/cluster/CH Cloud in clickhouse-js tests: https://github.com/ClickHouse/clickhouse-js/blob/main/packages/client-common/__tests__/fixtures/simple_table.ts#L23-L48

Client instance is also different across these:

  1. General settings (all queries) https://github.com/ClickHouse/clickhouse-js/blob/main/packages/client-common/__tests__/utils/client.ts#L36-L41
  2. DDLs settings: https://github.com/ClickHouse/clickhouse-js/blob/main/packages/client-common/__tests__/utils/client.ts#L114-L117

All of this needs to be ported to Clojure tests somehow, and, to add a bit more: many tests, especially from the Metabase bundle which we run every time, use datasets created in place using tx/dataset-definition calls. ON CLUSTER part should also be squeezed in there somehow.

@slvrtrn Yeah I see, but I will try to do it as much as I can 😁

@slvrtrn fixed, thank you :)

I am closing this, as the offending feature was disabled in the end, and unblocked the driver usage on clustered instances.

Proper implementation of the "connection impersonation" feature that correctly works with clusters will be tracked in this one: #219