An alternative implementation of JDBC Bridge for ClickHouse.
-
Named Query
Besides data source, you can define named queries in configuration as well.
Named data source - config/datasources/ch.json
{ "ch": { "type": "jdbc", "jdbcUrl": "jdbc:clickhouse://localhost/system", "dataSource": { "user": "default", "password": "", }, "columns": [ { "name": "instance_id", "type": "Int32", "value": "0", "nullable": false } ], "parameters": { "max_rows": 1000, "fetch_size": 200 } } }
Note:
type
,columns
andparameters
are optional.Named query - config/queries/test-query.json
{ "test-query": { "query": "select * from test_table", "columns": [ { "name": "column1", "type": "UInt32", "nullable": false }, { "name": "column2", "type": "String", "nullable": true }, { "name": "column3", "type": "Decimal", "nullable": true, "precision": 10, "scale": 2 } ], "parameters": { "max_rows": 10 } } }
Note: like named data source,
paramters
is not mandatory.columns
is optional too but it's highly recommended, as it prevents runtime type inferring which could be slow.Saved query - scripts/tests/test-query.sql
select * from test_table
With above configuration setup, you should be able to run the following queries in ClickHouse:
-- adhoc query select * from jdbc('ch', 'select * from test_table'); -- named query select * from jdbc('ch', 'test-query'); -- saved query select * from jdbc('ch', 'scripts/tests/test-query.sql');
-
Query Parameter
You can put query parameters like
max_rows
either in config or in your query.-- get the first 10 rows(based on above configuration) select * from jdbc('ch', 'test-query') -- replace all null values select * from jdbc('ch?null_as_default=true', 'test-query') -- get a specific row select * from jdbc('ch?max_rows=3&offset=2', 'select * from test_table order by column1 desc') -- retrieve meta data for defining a named query select * from jdbc('ch?debug=true', 'select * from test_table limit 1')
-
SRV Record Support
If you're using Consul or any other DNS server with SRV record support, you probably want to use service name instead of hostname/IP and port number combination when defining a datasource. To do that, assuming
mysql.service.dc1.consul
is the service name pointing to127.0.0.1:3306
, you can use any of below format instead ofjdbc:mysql://127.0.0.1:3306/test
:jdbc:mysql://{{ mysql.service.dc1.consul }}/test
jdbc:mysql://{{ host:mysql.service.dc1.consul }}/test
jdbc:mysql://{{ host:mysql.service.dc1.consul }}:{{ port:mysql.service.dc1.consul }}/test
-
Multiple Types of Data Sources
In addition to JDBC,
clickhouse-datasource-bridge
is extensible to support arbitrary data sources.-- run named query in pre-defined database select * from jdbc('ch', 'test-query') -- get list of jobs from Jenkins view select * from jdbc('jenkins:https://builds.apache.org/', 'jobs')
-
Based on Vert.x
Eclipse Vert.x is event driven and non blocking. It also makes
clickhouse-datasource-bridge
easy to config and scale.
# run with default configuration and JDBC drivers
docker run --rm -it -p 9019:9019 zhicwu/clickhouse-datasource-bridge
# run with custom configuration and JDBC drivers
docker run --rm -v `pwd`/config:/app/config -v `pwd`/drivers:/app/drivers -it -p 9019:9019 zhicwu/clickhouse-datasource-bridge
ClickHouse client version 19.11.8.46 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.11.8 revision 54423.
ch-server :) select * from jdbc('ch', 'select 1')
SELECT *
FROM jdbc('ch', 'select 1')
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.120 sec.
ch-server :) select * from jdbc('jdbc:ch', 'select 1')
SELECT *
FROM jdbc('jdbc:ch', 'select 1')
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.074 sec.
ch-server :) select * from jdbc('jdbc://ch', 'select 1')
SELECT *
FROM jdbc('jdbc://ch', 'select 1')
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.084 sec.
ch-server :) select * from jdbc('jdbc://ch?debug=true', 'select 1')
SELECT *
FROM jdbc('jdbc://ch?debug=true', 'select 1')
┌─datasource─┬─type─┬─query────┬─parameters─────────────────────────────────────┐
│ ch │ jdbc │ select 1 │ fetch_size=1000&max_rows=0&offset=0&position=0 │
└────────────┴──────┴──────────┴────────────────────────────────────────────────┘
1 rows in set. Elapsed: 0.052 sec.
ch-server :) select * from jdbc('jdbc:clickhouse://localhost/system?user=default&password=', 'select 1')
SELECT *
FROM jdbc('jdbc:clickhouse://localhost/system?user=default&password=', 'select 1')
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.156 sec.
Below is the directory structure used in docker image. By default, all configuration files must be put under config
directory and JDBC drivers under drivers
.
/app
|
|--- config
| |
| |--- datasources
| | |
| | |--- test-mysql.json
| |
| |--- queries
| | |
| | |--- test-query.json
| |
| |--- httpd.json
| |--- vertx.json
| |--- server.json
|
|--- drivers
| |
| |--- some-shaded-jdbc-driver.jar
|
|--- clickhouse-datasource-bridge.jar
Usually you don't need any of them but you can surely customize as needed.
File | Reloadable (Y/N) | Description |
---|---|---|
vertx.json | N | Vertx configuration, check here for more. |
server.json | N | Server configuration. |
httpd.json | N | Http server configuration, check here for more. |
datasources/*.json | Y | Named data sources. |
queries/*.json | Y | Named queries. |
-
vertx.json
This configuration file will be only read during server starting and it is NOT reloadable afterwards.
{ "maxWorkerExecuteTime": 300, "maxWorkerExecuteTimeUnit": "SECONDS", "workerPoolSize": 10 }
-
server.json
This configuration file will be only read during server starting and it is NOT reloadable afterwards.
{ "serverPort": 8080, "requestTimeout": 5000, "queryTimeout": 60000 }
-
datasources/named-data-source.json
-
datasources/named-query.json
Issue | Workaround | Remark |
---|---|---|
query timed out | * server-side: 1) increase timeout in datasource configuration; 2) increase max_execution_timeout in server.json ; * client-side: 1) increase clickhouse-jdbc-driver timeout |
- Consul support - retrieve datasource and/or query from Consul KV store...
- Error handling - abort worker thread once client closed stream(exception from stream.write?)
- Special datasource like config - 'select * from jdbc('config', '', 'show datasources')'
- Special run-time parameter to save query and/or generate datasources(select * from jdbc('ds?save_query=xxx', '', 'select 1'))
- Strict mode(turn on by default)
- Provider for CodeQL
- More examples (docker-compose.yml, swarm and k8s, with/without Vert.x clustering)
- Reduce dependencies so that we can go Native - https://vertx.io/blog/eclipse-vert-x-goes-native/
- Exclude unnecessary dependencies from Vertx lib
- Simple LRU cache(better in JDK) to replace Caffeine
- Avoid reflection - less configuration required to build native image
- Reduce memory usage if we stick with JVM - https://stackoverflow.com/questions/561245/virtual-memory-usage-from-java-under-linux-too-much-memory-used
- Parallelized query
- More data sources(Cassandra, DGraph, InfluxDB and Prometheus)...