/fuse-query

FuseQuery is a Distributed SQL Query Engine at scale

Primary LanguageRustGNU Affero General Public License v3.0AGPL-3.0

FuseQuery

Github Actions Status Github Actions Status Github Actions Status codecov.io Platform License

FuseQuery is a Cloud Distributed SQL Query Engine at scale.

Distributed ClickHouse from scratch in Rust.

Give thanks to ClickHouse and Arrow.

Features

  • High Performance

    • Everything is Parallelism
  • High Scalability

    • Everything is Distributed
  • High Reliability

    • True Separation of Storage and Compute

Architecture

DataFuse Architecture

Crates

Crate Description Status
distributed Distributed scheduler and executor for planner WIP
optimizers Optimizer for Distributed&Local plan WIP
datablocks Vectorized data processing unit WIP
datastreams Async streaming iterators WIP
datasources Interface to the datasource(system.numbers for performance/Fuse-Store) WIP
executors Executor(EXPLAIN/SELECT) for the Pipeline WIP
functions Scalar and Aggregation Functions WIP
processors Dataflow Streaming Processor WIP
planners Distributed&Local planners for building processor pipelines WIP
servers Server handler(MySQL/HTTP) MySQL
transforms Data Stream Transform(Source/Filter/Projection/AggregatorPartial/AggregatorFinal/Limit) WIP

Status

SQL Support

  • Projection
  • Filter
  • Limit
  • Aggregate
  • Functions
  • Filter Push-Down
  • Projection Push-Down
  • Distributed Query
  • Sorting
  • Joins
  • SubQueries

Performance

  • Memory SIMD-Vector processing performance only
  • Dataset: 10,000,000,000 (10 Billion)
  • Hardware: 8vCPUx16G Cloud Instance
  • Rust: rustc 1.50.0-nightly (f76ecd066 2020-12-15)
Query FuseQuery Cost ClickHouse Cost
SELECT avg(number) FROM system.numbers_mt [2.02s] [1.70s], 5.90 billion rows/s., 47.16 GB/s
SELECT sum(number) FROM system.numbers_mt [1.77s] [1.34s], 7.48 billion rows/s., 59.80 GB/s
SELECT max(number) FROM system.numbers_mt [2.83s] [2.33s], 4.34 billion rows/s., 34.74 GB/s
SELECT max(number+1) FROM system.numbers_mt [6.13s] [3.29s], 3.04 billion rows/s., 24.31 GB/s
SELECT count(number) FROM system.numbers_mt [1.55s] [0.67s], 15.00 billion rows/s., 119.99 GB/s
SELECT sum(number) / count(number) FROM system.numbers_mt [2.04s] [1.28s], 7.84 billion rows/s., 62.73 GB/s
SELECT sum(number) / count(number), max(number), min(number) FROM system.numbers_mt [6.40s] [4.30s], 2.33 billion rows/s., 18.61 GB/s

Note:

  • ClickHouse system.numbers_mt is 8-way parallelism processing
  • FuseQuery system.numbers_mt is 8-way parallelism processing

How to Run?

Fuse-Query Server

Run from source

$ make run

12:46:15 [ INFO] Options { log_level: "debug", num_cpus: 8, mysql_handler_port: 3307 }
12:46:15 [ INFO] Fuse-Query Cloud Compute Starts...
12:46:15 [ INFO] Usage: mysql -h127.0.0.1 -P3307

or Run with docker(Recommended):

$ docker pull datafusedev/fuse-query
...

$ docker run --init --rm -p 3307:3307 datafusedev/fuse-query
05:12:36 [ INFO] Options { log_level: "debug", num_cpus: 6, mysql_handler_port: 3307 }
05:12:36 [ INFO] Fuse-Query Cloud Compute Starts...
05:12:36 [ INFO] Usage: mysql -h127.0.0.1 -P3307

Query with MySQL client

Connect
$ mysql -h127.0.0.1 -P3307
Explain
mysql> explain select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                                                                                                                                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| └─ Limit: 3
  └─ Projection: (number + 1) as c1:UInt64, (number / 2) as c2:UInt64
    └─ Filter: (((c1 + c2) + 1) < 100)
      └─ ReadDataSource: scan parts [8](Read from system.numbers_mt table)                                                                                                                   |
| 
  └─ LimitTransform × 1 processor
    └─ Merge (LimitTransform × 8 processors) to (MergeProcessor × 1)
      └─ LimitTransform × 8 processors
        └─ ProjectionTransform × 8 processors
          └─ FilterTransform × 8 processors
            └─ SourceTransform × 8 processors                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
Select
mysql> select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+------+------+
| c1   | c2   |
+------+------+
|    1 |    0 |
|    2 |    0 |
|    3 |    1 |
+------+------+
3 rows in set (0.06 sec)

How to Test?

$ make test