/fuse-query

FuseQuery is a Cloud-Native SQL Query Engine at scale

Primary LanguageRustGNU Affero General Public License v3.0AGPL-3.0

FuseQuery

Github Actions Status Github Actions Status Github Actions Status codecov.io Platform License

FuseQuery is a Cloud Distributed SQL Query Engine at scale.

Cloud-Native and Distributed ClickHouse from scratch in Rust.

Give thanks to ClickHouse and Arrow.

Features

  • High Performance

    • Everything is Parallelism
  • High Scalability

    • Everything is Distributed
  • High Reliability

    • True Separation of Storage and Compute

Architecture

DataFuse Architecture

Crates

Crate Description Status
distributed Distributed scheduler and executor for planner WIP
optimizers Optimizer for Distributed&Local plan WIP
datablocks Vectorized data processing unit WIP
datastreams Async streaming iterators WIP
datasources Interface to the datasource(system.numbers for performance/Fuse-Store) WIP
executors Executor(EXPLAIN/SELECT) for the Pipeline WIP
functions Scalar and Aggregation Functions WIP
processors Dataflow Streaming Processor WIP
planners Distributed&Local planners for building processor pipelines WIP
servers Server handler(MySQL/HTTP) MySQL
transforms Data Stream Transform(Source/Filter/Projection/AggregatorPartial/AggregatorFinal/Limit) WIP

Status

SQL Support

  • Projection
  • Filter
  • Limit
  • Aggregate
  • Functions
  • Filter Push-Down
  • Projection Push-Down (TODO)
  • Distributed Query (WIP)
  • Sorting (TODO)
  • Joins (TODO)
  • SubQueries (TODO)

Performance

  • Memory SIMD-Vector processing performance only
  • Dataset: 10,000,000,000 (10 Billion)
  • Hardware: 8vCPUx16G Cloud Instance
  • Rust: rustc 1.50.0-nightly (f76ecd066 2020-12-15)
  • Build with Link-time Optimization and Using CPU Specific Instructions
Query FuseQuery (v0.1) ClickHouse (v19.17.6)
SELECT avg(number) FROM system.numbers_mt (1.32 s.) ×1.29 (1.70 s)
5.90 billion rows/s., 47.16 GB/s
SELECT sum(number) FROM system.numbers_mt (1.35 s.) ×0.99 (1.34 s)
7.48 billion rows/s., 59.80 GB/s
SELECT min(number) FROM system.numbers_mt (1.34 s.) ×1.17 (1.57 s.)
6.36 billion rows/s., 50.89 GB/s
SELECT max(number) FROM system.numbers_mt (1.32 s.) ×1.77 (2.33 s.)
4.34 billion rows/s., 34.74 GB/s
SELECT max(number+1) FROM system.numbers_mt (3.77 s.) ×0.87 (3.29 s.)
3.04 billion rows/s., 24.31 GB/s
SELECT count(number) FROM system.numbers_mt (1.31 s.) ×0.51 (0.67 s.)
15.00 billion rows/s., 119.99 GB/s
SELECT sum(number+number+number) FROM numbers_mt (4.05 s.) ×1.22 (4.95 s.)
2.02 billion rows/s., 16.17 GB/s
SELECT sum(number) / count(number) FROM system.numbers_mt (1.32 s.) ×0.97 (1.28 s.)
7.84 billion rows/s., 62.73 GB/s
SELECT sum(number) / count(number), max(number), min(number) FROM system.numbers_mt (1.76 s.) ×2.29 (4.03 s.)
2.33 billion rows/s., 18.61 GB/s

Note:

  • ClickHouse system.numbers_mt is 8-way parallelism processing
  • FuseQuery system.numbers_mt is 8-way parallelism processing

How to Run?

Fuse-Query Server

Run from source

$ make run

12:46:15 [ INFO] Options { log_level: "debug", num_cpus: 8, mysql_handler_port: 3307 }
12:46:15 [ INFO] Fuse-Query Cloud Compute Starts...
12:46:15 [ INFO] Usage: mysql -h127.0.0.1 -P3307

or Run with docker(Recommended):

$ docker pull datafusedev/fuse-query
...

$ docker run --init --rm -p 3307:3307 datafusedev/fuse-query
05:12:36 [ INFO] Options { log_level: "debug", num_cpus: 6, mysql_handler_port: 3307 }
05:12:36 [ INFO] Fuse-Query Cloud Compute Starts...
05:12:36 [ INFO] Usage: mysql -h127.0.0.1 -P3307

Query with MySQL client

Connect
$ mysql -h127.0.0.1 -P3307
Explain
mysql> explain select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                                                                                                                                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| └─ Limit: 3
  └─ Projection: (number + 1) as c1:UInt64, (number / 2) as c2:UInt64
    └─ Filter: (((c1 + c2) + 1) < 100)
      └─ ReadDataSource: scan parts [8](Read from system.numbers_mt table)                                                                                                                   |
| 
  └─ LimitTransform × 1 processor
    └─ Merge (LimitTransform × 8 processors) to (MergeProcessor × 1)
      └─ LimitTransform × 8 processors
        └─ ProjectionTransform × 8 processors
          └─ FilterTransform × 8 processors
            └─ SourceTransform × 8 processors                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
Select
mysql> select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+------+------+
| c1   | c2   |
+------+------+
|    1 |    0 |
|    2 |    0 |
|    3 |    1 |
+------+------+
3 rows in set (0.06 sec)

How to Test?

$ make test

Roadmap

  • 0.1 support aggregation select
  • 0.2 support distributed query (WIP)
  • 0.3 support group by, order by
  • 0.4 support join
  • 0.5 support sub queries
  • 0.6 support TPC-H benchmark