databendlabs/databend

Roadmap 2023

BohuTANG opened this issue · 9 comments

After a full year of research and development in 2022, Databend's functionality and stability improved significantly, and several users began running it in production. Databend has helped them greatly reduce costs and operational complexity.

This is the Databend roadmap for 2023 (discussion).

See also:

Main tasks

v1.3

v1.2 (Prepare for release on May 15th)

v1.1 (Prepare for release on April 5th)

v1.0 (Prepare for release on March 5th)

Features

| Task | Status | Comments |
| --- | --- | --- |
| Update (#9261) | DONE | needs optimization (release in v1.0) |
| Privileges | DONE | |
| Alter table | DONE | high-priority (release in v1.0) |
| Window function (#6342) | DONE | |
| Lambda function and higher-order functions | DONE | |
| Materialized view Aggregating index | DONE | |
| Support SET_VAR hints (#8833) | DONE | |
| Parquet reader | DONE | |
| DataFrame | DONE | |
| Data Sharing (community version) | DONE | |
| Concurrent query enhance | IN PROGRESS | |
| Distributed COPY (#8594) | DONE | |
| Support Decimal data type (#2931) | DONE | high-priority (release in v1.0) |
| Add column-level dynamic data masking support | PLAN | |

Improvements

| Task | Status | Comments |
| --- | --- | --- |
| New expression (#9411) | DONE | |
| Error message | PLAN | |

Planner

| Task | Status | Comments |
| --- | --- | --- |
| Scalar expression normalization | DONE | |
| Column constraint framework | DONE | |
| Functional dependency framework (#7438) | DONE | |
| Join reorder | DONE | |
| CBO | DONE | high-priority (release in v1.0) |
| Support TPC-DS | DONE | |
| Support optimization tracing | PLAN | Easy to debug/study. |

Cache

| Task | Status | Comments |
| --- | --- | --- |
| Unified cache layer | DONE | |
| Meta data cache | DONE | |
| Index data cache | DONE | |
| Block data cache | DONE | high-priority (release in v1.0) |

Data Storage

| Task | Status | Comments |
| --- | --- | --- |
| Fuse engine re-clustering | DONE | high-priority (release in v1.1) |
| Fuse engine orphan data cleanup | DONE | high-priority (release in v1.0) |

Distributed Query Execution

| Task | Status | Comments |
| --- | --- | --- |
| Visualized profiling | IN PROGRESS | |
| Aggregation spilling | DONE | high-priority (release in v1.1) |

Resource Quota

| Task | Status | Comments |
| --- | --- | --- |
| Session-level quota control (CPU/Memory) | DONE | |

Schema-Less Search

| Task | Status | Comments |
| --- | --- | --- |
| JSON indexing | DONE | high-priority |
| Fulltext index (#3915) | IN PROGRESS | high-priority |
| Array functions (#7931) | DONE | high-priority |
| Faiss index (#9699) | PLAN | |

LakeHouse

| Task | Status | Comments |
| --- | --- | --- |
| Apache Hive | DONE | |
| Apache Iceberg | DONE | |
| Delta Lake | PLAN | |
| Querying external storage (Parquet) | DONE | |

Integrations

| Task | Status | Comments |
| --- | --- | --- |
| Dbt integration | DONE | |
| Airbyte integration | DONE | |
| Datadog Vector integration with Rust driver | DONE | |
| DataX integration with Java driver | DONE | |
| CDC with Flink | DONE | |
| CDC with Kafka | DONE | |

Meta

| Task | Status | Comments |
| --- | --- | --- |
| Jepsen test | DONE | |
| Store membership in raft | DONE | |
| Nonblocking snapshot building | DONE | |
| Snapshot file format impl | DONE | |
| Upgrade on-disk store format | DONE | |

Testing

| Task | Status | Comments |
| --- | --- | --- |
| SQLlogic Test | DONE | Supports more test cases |
| SQLancer Test | DONE | Supports more types and more cases |
| Fuzzer Test | IN PROGRESS | |

Releases

Any plan to improve concurrency capabilities, so that developers can rely on Databend to build data-exploration platforms (like Google Analytics) on the web?

Any plan to tune metasrv's memory usage? I hit an OOM last week. IMHO it could store most of the data on disk.

> Any plan to improve concurrency capabilities, so that developers can rely on Databend to build data-exploration platforms (like Google Analytics) on the web?

Added: Concurrent query enhance

> Any plan to tune metasrv's memory usage? I hit an OOM last week. IMHO it could store most of the data on disk.

@drmingdrmer will fill in the Meta section; I think he will cover it.

Any plan to support the decimal data type? This is essential if we want to use Databend in finance-related fields. Will we see it in the first half of the year?

> Any plan to support the decimal data type? This is essential if we want to use Databend in finance-related fields. Will we see it in the first half of the year?

Added to the main tasks, thanks.

Will fault tolerance for query processing be planned in 2023?

For instance, I have some spot instances; the cluster should handle an instance that has been shut down gracefully, without affecting running queries.

> Will fault tolerance for query processing be planned in 2023?

We will do it, but it is hard, so the priority is low.

> For instance, I have some spot instances; the cluster should handle an instance that has been shut down gracefully, without affecting running queries.

Please file an issue for that.

Is there a plan for when the vector index feature will be added? It is part of #10689 but doesn't seem to have an associated ticket.