
Databend Roadmap for 2024 (Discussion)

BohuTANG opened this issue · 14 comments


Explore our ongoing journey and future plans for Databend. Join the discussion and contribute your ideas!

2024: Compute Where Data Lives: Swift, Smart, Seamless.

Review of 2023

In 2023, Databend scaled significantly.

The largest single table in Databend handled hundreds of thousands of segments, tens of millions of blocks, and tens of trillions of records, encompassing 7 PB of raw data and over 300 TB of index data.

Main Tasks for 2024

| Task | Status | Comments |
|------|--------|----------|
| Concurrency and Scheduler | In Progress | Aiming for faster, more efficient task handling and improved system responsiveness. |
| GEOMETRY Data Type | In Progress | |
| TPC-DS Performance | In Progress | Continuously optimizing for better performance benchmarks. |
| Full-Text Indexes | Done | |
| Multi-Statement Transactions | Done | See the usage sketch below this table. |
| Stored Procedures (Python) | In Progress | Adding Python support for versatile data analysis alongside SQL. |
| Storage + Compute + Inference | Not Specified | Creating a cohesive data platform for AI and cloud computing, provisioning CPU & GPU resources. |
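
To make the Multi-Statement Transactions item a bit more concrete, here is a minimal sketch of what a multi-statement transaction looks like in SQL. The table and column names, and the exact `BEGIN`/`COMMIT` spelling, are assumptions for illustration rather than taken from the roadmap or docs:

```sql
-- Minimal sketch: table and column names are made up for illustration,
-- and the exact BEGIN/COMMIT syntax should be checked against the docs.
BEGIN;

INSERT INTO orders (id, customer_id, amount)
VALUES (1001, 42, 99.50);

UPDATE customers
SET total_spent = total_spent + 99.50
WHERE id = 42;

-- Both statements become visible atomically, or not at all.
COMMIT;
```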

Previous Roadmaps for Reference:

Congratulations on what Databend has achieved in such a short time. Looking forward to 2024!

Xuanwo commented

What exactly is "Support Python Worksheet"? Does it enable running Python in Databend?

Any plans for SQL transactions and stored procedures?

Besides, I think we could also support query queueing, automatic warehouse scaling based on the pending queue, and a separate coordinator component for dispatching physical plans to warehouse compute nodes.

> Besides, I think we could also support query queueing, automatic warehouse scaling based on the pending queue, and a separate coordinator component for dispatching physical plans to warehouse compute nodes.

This is part of the Concurrency and Scheduler enhancements.

> What exactly is "Support Python Worksheet"? Does it enable running Python in Databend?

The goal is to make Hugging Face models + Python + GPU (or CPU) + data inside Databend possible.

All I want is to be able to read a Delta table from a local path :)
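
For what it's worth, a hypothetical sketch of what that could look like; the `DELTA` engine name, the `fs://` scheme, and the path below are assumptions, not confirmed Databend syntax:

```sql
-- Hypothetical sketch: engine name, URI scheme, and path are assumptions.
CREATE TABLE local_events
ENGINE = DELTA
LOCATION = 'fs:///data/delta/events/';

SELECT count(*) FROM local_events;
```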

How should we understand Inference? Which capabilities does it refer to?

> How should we understand Inference? Which capabilities does it refer to?

Move the models (Hugging Face models) into the database, so the database can load and run them.
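
To sketch the idea (purely hypothetical: the `CREATE FUNCTION ... LANGUAGE PYTHON` syntax, the handler convention, and the function name are invented for illustration and are not an existing Databend API):

```sql
-- Purely hypothetical sketch of "models living in the database".
-- Syntax, handler convention, and function name are invented for illustration.
CREATE FUNCTION sentiment(review STRING)
RETURNS STRING
LANGUAGE PYTHON
HANDLER = 'handler'
AS $$
from transformers import pipeline

# Loaded once inside the database; the engine provisions CPU or GPU.
classifier = pipeline("sentiment-analysis")

def handler(review):
    return classifier(review)[0]["label"]
$$;

-- Inference runs next to the data, with no export step.
SELECT id, sentiment(review_text) FROM product_reviews LIMIT 10;
```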

Thanks for making this available to everyone. I'm currently interested in anything that can deal with geospatial data, so adding Geometry support is very nice.

> Thanks for making this available to everyone. I'm currently interested in anything that can deal with geospatial data, so adding Geometry support is very nice.

Working on it: #14470

> Thanks for making this available to everyone. I'm currently interested in anything that can deal with geospatial data, so adding Geometry support is very nice.

@keltia Databend uses H3 for geospatial operations. Is that what you are referring to? https://docs.databend.com/sql/sql-functions/geo-functions/
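
For example, with the existing geo functions something along these lines is already possible; the table and column names are made up, so check the linked docs for exact signatures:

```sql
-- Bucket GPS points into H3 cells at resolution 9 and count events per cell.
-- Table and column names are illustrative; see the geo-functions docs for signatures.
SELECT
    geo_to_h3(longitude, latitude, 9) AS h3_cell,
    COUNT(*)                          AS events
FROM gps_events
GROUP BY h3_cell
ORDER BY events DESC
LIMIT 10;
```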

Would it be possible to restructure the code a little to help create a fully OSS-compliant version of Databend?

I know the license info specifically calls out the ee directories, but we have found some other files, also covered by the Elastic License, that seem to be part of the core functionality:

- /src/meta/binaries/meta/ee-main.rs
- /src/binaries/query/ee-main.rs

Many companies will not adopt a technology that doesn't use an approved OSS license; Apache 2.0 is one, but the Elastic License is not (open source vs. source available).

In an ideal world there would be a databend-oss repo and a databend-ee repo with the latter adding all of the Enterprise features and licensing on top of the OSS version.

I know the goal is to get enterprise customers to buy a license, but I think you may be losing out on community support and growth, because devs restricted by corporate governance policies will never even be able to give Databend a test drive. Plus, like many grassroots database projects within a company, things start small and cheap and then turn into mission-critical systems that require licensing. If we skip the small and cheap phase, the licensing never comes in those cases.

Agreed, the mixing and matching of licenses, both conceptually in the docs and literally in the code, makes it really hard to use this database.