databendlabs/databend

Databend Roadmap for 2025

Opened this issue · 22 comments

Databend Roadmap: 2024-2025 - Evolution to Multimodal Data+AI Warehouse

2024: Cloud-Native & Snowflake-like Experience

  • Cloud-Native: Fully cloud-native platform with a Snowflake-like experience.
  • 50% Cost Reduction: Achieved significant cost savings through advanced optimizations.

Experience Databend here: Databend


2025: On-Prem Solution to Replace Snowflake

  • Blazing analytics, fast search, geo insights, vector AI

Main Tasks for 2025

| Task | Status | Comments |
|---|---|---|
| Dynamic Cluster Management | Done | Make query nodes more dynamic for better resource allocation. |
| Resource Group Management | Done | Enhance on-prem query control through better resource group management. |
| Disaster Recovery | Done | Implement multi-region failover solutions, including point-in-time recovery and backup automation. |
| Stability Improvements | In Progress | Increase on-prem stability and fault tolerance for higher reliability. |
| Search Improvements | In Progress | Add FULLTEXT index support for the VARIANT type. |
| Spatial Indexing for Geometry | Planned | Implement spatial indexing capabilities for geometric data. |

Congratulations on what Databend has achieved in such a short time. Looking forward to 2024!

What exactly is the Support Python Worksheet? Does it enable running Python in Databend?

Are there any plans for SQL transactions and stored procedures?

Besides, I think we could also support query queueing, automatic warehouse scaling based on the pending queue, and a separate coordinator component for dispatching physical plans to warehouse compute nodes.

This is part of the Enhancements to Concurrency and Scheduler work.
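The query-queueing and queue-driven scaling ideas in this suggestion can be sketched as follows. This is a minimal Python sketch under stated assumptions: `QueryQueue`, `desired_nodes`, and the capacity of 4 queued queries per added node are hypothetical and do not describe Databend's scheduler or its actual concurrency work.

```python
# Hypothetical sketch of query queueing plus queue-driven scaling;
# not Databend's scheduler or configuration API.
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueryQueue:
    """Runs at most `max_running` queries; the rest wait in FIFO order."""
    max_running: int
    running: set = field(default_factory=set)
    pending: deque = field(default_factory=deque)

    def submit(self, query_id: str) -> bool:
        # True if the query starts immediately, False if it is queued.
        if len(self.running) < self.max_running:
            self.running.add(query_id)
            return True
        self.pending.append(query_id)
        return False

    def finish(self, query_id: str) -> None:
        # Free the slot and promote the oldest pending query, if any.
        self.running.discard(query_id)
        if self.pending and len(self.running) < self.max_running:
            self.running.add(self.pending.popleft())


def desired_nodes(pending: int, current: int, per_node: int = 4,
                  min_nodes: int = 1, max_nodes: int = 16) -> int:
    """Target warehouse size from the pending-queue length.

    Assumption: each extra node absorbs `per_node` queued queries.
    """
    if pending == 0:
        target = current - 1  # idle queue: shrink one step to avoid thrashing
    else:
        target = current + math.ceil(pending / per_node)  # grow to drain the queue
    return max(min_nodes, min(max_nodes, target))
```

For example, with `max_running=2` a third submitted query waits in `pending`, and `desired_nodes(1, current=2)` asks for 3 nodes. Shrinking by one step at a time while growing in proportion to the backlog is one simple way to react quickly to load without oscillating when the queue briefly empties.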

> What exactly is the Support Python Worksheet? Does it enable running Python in Databend?

The goal is to make it possible to run Hugging Face models with Python on GPU (or CPU) directly against data in Databend.

All I want is to be able to read a Delta table from a local path :)

How should Inference be understood? Which capabilities does it refer to?

Move the models (Hugging Face models) into the database, so that the database can load and run them.

Thanks for making this available to everyone. I'm currently interested in anything that deals with geospatial data, so adding Geometry support is very nice.

Working on it: #14470

> Thanks for making this available to everyone. I'm currently interested in anything that deals with geospatial data, so adding Geometry support is very nice.

@keltia Databend uses H3 for geospatial operations. Is that what you are referring to? https://docs.databend.com/sql/sql-functions/geo-functions/

Would it be possible to restructure the code a little to help with creating a fully OSS-compliant version of Databend?

I know the license info specifically calls out the ee directories, but we have found other files, also covered by the Elastic License, that appear to be part of the core functionality:

  • /src/meta/binaries/meta/ee-main.rs
  • /src/binaries/query/ee-main.rs

Many companies will not adopt a technology unless it uses an approved OSS license. Apache 2.0 is one, but the Elastic License is not (open source vs. source available).

In an ideal world there would be a databend-oss repo and a databend-ee repo with the latter adding all of the Enterprise features and licensing on top of the OSS version.

I know the goal is to get enterprise customers to buy a license, but I think you may be losing out on community support and growth because devs restricted by corporate governance policies will never even be able to give Databend a test drive. Plus, like many grassroots database projects within a company, things start small and cheap and then turn into mission-critical systems requiring licensing. If we skip the small and cheap phase, the licensing never comes in those cases.

Agreed. Mixing licenses, both conceptually in the docs and literally in the code, makes it really hard to adopt this database.

@BohuTANG I hope you can add support for spatial indexing of geometry in 2025 (like opengeospatial/geoparquet#13), and also address the problem of too many snapshots when storing high-frequency time series data.

I hope Databend can support all of Iceberg, Paimon, and Hudi in 2025.

> @BohuTANG I hope you can add support for spatial indexing of geometry in 2025 (like opengeospatial/geoparquet#13), and also address the problem of too many snapshots when storing high-frequency time series data.

Added to the roadmap. Thank you for the valuable suggestions!
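One common way to make spatial filters cheap on columnar storage, in the spirit of the GeoParquet issue cited in the request, is to keep a bounding box per storage block and skip blocks whose box cannot intersect the query window. The Python below is a minimal sketch of that pruning step under that assumption; `BBox` and `prune_blocks` are hypothetical names, not Databend APIs, and Databend's planned spatial index may work differently.

```python
# Hypothetical sketch of bounding-box block pruning; not Databend's spatial index.
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat)


@dataclass(frozen=True)
class BBox:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @staticmethod
    def of(points: List[Point]) -> "BBox":
        # Smallest axis-aligned box covering every point in a block.
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        return BBox(min(xs), min(ys), max(xs), max(ys))

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (self.max_x < other.min_x or other.max_x < self.min_x or
                    self.max_y < other.min_y or other.max_y < self.min_y)


def prune_blocks(block_boxes: List[BBox], window: BBox) -> List[int]:
    """Indices of blocks that may contain points inside `window`."""
    return [i for i, box in enumerate(block_boxes) if box.intersects(window)]
```

A scan would then read only the returned blocks and apply the exact geometric predicate to their rows. Real geometries (lines, polygons) would use each geometry's own bounding box when building the block box, but the skip logic stays the same.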

> I hope Databend can support all of Iceberg, Paimon, and Hudi in 2025.

Thanks! This goal has been added to the roadmap.

Dynamic tables are useful in many scenarios.

What about streaming computation support, such as time windows, watermarks, and stateful functions?

No plans yet. Could you share more background on this requirement?

Not much background; it's just a new direction: combining stream computing and OLAP.
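The terms in this request can be illustrated with a small event-time tumbling window: events are grouped into fixed-size windows, a watermark (latest event time seen minus an allowed lateness) decides when a window is complete, and the per-window sums are the operator's state. This is a minimal Python sketch under those assumptions; the class name and the drop-late-events policy are hypothetical and do not describe any Databend feature.

```python
# Hypothetical sketch of a tumbling window with a watermark; not a Databend feature.
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class TumblingWindowSum:
    """Sums values per fixed-size event-time window and emits a window once
    the watermark (max event time seen minus allowed lateness) passes its end."""

    def __init__(self, size: int, lateness: int):
        self.size = size
        self.lateness = lateness
        self.max_event_time: Optional[int] = None
        # State: window start -> running sum for windows not yet emitted.
        self.open: Dict[int, float] = defaultdict(float)

    def watermark(self) -> Optional[int]:
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.lateness

    def process(self, event_time: int, value: float) -> List[Tuple[int, int, float]]:
        """Add one event; return any (start, end, sum) windows that just closed."""
        wm = self.watermark()
        start = event_time - event_time % self.size
        if wm is not None and start + self.size <= wm:
            return []  # its window is already closed: drop the late event
        self.open[start] += value
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return self._emit_closed()

    def _emit_closed(self) -> List[Tuple[int, int, float]]:
        wm = self.watermark()
        closed = sorted(s for s in self.open if s + self.size <= wm)
        return [(s, s + self.size, self.open.pop(s)) for s in closed]
```

With `size=10` and `lateness=5`, events at times 1 and 7 stay buffered until an event at time 16 moves the watermark to 11, which closes the window [0, 10). Dropping events that arrive after their window has closed is only one policy; streaming engines also commonly route them to a side output or update results that were already emitted.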