infiniflow/infinity

ROADMAP 2024

writinwaters opened this issue · 10 comments

v0.5.0 (Planning)

Core:

  • Supports product quantization.
  • Supports scalar quantization (u8).
  • Supports binary vectors with Hamming distance (see the sketch after this list).
  • Supports DiskANN index. #1953
  • Supports cluster management, log replication, and failover.
  • Supports result caching and a pagination function. #1903
  • Supports regular expressions on varchar fields. #1986
  • Supports analyzers from RAGFlow. #1973
  • Supports authentication with default roles.
  • Supports system-level data backup and recovery.
  • Supports providing a comment when creating a database / index / table.
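
Infinity's own binary-vector API isn't shown in this thread; purely as an illustration of the metric behind the "binary vectors with Hamming distance" item, here is a minimal Python sketch over bit-packed vectors:

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two bit-packed binary vectors (dtype=uint8).

    Each byte holds 8 dimensions; XOR exposes the differing bits and a
    popcount sums them.
    """
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Two 16-dimensional binary vectors packed into 2 bytes each.
a = np.packbits([1, 0, 1, 1, 0, 0, 1, 0,  1, 1, 0, 0, 1, 0, 1, 1])
b = np.packbits([1, 1, 1, 0, 0, 0, 1, 0,  0, 1, 0, 1, 1, 0, 1, 1])
print(hamming_distance(a, b))  # 4
```

XOR plus popcount is why Hamming distance stays cheap on packed binary vectors, which is what makes binary quantization attractive for recall.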

Integration

  • Integrates with RAGFlow #1405

Tools

v0.4.0

Core:

  • Enables the IVF index. #1917
  • Supports filter expressions for match_dense / match_sparse / match_text / fusion respectively. #1803
  • Supports using full-text search as a filter. #1803
  • Supports MinShouldMatch as a full-text filter (see the sketch after this list). #1862
  • Supports date and time types. #1804 #1824
  • Supports the IN operator in filter expressions. #1839
  • Supports locking/unlocking a table to prevent manipulation. #1813
  • Adds/removes columns by locking the table.
  • Supports Korean for full-text search. #1228
  • Supports a highlighter for full-text search. #1861
  • Supports adding a column comment when creating a table. #2038
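
The roadmap doesn't spell out the MinShouldMatch syntax; conceptually, it keeps only documents that match at least n of the query's terms. A minimal sketch of the semantics (not Infinity's implementation):

```python
def min_should_match(doc_terms: set[str], query_terms: list[str], n: int) -> bool:
    """Return True if the document matches at least n of the query terms."""
    return sum(term in doc_terms for term in query_terms) >= n

doc = {"vector", "database", "hybrid", "search"}
query = ["vector", "fusion", "search"]
print(min_should_match(doc, query, 2))  # True: "vector" and "search" match
```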

Integration

  • Supports S3 storage. #1809
  • Refactors file I/O to integrate S3, NFS, and the local filesystem.

API

  • Creates a dedicated embedded Infinity Python module. #1786
  • Supports order-by / sort function. #1944

Tools

  • GUI: list databases / tables, show variables, show configs. #1841

v0.3.0

Core:

  • Virtual file system. #1184
  • Memory optimization for sparse index building. #1436
  • Supports unordered sparse embedding indexes when importing data. #1419
  • Unifies SIMD operations. #1473
  • Supports the bf16 embedding data type. #1579
  • Supports the f16 embedding data type. #1579
  • Supports the int8 embedding data type. #1527
  • Supports multiple vectors per document. #1679
  • Smart full-text query syntax. #1622
  • Uses full checkpoints plus Parquet export/import to support data backup and restore (see the sketch after this list).
  • Supports exporting Parquet files. #1330
  • Supports importing Parquet files. #1330
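
For the Parquet-based backup/restore path above, the export and re-import steps look roughly like this with pyarrow; the table contents here are made up for illustration:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical table snapshot: column names and data are illustrative only.
table = pa.table({
    "id": [1, 2, 3],
    "body": ["doc one", "doc two", "doc three"],
})

# Export (backup): write the snapshot to a Parquet file.
pq.write_table(table, "backup.parquet")

# Import (restore): read the Parquet file back into memory.
restored = pq.read_table("backup.parquet")
assert restored.equals(table)
```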

v0.2.0

  • Supports sparse vector index. #1174
  • Supports the tensor data type. #1179
  • Supports cosine similarity. #1176
  • Supports a configurable reciprocal rank fusion (RRF) operator (see the sketch after this list). #1177
  • Multi-way recall: supports more than two recall paths. #1178
  • HTTP API: supports GET/SET variables. #1180
  • Embedded Infinity. #1181
  • Exports data to CSV and JSONL file types. #1175
  • Unifies background computation tasks under a single task executor. #1182
  • Integrates late-interaction models, such as ColBERT. #1279
  • Supports building secondary indexes on string columns. #1235
  • Supports Japanese for full-text search. #1137
  • Supports traditional Chinese for full-text search. #1376
  • Supports proximity (NEAR) queries for full-text search. #1346
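
Reciprocal rank fusion scores each document as the sum of 1 / (k + rank) over every recall list that returned it, where k is the configurable constant (60 is a common default, not necessarily Infinity's). A minimal sketch covering the multi-way case:

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) across lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d2"]   # e.g. dense vector recall
text   = ["d1", "d4", "d3"]   # e.g. full-text recall
sparse = ["d1", "d3", "d5"]   # e.g. sparse vector recall
print(rrf([dense, text, sparse]))  # 'd1' and 'd3' rank first
```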

v0.1.0

  • Builds the HNSW index in parallel. #341
  • Supports aggregate operations. #357
  • Supports the order-by (sort) operation. #339
  • Supports the limit operation. #362
  • Supports order by + limit as a top operation. #408
  • Secondary index on structured data types. #360
  • New full-text search. #358
  • Minmax of column data (see the pruning sketch after this list). #448
  • Bloom filter on structured data columns. #467
  • Refactors ColumnVector: reduces serialization times as much as possible. #449
  • Supports a new data type: date. #371
  • Supports a new data type: bool. #394
  • Refactors metadata: provides a clear interface to access metadata instead of traversing the metadata tree. #368
  • Refactors error handling: provides normalized error codes and error messages. #439
  • Segment GC and segment compaction. #466
  • Refactors WAL to use physical logging instead of logical logging. #431
  • Asynchronous index building: data becomes queryable once imported / inserted.
  • Storage cleanup: deprecated index/segment/catalog files need to be cleaned up to save disk space. #635
  • Incremental checkpoint. #438
  • New Python API to show database system values. #495
  • New Python API to explain the query plan. #496
  • HTTP API. #779
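
The minmax and Bloom filter items above both feed block/segment pruning: a block is skipped when its statistics prove it cannot contain a matching row. A toy sketch of the idea (the single-hash Bloom filter and field names are simplifications, not Infinity's on-disk format):

```python
class BlockStats:
    """Per-block statistics for pruning; a toy model, not Infinity's format."""

    def __init__(self, values: list[int], num_bits: int = 64):
        self.min, self.max = min(values), max(values)
        # Tiny Bloom filter: set one bit per value (single hash for brevity).
        self.num_bits = num_bits
        self.bloom = 0
        for v in values:
            self.bloom |= 1 << (hash(v) % num_bits)

    def may_contain(self, v: int) -> bool:
        """False means the block definitely holds no row equal to v."""
        if not (self.min <= v <= self.max):            # minmax pruning
            return False
        return bool(self.bloom & (1 << (hash(v) % self.num_bits)))  # bloom check

stats = BlockStats([3, 7, 42, 99])
print(stats.may_contain(42))   # True: the block must be scanned
print(stats.may_contain(500))  # False: outside [3, 99], block is skipped
```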

Backlog

Core

  • Supports AArch64.
  • Natively supports macOS (M1) and Windows.
  • User management.

Integration

  • Supports NFS.
  • Integrates with LangChain.
  • Integrates with LlamaIndex.
  • Embedding function.

Tools

  • Infinity database backup and restore tools. #1183
  • Monitoring tools.
  • Data migration tool.

CI improvements: post Infinity logs on CI failure; use Ubuntu 20.04 as the base of the dev image.
Fuzz testing of Infinity.

Secordary index on structured data type.
--->
Secondary index on structured data types.

There is a misspelling here: "Secordary" should be "Secondary".

Fixed, thank you.

Compatibility testing

Image tag references (a test-driver sketch follows the list):
  • centos 7, 8: https://hub.docker.com/_/centos/
  • ubuntu 20.04, 22.04, 24.04: https://hub.docker.com/_/ubuntu, https://releases.ubuntu.com/
  • debian 8, 9, 10, 11, 12: https://hub.docker.com/_/debian, https://www.debian.org/releases/
  • opensuse/leap 15.0, 15.1, 15.2, 15.3, 15.4, 15.5: https://hub.docker.com/r/opensuse/leap
  • openeuler/openeuler 20.03, 22.03: https://hub.docker.com/r/openeuler/openeuler
  • openanolis/anolisos 8.6, 23: https://hub.docker.com/r/openanolis/anolisos
  • openkylin/openkylin 1.0: https://hub.docker.com/r/openkylin/openkylin
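
A compatibility run over this matrix could be driven by a small script that starts each image and runs a smoke test. The sketch below is only an assumption about how such a driver might look; the image list is abbreviated and the smoke-test command is a placeholder, not Infinity's actual test entry point:

```python
import subprocess

# Image tags drawn from the compatibility matrix above (abbreviated).
IMAGES = [
    "centos:7", "centos:8",
    "ubuntu:20.04", "ubuntu:22.04", "ubuntu:24.04",
    "debian:11", "debian:12",
    "opensuse/leap:15.5",
    "openeuler/openeuler:22.03",
]

# Placeholder smoke test; a real run would install and start infinity here.
SMOKE_TEST = "cat /etc/os-release"

for image in IMAGES:
    result = subprocess.run(
        ["docker", "run", "--rm", image, "sh", "-c", SMOKE_TEST],
        capture_output=True, text=True,
    )
    status = "ok" if result.returncode == 0 else "FAILED"
    print(f"{image}: {status}")
```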

I would like to contribute to this project; which issue would be a good starting point?

@Kelvinyu1117
We do have a couple of issues that might work for contributors new to this project.

  1. Add minmax information to blocks/segments in the current datastore. This information is primarily used for data filtering. (#448)
  2. Implement a bloomfilter for the blocks/segments to enhance point queries. (#467)
  3. Currently, query results are stored in memory in a columnar format. However, the client expects the results in Apache Arrow format. At the moment, the format conversion is executed on the Python client, but this hurts performance, so we plan to convert the results to Apache Arrow format on the server side before sending them to the client (see the sketch after this list).
  4. There are several optimizer rules to implement, such as constant folding and simplification of arithmetic expressions, which are not yet on the roadmap. Feel free to work on them if interested.
  5. We have additional, more complicated tasks not listed here. For instance, the current executor operates with one thread per CPU. We're considering using coroutines to enhance efficiency, but we don't have a solid solution yet. If you have experience in this area, you are very welcome to propose your solution.
  6. We understand you're interested in contributing C++ code. However, if that's not the case, there's also unimplemented Python code, such as test cases and the Python SDK API.
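
For context on item 3, the client-side conversion being described looks roughly like this with pyarrow; the column names and values are invented for illustration:

```python
import pyarrow as pa

# Hypothetical columnar query result: {column_name: list_of_values}.
columnar_result = {
    "id": [1, 2, 3],
    "score": [0.92, 0.87, 0.55],
}

# Client-side conversion (the current approach): build an Arrow table
# from Python lists, paying a per-value conversion cost in Python.
table = pa.Table.from_pydict(columnar_result)
print(table.schema)
```

Moving this step server-side would let the client receive Arrow buffers directly and skip the Python-level conversion, which is the motivation given in item 3.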

Your work is exceptional! I would like to suggest that, given the current landscape, incorporating binary quantization and ColBERT-like ranking is crucial for any vector database.
Apologies for commenting on the roadmap issue instead of creating a separate feature request.


Nice, we will put this request into the v0.2.0 release.

@JinHai-CN Hi, I have experience developing a database using Arrow. Is the issue about converting query results to Arrow format still active? I'd like to take it.

@niebayes #1198 has been created; we can discuss the requirements in that issue.