Tracking issue for improving TinySQL as a learning-friendly mini distributed relational database

Question

Opened this issue 4 years ago · 2 comments

This issue is part of the effort towards Talent Plan v3.0.

Based on user feedback and my own investigation, TinySQL has serious issues that keep it from being a learning-friendly mini distributed relational database:

- Not mini. TinySQL has more than 100,000 lines of code. It is essentially a copy of TiDB with part of the code deleted, so it still contains a lot of irrelevant code and design.
- Unfriendly documentation. The documents only briefly explain the relevant knowledge topics and do not explain the project structure.
- Poor course design. The topics covered in each lab are very broad, but the content that needs to be implemented is only a small part of them.
- Poor comments. They do not help readers understand the code.

To solve these problems, I will redesign and reimplement TinySQL. The main planned improvements are as follows:

Redesign the course.
- Divide TinySQL into five stages. Each stage has a clear target and an iconic function, and each stage builds on and extends the previous one.
- In the first stage, TinySQL should be as simple as possible. As more stages are completed, we will add the necessary functions to make it a true distributed relational database.
- The specific stage division is at the end of this issue.
Adopt an incremental framework mode.
- Initially, the course framework is empty. Each stage/substage introduces only the framework code required for that stage/substage. This keeps the framework code concise and clearly shows what each stage/substage introduces.

Optimize documentation and comments.

The documentation layout for each stage is as follows:

## Stage

### Introduction
#### Objectives
#### Materials

### Topic 1
#### Knowledge topic
#### Related code

### Exercises
### References

The stage design is as follows:

Stage 1: read-only relational database
- Target: the ability to read data using the KV engine API
- Iconic function: the ability to handle simple SELECT statements
- Knowledge topics:
  - parser
  - data mapping from the relational model to KV
  - operator generation
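
To make the "data mapping from the relational model to KV" topic concrete, here is a minimal sketch. The key layout (`t` + table id + `_r` + row id), the JSON value encoding, and the names `encode_row_key`, `encode_row_value`, and `scan_table` are all assumptions made for illustration; they are not TinySQL's actual encoding or API, and Python is used only for brevity.

```python
import json
import struct


def encode_row_key(table_id: int, row_id: int) -> bytes:
    """Build a KV key for one row: b"t" + table_id + b"_r" + row_id.

    Both ids are packed as 8-byte big-endian unsigned integers, so the
    byte-wise order of keys matches the numeric order of non-negative ids.
    (Hypothetical layout, chosen for this sketch.)
    """
    return b"t" + struct.pack(">Q", table_id) + b"_r" + struct.pack(">Q", row_id)


def encode_row_value(row: dict) -> bytes:
    """Serialize the row's columns. JSON keeps the sketch readable;
    a real engine would use a compact binary format."""
    return json.dumps(row, sort_keys=True).encode("utf-8")


def scan_table(kv: dict, table_id: int) -> list:
    """Read every row of one table by scanning keys that share its prefix."""
    prefix = b"t" + struct.pack(">Q", table_id) + b"_r"
    return [json.loads(kv[k]) for k in sorted(kv) if k.startswith(prefix)]


# A plain dict stands in for an ordered KV engine.
kv = {}
kv[encode_row_key(1, 2)] = encode_row_value({"id": 2, "name": "bob"})
kv[encode_row_key(1, 1)] = encode_row_value({"id": 1, "name": "alice"})
kv[encode_row_key(2, 1)] = encode_row_value({"id": 1, "title": "other table"})

result = scan_table(kv, 1)
print(result)
```

Under these assumptions, a simple `SELECT * FROM t` becomes a prefix scan over the keys of one table, which is the kind of read path Stage 1 targets.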
Stage 2: insert and update
- Target: the ability to write data using the KV engine API
- Iconic function: the ability to handle simple INSERT/UPDATE statements
- Knowledge topics:
  - volcano model
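
As an illustration of the volcano model listed above, the following sketch shows the usual iterator interface: every operator exposes `open`, `next`, and `close`, and a parent pulls one row at a time from its child. The operator names (`Scan`, `Filter`, `Project`) and the `execute` helper are hypothetical and are not taken from the TinySQL code base.

```python
class Scan:
    """Leaf operator: yields rows from an in-memory list (a stand-in for a KV scan)."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def open(self):
        self.pos = 0

    def next(self):
        # Return the next row, or None when the input is exhausted.
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Filter:
    """Passes through only the rows for which the predicate is true."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        # Keep pulling from the child until a row matches or the child is done.
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


class Project:
    """Keeps only the requested columns of each row."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def open(self):
        self.child.open()

    def next(self):
        row = self.child.next()
        if row is None:
            return None
        return {c: row[c] for c in self.columns}

    def close(self):
        self.child.close()


def execute(root):
    """Drive the operator tree from the root, one row per next() call."""
    root.open()
    out = []
    row = root.next()
    while row is not None:
        out.append(row)
        row = root.next()
    root.close()
    return out


rows = [{"id": 1, "age": 17}, {"id": 2, "age": 30}, {"id": 3, "age": 42}]
plan = Project(Filter(Scan(rows), lambda r: r["age"] >= 18), ["id"])
result = execute(plan)
print(result)
```

An INSERT or UPDATE operator would fit the same interface: it would pull rows from its child in `next` and write them through the KV engine API instead of returning them to the caller.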
Stage 3: DDL
- Target: the ability to process DDL online
- Iconic function: the ability to process CREATE/DROP TABLE/INDEX online
- Knowledge topics:
  - online DDL algorithm
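
The issue does not say which online DDL algorithm is meant. As a rough sketch, assuming an F1-style schema change protocol in which a new index moves through intermediate states one step at a time, the state progression could look like this. All names here (`SchemaState`, `ALLOWED`, `next_state`, `compatible`) are hypothetical.

```python
from enum import Enum


class SchemaState(Enum):
    ABSENT = 0       # the index does not exist yet
    DELETE_ONLY = 1  # index entries may be deleted, but not inserted or read
    WRITE_ONLY = 2   # index entries are inserted and deleted, but not read
    PUBLIC = 3       # the index is fully usable by queries


# Operations on index entries that each state permits (assumed model).
ALLOWED = {
    SchemaState.ABSENT: set(),
    SchemaState.DELETE_ONLY: {"delete"},
    SchemaState.WRITE_ONLY: {"delete", "insert"},
    SchemaState.PUBLIC: {"delete", "insert", "read"},
}


def next_state(state: SchemaState) -> SchemaState:
    """Advance by exactly one step; PUBLIC is the final state."""
    if state is SchemaState.PUBLIC:
        return state
    return SchemaState(state.value + 1)


def compatible(a: SchemaState, b: SchemaState) -> bool:
    """Two nodes may run concurrently only if their states are at most one step apart."""
    return abs(a.value - b.value) <= 1


# Walk a new index from ABSENT to PUBLIC.
path = [SchemaState.ABSENT]
while path[-1] is not SchemaState.PUBLIC:
    path.append(next_state(path[-1]))
print([s.name for s in path])
```

The point of the intermediate states in this model is that nodes holding adjacent schema versions never corrupt each other's data, so the table stays available while the change rolls out.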
Stage 4: Optimizer
- Target: implement an optimizer that can choose an appropriate index and join order
- Iconic function:
  - ability to collect statistics
  - ability to choose an appropriate index and join order
- Knowledge topics:
  - SQL optimization
  - statistics
  - System R optimizer
Stage 5: Calculation optimization
- Target: optimize the calculation framework to improve performance
- Iconic function:
  - vectorization
  - Massively Parallel Processing (MPP)
- Knowledge topics:
  - vectorization
  - MPP
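
To show what vectorization changes, here is a small sketch that evaluates the same expression tree in two ways: row by row, and over a whole column batch at once. The classes `Col`, `Const`, and `Add` and the method names `eval_row` and `eval_batch` are invented for this example. Pure Python will not show the real speedup, but the structural difference is the same: in batch mode the expression tree is walked once per batch instead of once per row.

```python
class Col:
    """Reference to a column by name."""

    def __init__(self, name):
        self.name = name

    def eval_row(self, row):
        return row[self.name]

    def eval_batch(self, chunk, n):
        return chunk[self.name]


class Const:
    """A constant value."""

    def __init__(self, value):
        self.value = value

    def eval_row(self, row):
        return self.value

    def eval_batch(self, chunk, n):
        # Expand the constant to a column of length n.
        return [self.value] * n


class Add:
    """Sum of two sub-expressions."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def eval_row(self, row):
        # Row-at-a-time: the whole tree is traversed for every single row.
        return self.left.eval_row(row) + self.right.eval_row(row)

    def eval_batch(self, chunk, n):
        # Vectorized: each node is visited once and works on whole columns.
        left = self.left.eval_batch(chunk, n)
        right = self.right.eval_batch(chunk, n)
        return [x + y for x, y in zip(left, right)]


expr = Add(Col("a"), Add(Col("b"), Const(1)))  # a + (b + 1)

# Row-oriented input: one dict per row.
rows = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
row_result = [expr.eval_row(r) for r in rows]

# Column-oriented input: one list per column.
chunk = {"a": [1, 2, 3], "b": [10, 20, 30]}
batch_result = expr.eval_batch(chunk, 3)

print(row_result, batch_result)
```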

feitian124 commented 3 years ago

👍

Answer 1 · 2021-08-16T01:44:37.000Z

Thanks, Rebelice. The topic list looks fine for the next wave of Talent Plan.