Allow distributed execution by allowing planner, executor to be swapped out
araddon opened this issue · 0 comments
Allow a distributed planner (dataux) to replace the built-in qlbridge execution planner (in `exec`). Currently there is no separation of planning and execution, and no strong interface that would let a distributed executor replace the native in-process qlbridge executor. There are also no serialization interfaces to allow tasks to be deterministically distributed, and no primitives (partitions, tablets) to allow work to be broken down.
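A minimal sketch of the separation being asked for, assuming hypothetical `Planner` and `Executor` interfaces (these names and signatures are illustrative only, not the qlbridge or dataux API). The caller depends only on the interfaces, so a distributed planner can be substituted for the in-process one:

```go
package main

import (
	"fmt"
	"strings"
)

// Task is one unit of work in a plan. Hypothetical type for illustration;
// it is not qlbridge's plan.Task.
type Task struct {
	Name     string
	Children []*Task
}

// Planner turns a query into a tree of tasks (hypothetical interface).
type Planner interface {
	Plan(query string) (*Task, error)
}

// Executor runs a planned task tree and returns the order in which
// tasks were run (hypothetical interface).
type Executor interface {
	Run(root *Task) ([]string, error)
}

// localPlanner plans a single source feeding a single aggregate.
type localPlanner struct{}

func (localPlanner) Plan(query string) (*Task, error) {
	if strings.TrimSpace(query) == "" {
		return nil, fmt.Errorf("empty query")
	}
	return &Task{Name: "aggregate", Children: []*Task{&Task{Name: "source"}}}, nil
}

// distributedPlanner fans the source out over a number of nodes and
// adds a final aggregate that merges their partial results.
type distributedPlanner struct{ nodes int }

func (p distributedPlanner) Plan(query string) (*Task, error) {
	if strings.TrimSpace(query) == "" {
		return nil, fmt.Errorf("empty query")
	}
	root := &Task{Name: "aggregate-final"}
	for i := 0; i < p.nodes; i++ {
		name := fmt.Sprintf("source-partial[node%d]", i)
		root.Children = append(root.Children, &Task{Name: name})
	}
	return root, nil
}

// inProcessExecutor walks the tree depth-first, children before parents.
type inProcessExecutor struct{}

func (inProcessExecutor) Run(root *Task) ([]string, error) {
	var order []string
	var walk func(t *Task)
	walk = func(t *Task) {
		for _, c := range t.Children {
			walk(c)
		}
		order = append(order, t.Name)
	}
	walk(root)
	return order, nil
}

// RunQuery knows nothing about concrete planners or executors.
func RunQuery(p Planner, e Executor, query string) ([]string, error) {
	root, err := p.Plan(query)
	if err != nil {
		return nil, err
	}
	return e.Run(root)
}

func main() {
	q := "select count(*) from article"
	for _, p := range []Planner{localPlanner{}, distributedPlanner{nodes: 3}} {
		order, err := RunQuery(p, inProcessExecutor{}, q)
		fmt.Println(order, err)
	}
}
```

The same substitution would apply on the executor side: a distributed executor only needs to satisfy `Executor` to take the place of `inProcessExecutor`.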
Phase 1 use case: query Mongo with 3 nodes participating, using a group-by aggregate query: ``select AVG(CHAR_LENGTH(CAST(`title` AS CHAR))) as title_avg from article GROUP BY category``
- 1 refactor the exec job builders to support overriding core behavior
  - `plan.Visitor` with full suite of plan structs instead of using `rel` sql structs
  - revamp the task assembly (list of tasks, `taskparent.Add(task)`) to be interfaces not implementation
  - allow the job builder to have a `TaskMaker` factory for overriding built-in task implementations
- split the core visitor interfaces in exec into plan.Planner, exec.Walk where the output is a dag of plan.Task
  - `plan.PlannerDefault` and all new plan types
  - implement new Walk Executor in exec that translates the plans -> tasks
- move all of the plan and exec Task creation into NewTaskName methods
- 4 Partitionable sources: underlying sources/interfaces need to be able to support some type of partitioning so a source query can be split across nodes.
  - hack in quick partition storage (`Custom`)
  - fix the distributed/group-by to pass underlying values (tuple?) so the final aggregate can be calculated.
- 5 network/serialization distributed-enable
  - protobuf plan.Tasks
  - protobuf implementation for node
  - protobuf for sql (only select ops) structs
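To illustrate why the distributed group-by has to pass underlying values: per-node averages cannot simply be averaged again, because nodes may hold different row counts per group. A rough sketch, assuming hypothetical `Row`, `Partial`, `Partition`, `PartialAvg` and `Finalize` names (none of these are qlbridge types) and a simple round-robin split standing in for real source partitioning:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Row is a hypothetical article row.
type Row struct {
	Category string
	Title    string
}

// Partial carries the underlying values (sum, count) a node must ship
// so that AVG can be finalized correctly elsewhere.
type Partial struct {
	Sum   float64
	Count int64
}

// Partition splits rows round-robin into n chunks, one per node.
func Partition(rows []Row, n int) [][]Row {
	parts := make([][]Row, n)
	for i, r := range rows {
		parts[i%n] = append(parts[i%n], r)
	}
	return parts
}

// PartialAvg runs on one node: it accumulates the character length of
// each title per category, without dividing yet.
func PartialAvg(rows []Row) map[string]Partial {
	out := make(map[string]Partial)
	for _, r := range rows {
		p := out[r.Category]
		p.Sum += float64(utf8.RuneCountInString(r.Title))
		p.Count++
		out[r.Category] = p
	}
	return out
}

// Finalize merges the partials from all nodes and only then divides.
func Finalize(partials []map[string]Partial) map[string]float64 {
	merged := make(map[string]Partial)
	for _, part := range partials {
		for k, p := range part {
			m := merged[k]
			m.Sum += p.Sum
			m.Count += p.Count
			merged[k] = m
		}
	}
	avgs := make(map[string]float64, len(merged))
	for k, m := range merged {
		avgs[k] = m.Sum / float64(m.Count)
	}
	return avgs
}

// DistributedAvg simulates the whole flow in-process.
func DistributedAvg(rows []Row, nodes int) map[string]float64 {
	var partials []map[string]Partial
	for _, part := range Partition(rows, nodes) {
		partials = append(partials, PartialAvg(part))
	}
	return Finalize(partials)
}

func main() {
	rows := []Row{
		{"news", "ab"}, {"news", "abcd"}, {"news", "abcdef"}, {"tech", "xyz"},
	}
	fmt.Println(DistributedAvg(rows, 3))
}
```

In a real deployment the `Partial` values are what would cross the network, which is where the protobuf work in item 5 comes in.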
depended on by dataux/dataux#9
moving until later:
- 6 allow visualization of dag of tasks by supporting `MarshalJson()` or `Explain()`?
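As a rough idea of what the visualization could look like, here is a sketch assuming a hypothetical `Task` struct with `Explain()` and `JSON()` methods (illustrative names, not existing qlbridge methods). It handles a tree, the simplest kind of dag; a task shared by several parents would be printed once per parent:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Task is a hypothetical plan node, not qlbridge's plan.Task.
type Task struct {
	Name     string  `json:"name"`
	Children []*Task `json:"children,omitempty"`
}

// Explain renders the task tree as indented text, one task per line.
func (t *Task) Explain() string {
	var b strings.Builder
	t.explain(&b, 0)
	return b.String()
}

func (t *Task) explain(b *strings.Builder, depth int) {
	b.WriteString(strings.Repeat("  ", depth))
	b.WriteString(t.Name)
	b.WriteString("\n")
	for _, c := range t.Children {
		c.explain(b, depth+1)
	}
}

// JSON serializes the tree using the struct tags above.
func (t *Task) JSON() (string, error) {
	js, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	return string(js), nil
}

func main() {
	root := &Task{Name: "groupby-final", Children: []*Task{
		&Task{Name: "source[node0]"},
		&Task{Name: "source[node1]"},
		&Task{Name: "source[node2]"},
	}}
	fmt.Print(root.Explain())
	js, err := root.JSON()
	fmt.Println(js, err)
}
```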