araddon/qlbridge

Allow distributed execution by allowing planner, executor to be swapped out

araddon opened this issue · 0 comments

Allow a distributed planner (dataux) to replace the built-in qlbridge execution planner (in exec). Currently there is no separation of planning and execution, and no strong interface that would let a distributed executor replace the native in-process qlbridge executor. There are also no serialization interfaces to allow tasks to be deterministically distributed, and no primitives (partitions, tablets) to allow work to be broken down.

Phase 1 use case: query Mongo with 3 nodes participating, using a group-by aggregate query: select AVG(CHAR_LENGTH(CAST(title AS CHAR))) as title_avg from article GROUP BY category
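To illustrate why an AVG across 3 nodes needs more than each node's own average, here is a minimal sketch of merging partial aggregates. All names (`PartialAvg`, `MergeGroups`) are hypothetical and not part of qlbridge; the sketch assumes each node ships a (sum, count) pair per category and the coordinator finalizes.

```go
package main

import "fmt"

// PartialAvg is a hypothetical per-node partial state for AVG.
// Shipping sum and count (not the node-local average) lets the
// coordinator compute the exact global average.
type PartialAvg struct {
	Sum   float64
	Count int64
}

// Merge folds another node's partial state into this one.
func (p *PartialAvg) Merge(o PartialAvg) {
	p.Sum += o.Sum
	p.Count += o.Count
}

// Final computes the finished average; an empty group yields 0.
func (p PartialAvg) Final() float64 {
	if p.Count == 0 {
		return 0
	}
	return p.Sum / float64(p.Count)
}

// MergeGroups combines per-node results keyed by the GROUP BY value.
func MergeGroups(nodes ...map[string]PartialAvg) map[string]PartialAvg {
	out := make(map[string]PartialAvg)
	for _, n := range nodes {
		for key, part := range n {
			cur := out[key]
			cur.Merge(part)
			out[key] = cur
		}
	}
	return out
}

func main() {
	// Node-local averages are 10 and 20; averaging those would give 15.
	nodeA := map[string]PartialAvg{"news": {Sum: 30, Count: 3}}
	nodeB := map[string]PartialAvg{"news": {Sum: 20, Count: 1}}
	merged := MergeGroups(nodeA, nodeB)
	fmt.Println(merged["news"].Final()) // 12.5
}
```

The point of the sketch is only that the values passed between nodes have to be the underlying components of the aggregate, so the last step can finalize them.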

  • 1 refactor the exec job builders to support overriding core behavior
    • plan.Visitor with a full suite of plan structs instead of using rel sql structs
    • revamp the task assembly (list of tasks, taskparent.Add(task)) to use interfaces, not implementations
    • allow the job builder to have a TaskMaker factory for overriding built-in task implementations
    • split the core visitor interfaces in exec into plan.Planner, exec.Walk where the output is a DAG of plan.Task
      • plan.PlannerDefault and all new plan types
      • implement new Walk Executor in exec that translates the plans -> tasks
    • move all plan and exec Task creation into NewTaskName methods
  • 4 Partitionable sources: underlying sources/interfaces need to support some type of partitioning so a source query can be split across nodes.
    • hack in quick partition storage (Custom)
    • fix the distributed group-by to pass underlying values (tuple?) so the final aggregate can be computed.
  • 5 network/serialization to enable distribution
    • protobuf plan.Tasks
    • protobuf implementation for node
    • protobuf for sql (only select ops) structs
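As a rough sketch of the TaskMaker idea from item 1 combined with the partitioning in item 4: a job builder that depends only on interfaces, so a distributed implementation can swap in its own tasks. Every type here (`Task`, `TaskMaker`, `localMaker`, `distributedMaker`, `BuildJob`) is a hypothetical illustration, not the qlbridge API.

```go
package main

import "fmt"

// Task is a hypothetical node in the task DAG.
type Task interface {
	Name() string
	Add(child Task)
	Children() []Task
}

// TaskMaker is a hypothetical factory the job builder calls instead of
// constructing concrete tasks itself.
type TaskMaker interface {
	NewSource(table string) Task
	NewGroupBy() Task
}

// baseTask is a trivial Task implementation used by both makers.
type baseTask struct {
	name     string
	children []Task
}

func (t *baseTask) Name() string     { return t.name }
func (t *baseTask) Add(child Task)   { t.children = append(t.children, child) }
func (t *baseTask) Children() []Task { return t.children }

// localMaker builds the in-process tasks.
type localMaker struct{}

func (localMaker) NewSource(table string) Task { return &baseTask{name: "source(" + table + ")"} }
func (localMaker) NewGroupBy() Task            { return &baseTask{name: "groupby"} }

// distributedMaker reuses localMaker but overrides NewSource so the
// source is split into one child task per partition.
type distributedMaker struct {
	localMaker
	partitions []string
}

func (d distributedMaker) NewSource(table string) Task {
	fan := &baseTask{name: "fanout(" + table + ")"}
	for _, p := range d.partitions {
		fan.Add(&baseTask{name: "source(" + table + "/" + p + ")"})
	}
	return fan
}

// BuildJob assembles groupby <- source using only the interfaces.
func BuildJob(m TaskMaker, table string) Task {
	root := m.NewGroupBy()
	root.Add(m.NewSource(table))
	return root
}

func main() {
	dist := BuildJob(distributedMaker{partitions: []string{"p0", "p1", "p2"}}, "article")
	for _, c := range dist.Children()[0].Children() {
		fmt.Println(c.Name())
	}
}
```

Because `BuildJob` never names a concrete task type, a distributed planner could replace individual task kinds without forking the builder.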

depended on by dataux/dataux#9

moving until later:

  • 6 allow visualization of the DAG of tasks by supporting MarshalJSON() or Explain()?
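One possible shape for item 6, as a sketch only: a task node that implements the standard `encoding/json` `Marshaler` interface (whose method is spelled `MarshalJSON`) and also offers an indented text `Explain()`. `TaskNode` and its fields are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// TaskNode is a hypothetical DAG node used only for visualization.
type TaskNode struct {
	Name     string
	Children []*TaskNode
}

// MarshalJSON implements json.Marshaler and controls which fields are
// exposed; children are marshaled recursively through the same method.
func (t *TaskNode) MarshalJSON() ([]byte, error) {
	type view struct {
		Name     string      `json:"name"`
		Children []*TaskNode `json:"children,omitempty"`
	}
	return json.Marshal(view{Name: t.Name, Children: t.Children})
}

// Explain renders the DAG as indented text, one task per line.
func (t *TaskNode) Explain() string {
	var sb strings.Builder
	t.explain(&sb, 0)
	return sb.String()
}

func (t *TaskNode) explain(sb *strings.Builder, depth int) {
	sb.WriteString(strings.Repeat("  ", depth))
	sb.WriteString(t.Name)
	sb.WriteString("\n")
	for _, c := range t.Children {
		c.explain(sb, depth+1)
	}
}

func main() {
	dag := &TaskNode{Name: "groupby", Children: []*TaskNode{{Name: "source"}}}
	b, err := json.Marshal(dag)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	fmt.Print(dag.Explain())
}
```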