apache/datafusion-ballista

Implement Adaptive query execution

Dandandan opened this issue · 0 comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Adaptive query execution is the re-optimization of the query pipeline.
This allows for faster complex queries, as joins and other.

The pull/stage based model of Ballista allows for implementing a similar strategy as Spark.

Describe the solution you'd like

  • When a stage has been finished: provide/update the statistics (row count / byte size) for the remaining stages
  • Re-optimize the different stages based on (exact) stats. We can start with running only the physical optimization passes (join order, aggregate statistics, broadcast join #348, etc.) as we already converted the logical plan to the physical plan.

Describe alternatives you've considered

Additional context