Implement Adaptive query execution
Dandandan opened this issue · 0 comments
Dandandan commented
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Adaptive query execution is the re-optimization of the query pipeline.
This allows for faster complex queries, as joins and other.
The pull/stage based model of Ballista allows for implementing a similar strategy as Spark.
Describe the solution you'd like
- When a stage has been finished: provide/update the statistics (row count / byte size) for the remaining stages
- Re-optimize the different stages based on (exact) stats. We can start with running only the physical optimization passes (join order, aggregate statistics, broadcast join #348, etc.) as we already converted the logical plan to the physical plan.
Describe alternatives you've considered
Additional context