uwheel/datafusion-uwheel

Support MinMax temporal pruning

Max-Meldrum opened this issue · 1 comment

For instance, a MinMaxWheel(fare_amount) lets us quickly check whether the underlying scan (e.g., Parquet) can be skipped entirely for a query such as:

SELECT * FROM yellow_tripdata 
WHERE tpep_dropoff_datetime >= ? AND tpep_dropoff_datetime < ? 
AND fare_amount > 1000

Initial support for this feature was added in 435ce68.

It extracts the temporal filter [start, end) from the query, checks the min/max values for that range against the predicate, and returns an EmptyExec plan when no row can match.

// Helper that returns an empty execution plan when min/max pruning proves
// that no row in the time range can satisfy `column <op> value`.
fn maybe_min_max_exec(
    value: f64,
    op: &Operator,
    min_max_agg: MinMaxState<f64>,
    plan: &LogicalPlan,
) -> Option<Arc<dyn ExecutionPlan>> {
    let max = min_max_agg.max_value();
    let min = min_max_agg.min_value();
    // `col > v` cannot match when max <= v; `col >= v` only when max < v.
    // `col < v` cannot match when min >= v; `col <= v` only when min > v.
    // Pruning `>=` on max == v (or `<=` on min == v) would drop matching rows.
    if op == &Operator::Gt && max <= value
        || op == &Operator::GtEq && max < value
        || op == &Operator::Lt && min >= value
        || op == &Operator::LtEq && min > value
    {
        Some(Arc::new(EmptyExec::new(Arc::new(
            plan.schema().clone().as_arrow().clone(),
        ))))
    } else {
        None
    }
}
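
The pruning rule itself can be checked in isolation, without DataFusion or uwheel types. The following is a minimal sketch under stated assumptions, not the code from the commit: `CmpOp` and `can_prune` are hypothetical names standing in for `Operator` and the condition inside `maybe_min_max_exec`, and the predicate is assumed to have the form `column <op> value` with the literal on the right-hand side.

```rust
/// Comparison operators that a min/max summary can reason about.
/// Hypothetical stand-in for DataFusion's `Operator` (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CmpOp {
    Gt,
    GtEq,
    Lt,
    LtEq,
}

/// Returns true when no value in the closed interval [min, max] can satisfy
/// `column <op> value`, meaning the scan over that time range can be skipped.
///
/// If any input is NaN, every comparison below is false, so the function
/// conservatively reports that pruning is not possible.
pub fn can_prune(min: f64, max: f64, op: CmpOp, value: f64) -> bool {
    match op {
        // `column > value` needs some element strictly above `value`.
        CmpOp::Gt => max <= value,
        // `column >= value` is still satisfiable when max == value.
        CmpOp::GtEq => max < value,
        // `column < value` needs some element strictly below `value`.
        CmpOp::Lt => min >= value,
        // `column <= value` is still satisfiable when min == value.
        CmpOp::LtEq => min > value,
    }
}
```

With the query above, a time range whose fare_amount values lie in [2.5, 300.0] gives `can_prune(2.5, 300.0, CmpOp::Gt, 1000.0) == true`, so the Parquet scan could be replaced by an empty plan. The boundary cases are where the strict and non-strict operators differ: with max == 1000.0, `> 1000` can be pruned but `>= 1000` cannot.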