TimelyDataflow/timely-dataflow

Regions have non-trivial overhead

Kixiron opened this issue · 0 comments

Regions have non-trivial overhead and cost a lot more than they should. Ideally, a nested region (not a scope) should be nearly free aside from the cost of logging, but running each of the three versions of the example below shows dramatic performance differences between them:

| | Version 1 (2 regions) | Version 2 (1 region) | Version 3 (no regions) |
|---|---|---|---|
| 30,000 iterations | 12s | 8s | 5s |

I know there are extra things going on because of the .enter() and .leave() calls, as well as the region subgraph operators themselves, but it's still a significant difference, and it becomes even more apparent in larger applications.

use timely::dataflow::{
    operators::{Enter, Input, Inspect, Leave, Probe},
    InputHandle, ProbeHandle, Scope,
};

fn main() {
    timely::execute_from_args(std::env::args(), |worker| {
        let index = worker.index();
        let mut input = InputHandle::new();
        let mut probe = ProbeHandle::new();

        worker.dataflow(|scope| {
            let data = scope.input_from(&mut input);

            // Version 1: two nested regions.
            // Keep only one of the three versions enabled per run.
            scope
                .region(|inner| {
                    let data = data.enter(inner);
                    inner.region(|inner2| data.enter(inner2).leave()).leave()
                })
                .inspect(move |x| println!("worker {}:\thello {}", index, x))
                .probe_with(&mut probe);

            // Version 2: a single region.
            scope
                .region(|inner| {
                    data.enter(inner).leave()
                })
                .inspect(move |x| println!("worker {}:\thello {}", index, x))
                .probe_with(&mut probe);

            // Version 3: no regions.
            data
                .inspect(move |x| println!("worker {}:\thello {}", index, x))
                .probe_with(&mut probe);
        });

        for round in 0..30000 {
            if index == 0 {
                input.send(round);
            }

            input.advance_to(round + 1);
            while probe.less_than(input.time()) {
                worker.step_or_park(None);
            }
        }
    }).unwrap();
}
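
For anyone who wants to reproduce the comparison, here is a minimal, std-only sketch of one way to take wall-clock timings. The `timed` helper is hypothetical (it is not part of timely), and the summation is only a stand-in workload; this is not necessarily how the numbers in the table were measured.

```rust
use std::time::{Duration, Instant};

/// Runs `work` once and returns its result together with the elapsed
/// wall-clock time. Hypothetical helper, not part of timely.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}

fn main() {
    // Stand-in workload. In the example above, the closure body would be
    // the `for round in 0..30000 { ... }` loop instead.
    let (sum, elapsed) = timed(|| (0..30_000u64).sum::<u64>());
    println!("sum = {}, elapsed = {:?}", sum, elapsed);
}
```

In the example, wrapping the round loop this way inside the worker closure would give one timing per worker. The `println!` in `.inspect` runs for every record, so the measured time would include that output cost in all three versions.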

A potential solution I can see would be to make a truly specialized (and separate) version of Subgraph that does none of the progress or input/output management that Subgraph does, apart from the absolute minimum needed to keep logging intact.