Regions have non-trivial overhead
Kixiron opened this issue · 0 comments
Regions have non-trivial overhead and cost a lot more than they should. Ideally, a nested region (as opposed to a scope) should be nearly free aside from the cost of logging, but the three versions of the example below show dramatic performance differences:
| | Version 1 (2 regions) | Version 2 (1 region) | Version 3 (no regions) |
|---|---|---|---|
| 30,000 iterations | 12s | 8s | 5s |
I know there is extra work going on because of the `.enter()` and `.leave()` calls as well as the region subgraph operators themselves, but it's still a significant difference that becomes even more apparent in larger applications.
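To put the table on a per-round basis, here is a small sketch of the arithmetic. It uses only the three timings and the 30,000-round count reported above; the function names are made up for illustration, and it assumes the cost is spread evenly across rounds. Under that assumption each region adds roughly 100 to 120 microseconds per round.

```rust
/// Number of rounds driven through the dataflow in each measurement.
const ROUNDS: f64 = 30_000.0;

/// (label, number of regions, wall-clock seconds), copied from the table.
const TIMINGS: [(&str, u32, f64); 3] = [
    ("Version 1", 2, 12.0),
    ("Version 2", 1, 8.0),
    ("Version 3", 0, 5.0),
];

/// Average cost of a single round, in microseconds.
fn per_round_micros(total_secs: f64) -> f64 {
    total_secs / ROUNDS * 1e6
}

/// Extra per-round cost attributed to each region, relative to the
/// region-free baseline, in microseconds.
fn per_region_overhead_micros(total_secs: f64, baseline_secs: f64, regions: u32) -> f64 {
    (per_round_micros(total_secs) - per_round_micros(baseline_secs)) / f64::from(regions)
}

fn main() {
    // Version 3 has no regions, so it serves as the baseline.
    let baseline = TIMINGS[2].2;
    for &(name, regions, secs) in TIMINGS.iter() {
        print!("{}: {:.0} us/round", name, per_round_micros(secs));
        if regions > 0 {
            let overhead = per_region_overhead_micros(secs, baseline, regions);
            print!(", {:.0} us/round per region", overhead);
        }
        println!();
    }
}
```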
```rust
use timely::dataflow::{
    operators::{Enter, Exchange, Input, Inspect, Leave, Probe},
    InputHandle, ProbeHandle, Scope,
};

fn main() {
    timely::execute_from_args(std::env::args(), |worker| {
        let index = worker.index();
        let mut input = InputHandle::new();
        let mut probe = ProbeHandle::new();

        worker.dataflow(|scope| {
            let data = scope.input_from(&mut input);

            // The three versions are alternatives: keep one and comment out
            // the other two for each measurement.

            // Version 1: two nested regions
            scope
                .region(|inner| {
                    let data = data.enter(inner);
                    inner.region(|inner2| data.enter(inner2).leave()).leave()
                })
                .inspect(move |x| println!("worker {}:\thello {}", index, x))
                .probe_with(&mut probe);

            // Version 2: a single region
            scope
                .region(|inner| {
                    data.enter(inner).leave()
                })
                .inspect(move |x| println!("worker {}:\thello {}", index, x))
                .probe_with(&mut probe);

            // Version 3: no regions
            data
                .inspect(move |x| println!("worker {}:\thello {}", index, x))
                .probe_with(&mut probe);
        });

        // Feed one record per round from worker 0, then step the worker
        // until the probe has caught up with the input.
        for round in 0..30000 {
            if index == 0 {
                input.send(round);
            }
            input.advance_to(round + 1);
            while probe.less_than(input.time()) {
                worker.step_or_park(None);
            }
        }
    }).unwrap();
}
```
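I haven't spelled out above how the timings were taken, so the following is only one possible way to reproduce them: wrap the round loop in a wall-clock timer built on `std::time::Instant`. `time_rounds` is a hypothetical helper, not part of timely, and the closure body here is a stand-in for the `send` / `advance_to` / `step_or_park` sequence.

```rust
use std::time::{Duration, Instant};

/// Calls `round` once for every value in `0..rounds` and returns the
/// elapsed wall-clock time. Hypothetical helper, not a timely API.
fn time_rounds<F: FnMut(u64)>(rounds: u64, mut round: F) -> Duration {
    let start = Instant::now();
    for r in 0..rounds {
        round(r);
    }
    start.elapsed()
}

fn main() {
    // Stand-in workload: sum the round numbers so the loop does real work.
    let mut sum = 0u64;
    let elapsed = time_rounds(30_000, |r| sum += r);
    println!("30000 rounds took {:?} (checksum {})", elapsed, sum);
}
```

Timing only the loop, rather than the whole process, keeps dataflow construction and worker startup out of the measurement.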
A potential solution I can see would be to make a truly specialized (and separate) version of `Subgraph` that doesn't do any of the progress or input/output management that `Subgraph` does apart from the absolute minimum to have logging stay intact.