TimelyDataflow/timely-dataflow

Communication design: MergeQueues

utaal opened this issue

utaal commented

@frankmcsherry what's the reasoning behind having the MergeQueue merge contiguous Bytes? Where do you expect performance benefits from this? /cc @LorenzoMartini @antiguru

I think the intent was twofold: the merge has to happen somewhere so that the receiver can memcpy large blocks of bytes, and any work that can be done on worker threads rather than on communication threads should be done there. That second point is negotiable if it makes the design more complicated.

There are some related "concerns". For example, only the writer gets a chance to notice that things are contiguous as it writes them, so if we want to keep these buffers from growing to 1M elements, the writer would need to be the one to merge them. That seems appealing, though there is no evidence that not doing so actively harms performance.
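To make the writer-side merge concrete, here is a minimal sketch, assuming a hypothetical `Chunk` type as a stand-in for `Bytes` (a window `offset..offset + len` onto a shared allocation) and a hypothetical `WriterMergeQueue`. Neither is timely's actual API. The writer, under the lock, extends the last queued chunk whenever the new one starts exactly where it ends, so the reader drains fewer, larger chunks it can memcpy in one go.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `Bytes`: a window onto a shared allocation.
#[derive(Clone, Debug)]
pub struct Chunk {
    buf: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl Chunk {
    pub fn new(buf: Arc<Vec<u8>>, offset: usize, len: usize) -> Self {
        assert!(offset + len <= buf.len());
        Chunk { buf, offset, len }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.buf[self.offset..self.offset + self.len]
    }

    /// Absorbs `other` if it begins exactly where `self` ends in the same
    /// allocation; otherwise hands `other` back unchanged.
    pub fn try_merge(&mut self, other: Chunk) -> Result<(), Chunk> {
        if Arc::ptr_eq(&self.buf, &other.buf) && self.offset + self.len == other.offset {
            self.len += other.len;
            Ok(())
        } else {
            Err(other)
        }
    }
}

/// Hypothetical writer-merging queue shared by one writer and one reader.
#[derive(Clone, Default)]
pub struct WriterMergeQueue {
    queue: Arc<Mutex<VecDeque<Chunk>>>,
}

impl WriterMergeQueue {
    /// Called on the worker thread: merge into the tail if contiguous,
    /// otherwise append a new entry. The merge happens while holding the lock.
    pub fn push(&self, chunk: Chunk) {
        let mut queue = self.queue.lock().unwrap();
        let leftover = match queue.back_mut() {
            Some(back) => back.try_merge(chunk).err(),
            None => Some(chunk),
        };
        if let Some(chunk) = leftover {
            queue.push_back(chunk);
        }
    }

    /// Called on the communication thread: take everything queued so far.
    pub fn drain(&self) -> Vec<Chunk> {
        self.queue.lock().unwrap().drain(..).collect()
    }
}
```

Pushing windows `0..4`, `4..7` and `8..10` of one buffer leaves two queue entries (`0..7` and `8..10`) rather than three. The cost of this design is that the merge needs the writer to mutate an element the reader can also see, which is why it sits behind a `Mutex` here.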

If the question is "why not let the receiver do the merges?", it's a fair one, and I guess it comes down to whether the potentially spiky work pushed onto the comm thread is worse than the more exotic concurrency primitive that writer-side merging requires.
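For comparison, a minimal sketch of the receiver-side alternative, again assuming a hypothetical `Chunk` stand-in for `Bytes` and hypothetical free functions `push` and `drain_and_merge` (not timely's API). Writers only append, keeping their critical section tiny; the communication thread takes the whole backlog and coalesces contiguous runs itself, outside the lock.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `Bytes`: a window `offset..offset + len` onto a shared allocation.
#[derive(Clone, Debug)]
pub struct Chunk {
    buf: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl Chunk {
    pub fn as_slice(&self) -> &[u8] {
        &self.buf[self.offset..self.offset + self.len]
    }

    /// True if `next` begins exactly where `self` ends in the same allocation.
    fn is_followed_by(&self, next: &Chunk) -> bool {
        Arc::ptr_eq(&self.buf, &next.buf) && self.offset + self.len == next.offset
    }
}

/// Worker thread: append without inspecting neighbours.
pub fn push(queue: &Mutex<VecDeque<Chunk>>, chunk: Chunk) {
    queue.lock().unwrap().push_back(chunk);
}

/// Communication thread: take everything under the lock, then merge outside it.
/// This loop is the potentially spiky work: it is linear in the number of
/// chunks pushed since the last drain.
pub fn drain_and_merge(queue: &Mutex<VecDeque<Chunk>>) -> Vec<Chunk> {
    let pending: Vec<Chunk> = queue.lock().unwrap().drain(..).collect();
    let mut merged: Vec<Chunk> = Vec::with_capacity(pending.len());
    for chunk in pending {
        if let Some(last) = merged.last_mut() {
            if last.is_followed_by(&chunk) {
                last.len += chunk.len; // contiguous: extend the previous run
                continue;
            }
        }
        merged.push(chunk);
    }
    merged
}
```

The trade-off matches the question above: the lock is held only for an append or a bulk drain, but if writers outpace the reader the backlog (and the merge pass over it) can grow without bound.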

utaal commented

Got it. Initial anecdotal evidence suggests that, if we stay with the current design, the concurrency primitive may need to be more exotic than it currently is. The Arc<Mutex< seems pretty expensive (latency-wise), but we'll have more precise numbers soon. @LorenzoMartini is testing https://github.com/utaal/timely-dataflow/tree/contention, which replaces the MergeQueue with an SPSC-based channel that only considers merges before elements are pushed into the queue.
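The idea of merging only before elements enter the queue can be sketched as follows. This is not the code on the `contention` branch: `Chunk` and `StagingSender` are hypothetical, and `std::sync::mpsc` with a single `Sender` stands in for a dedicated SPSC channel. The writer keeps its most recent chunk staged locally and grows it while new chunks are contiguous; once a chunk is sent it is never touched again, so no lock is needed around merging.

```rust
use std::sync::mpsc::Sender;
use std::sync::Arc;

/// Hypothetical stand-in for `Bytes`: a window `offset..offset + len` onto a shared allocation.
#[derive(Debug)]
pub struct Chunk {
    buf: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl Chunk {
    pub fn as_slice(&self) -> &[u8] {
        &self.buf[self.offset..self.offset + self.len]
    }
}

/// Hypothetical writer handle: merges happen in `staged`, which only the
/// writer can see, before anything is handed to the channel.
pub struct StagingSender {
    tx: Sender<Chunk>,
    staged: Option<Chunk>,
}

impl StagingSender {
    pub fn new(tx: Sender<Chunk>) -> Self {
        StagingSender { tx, staged: None }
    }

    pub fn send(&mut self, chunk: Chunk) {
        if let Some(staged) = self.staged.as_mut() {
            if Arc::ptr_eq(&staged.buf, &chunk.buf) && staged.offset + staged.len == chunk.offset {
                staged.len += chunk.len; // contiguous: grow the staged chunk in place
                return;
            }
        }
        // Not contiguous (or nothing staged): ship the old chunk, stage the new one.
        if let Some(previous) = self.staged.replace(chunk) {
            let _ = self.tx.send(previous); // an error only means the receiver hung up
        }
    }

    /// Sends whatever is staged; the writer must call this before it yields or
    /// blocks, or the last chunk would never reach the reader.
    pub fn flush(&mut self) {
        if let Some(staged) = self.staged.take() {
            let _ = self.tx.send(staged);
        }
    }
}
```

The price of dropping the lock is that the last chunk is delayed until either a non-contiguous chunk arrives or `flush` is called, so the writer decides when data becomes visible.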