madelson/MedallionShell

Consider optimizing read buffering with a "keeping up" model


Today, a dedicated task buffers stdout/stderr content regardless of whether that content is immediately being piped elsewhere.

We could optimize this with a "read-through" model: when the buffer is empty, a voracious consumer reads directly from the source, avoiding the 2 extra copies (source->buffer, buffer->consumer).
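To make the read-through idea concrete, here is a minimal single-threaded sketch in Python. It is only an illustration, not MedallionShell code: `ReadThroughBuffer`, `fill`, and the two counters are hypothetical names, and the real implementation would have to coordinate an asynchronous buffering task with the consumer.

```python
import io


class ReadThroughBuffer:
    """Hypothetical model of read-through: serve from the buffer when it
    has data, otherwise let the consumer read straight from the source."""

    def __init__(self, source):
        self._source = source
        self._buffer = bytearray()
        self.direct_reads = 0    # reads that skipped the buffer entirely
        self.buffered_reads = 0  # reads served from buffered content

    def fill(self, size=4096):
        """Stand-in for the buffering task: copy source -> buffer."""
        chunk = self._source.read(size)
        self._buffer += chunk
        return len(chunk)

    def read(self, size):
        """Consumer read. Buffered bytes always go first to preserve order."""
        if self._buffer:
            chunk = bytes(self._buffer[:size])
            del self._buffer[:size]
            self.buffered_reads += 1
            return chunk
        # Buffer is empty: read-through, avoiding both extra copies.
        self.direct_reads += 1
        return self._source.read(size)
```

In this sketch a consumer that keeps up with the source sees only direct reads, while one that falls behind drains the buffer first, so byte order is preserved in both cases.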

A challenge here is that an aggressive buffering loop means we won't ever hit the read-through scenario, since we'll essentially always have an active read on the source. To address this, we could implement a backoff approach in which the buffering loop detects a hungry reader by the fact that the buffer is empty (after the first read, of course). In that case, the buffer loop can increase a delay between reads, starting from an initial value of 0. We can also increase the backoff if the buffer loop declines to initiate a source read because a direct read is active.

Correspondingly, we can decrease the backoff whenever we initiate a buffered read, or whenever we go to write to the buffer and find that it is not empty (in the latter case we might want to drop the backoff to 0). Even if we keep these delays in a short range (e.g. 1-100ms), a hungry reader should be able to do mostly, if not entirely, direct reads.
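The backoff rules above could be sketched as a small policy object. This is a hedged illustration in Python, not a proposed API: `BackoffPolicy` and its method names are hypothetical, and the doubling/halving steps are an assumption, since the issue only says the delay should increase and decrease within a short range.

```python
class BackoffPolicy:
    """Hypothetical delay policy for the buffering loop (values in ms)."""

    def __init__(self, step_ms=1, max_ms=100):
        self.step_ms = step_ms
        self.max_ms = max_ms
        self.delay_ms = 0  # initial value of 0: buffer aggressively at first

    def _increase(self):
        # Assumed strategy: double the delay, starting at step_ms, capped.
        self.delay_ms = min(self.max_ms, max(self.step_ms, self.delay_ms * 2))

    def _decrease(self):
        # Assumed strategy: halve the delay (1 ms falls back to 0).
        self.delay_ms //= 2

    def on_buffer_found_empty(self):
        """The reader is keeping up: back off to leave room for direct reads."""
        self._increase()

    def on_direct_read_active(self):
        """The loop declined a source read because a direct read is running."""
        self._increase()

    def on_buffered_read_started(self):
        """The loop initiated its own source read."""
        self._decrease()

    def on_write_to_nonempty_buffer(self):
        """The reader has fallen behind: return to aggressive buffering."""
        self.delay_ms = 0
```

The buffering loop would wait `delay_ms` between source reads and call the matching hook after each observation. Dropping straight to 0 on a non-empty buffer follows the suggestion in the text; the other step sizes are tunable guesses.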

As part of this, we can tighten the requirements for TestPipeline.