Userland Buffers Via STDlib Flows (Components)
emil14 opened this issue · 0 comments
For a long time I was thinking about buffers. There's a `bufSize` field in both the IR and the runtime program representations. Should the user be able to set the buffer manually? If so, what should the default value be? Zero? If the buffer should be calculated automatically, how do we do it? Based on static analysis? On some runtime information? Or both? Here's the solution I propose now: a balance between simplicity and efficiency.
## Proposal
Imagine a component with the following signature: it takes messages from its `:data` inport and sends them to its `:data` outport. If the receiver is slow, it accumulates messages from upstream until it holds a number of messages equal to `:size`.
```neva
pub flow Buffer<T>(size int, data T) (data T)
```
Let's say we have a fast sender and a slow receiver:

```neva
fast -> slow
```

Let's say that `fast` can send up to 10x faster than `slow` can receive. Maybe `slow` goes to a database, calls some APIs, does some math, etc. This means `slow` will block `fast` after the first message, even though `fast` could produce up to 9 more.
Here's what we can do about it with our new shiny `Buffer`:

```neva
nodes { buf Buffer }

100 -> buf:size
fast -> buf:data -> slow
```
Now `fast` can work at full speed until there are 100 messages in the queue between `buf` and `slow`:
1. `fast` sends the first message, `buf` takes it and sends it to `slow`
2. `fast` sends the second message, `buf` takes it (sending to `slow` doesn't happen, `slow` is not ready)
3. `fast` sends 9 more messages, `buf` takes them all
4. `slow` processes the first message, `buf` sends the second message, and `slow` receives it and blocks
5. this goes on and on until at some point `buf` is full (it has accumulated `size=100` messages) and then the flow is blocked (until `slow` is ready to receive again)
As we can see, even though we blocked at some point (which is impossible to avoid without infinite memory), `fast` was able to send many more messages. And if somebody wants to ask "so what? we didn't process them", I must say: there's a reason buffers exist, and the reason is that it's good not to block stuff. While `fast` wasn't blocked, other machinery could make progress.
I feel like this kind of optimization is not possible in control-flow programming due to call-return semantics. For those who will ask about Go's buffered channels - that is literally dataflow programming (CSP) embedded in control-flow. BTW, the mix of those is the reason for some quirks, but that's a totally different story.
## It's dynamic!
In Go it's impossible (at least without some obscure black magic) to dynamically change the buffer size of a channel (something I'm interested in because of some runtime optimization possibilities). As some of you may have already figured out, that's not the case for Neva.
Our `Buffer` is just a Flow (component), and its `:size` inport is, well, an inport. An inport can take data dynamically. It means that instead of using a constant, we could use some dynamic value, e.g. from an API call, from a file, or whatever.

Even more than that: we could change it on the fly! Imagine writing a program where buffer sizes change, e.g. because of the load. I think that's wild.
This, however, raises a question: how (and when) to react to size changes? My proposal is this: use `sync.Mutex` and atomically recreate the channel with the updated buffer size from a separate goroutine each time there's a message on the `size` port. Maybe `sync.RWMutex` if it makes sense.
## A note on automatic buffering
What we just saw is manual buffer management, which is what this issue is about. However, I cannot think about it in complete isolation from automatic buffer management, because that's also possible. I should probably create a separate issue on that (and I will do so), but just to keep everything related in mind, this is what I suggest.
We will have some automatic formula (probably something simple, no rocket science) to figure out the buffer size for each channel at compile time. For those who understand what they're doing and are sure they need it, there will be `Buffer` (we need to think about where it should go, though).