nevalang/neva

Userland Buffers Via STDlib Flows (Components)

emil14 opened this issue · 0 comments

For a long time I've been thinking about buffers. There's a bufSize field in both the IR and runtime program representations. Should the user be able to set the buffer size manually? If so, what should the default value be? Zero? If the buffer size should be calculated automatically, how do we do that? Via static analysis? Via some runtime information? Or both? Here's the solution I propose now, a balance between simplicity and efficiency.

Proposal

Imagine a component with this signature that takes messages on inport :data and sends them to outport :data. If the receiver is slow, it accumulates messages from the upstream until the number of buffered messages equals :size:

pub flow Buffer<T>(size int, data T) (data T)

Let's say we have fast sender and slow receiver:

fast -> slow

Let's say that fast can send up to 10x faster than slow can receive. Maybe slow writes to a database, calls some APIs, does some math, etc. This means slow will block fast after the first message, even though fast could produce up to 9 more.

Here's what we can do about it with our new shiny Buffer:

nodes { buf Buffer }
100 -> buf:size
fast -> buf:data -> slow

Now fast can work at full speed until there are 100 messages in the queue between buf and slow:

  • fast sends the first message, buf takes it and sends it to slow
  • fast sends the second message, buf takes it (sending to slow doesn't happen, slow is not ready)
  • fast sends 9 more messages, buf takes them all
  • slow processes the first message, buf sends the second message, slow receives it and blocks again
  • this goes on until at some point buf is full (it has accumulated size=100 messages) and then the flow is blocked (until slow is ready to receive again)

As we can see, even though we blocked at some point (which is impossible to avoid without infinite memory), fast was able to send many more messages. And if somebody asks "so what? we didn't process them", I must say: there's a reason buffers exist, and the reason is that it's good not to block stuff. While fast wasn't blocked, other machinery could make progress.

I feel like this kind of optimization is not possible in control-flow programming due to call-return semantics. For those who will ask about Go's buffered channels - that is literally dataflow programming (CSP) embedded in control-flow. BTW, the mix of the two is the reason for some quirks, but that's a totally different story.

It's dynamic!

In Go it's impossible (at least without some obscure black magic) to dynamically change the buffer size of a channel (something I'm interested in because of some runtime optimization possibilities). As some of you may have already figured out, that's not the case for Neva.

Our Buffer is just a Flow (component) and its :size inport is, well, an inport. An inport can take data dynamically. It means that instead of a constant we could use some dynamic value, e.g. from an API call, from a file, or whatever.

Even more than that - we could change it on the fly! Imagine writing a program where buffer sizes change, e.g. in response to load. I think that's wild.

This, however, raises a question - how (and when) to react to size changes? My proposal is this: use sync.Mutex and atomically recreate the channel with the updated buffer size from a separate goroutine each time there's a message on the size port. Maybe sync.RWMutex if it makes sense.

A note on automatic buffering

What we just saw is manual buffer management, which is what this issue is about. However, I cannot think about this in complete isolation from automatic buffer management, because that's also possible. I should probably create a separate issue for that (and I will do so), but just to keep everything related in mind, this is what I suggest.

We will have some automatic formula (probably something simple, no rocket science) to figure out the buffer size for each channel at compile time. For those who understand what they're doing and are sure they need it, there will be Buffer (we need to think about where it should go, though).