Out of Order Delivery
emil14 opened this issue · 1 comment
If all you want is to understand the issue, you can skip the Background (it's an interesting concurrency journey though!) and start from the description of the problem.
Background
It all started with #575 - a simple task like "print each item of a stream and then terminate" turned out to be not that simple: sometimes the program terminated too soon.

Our first thought was that we needed higher-order components that could do proper iteration and terminate predictably by using locks internally. Long story short - it turned out to be a rabbit hole.
We added shiny new components like `For` and `Map`, and even started dreaming about `Filter` and company. Simple examples worked fine and everything seemed great, until @dorian3343 showed a couple of slightly more complex examples where the problem showed up again. Honestly, at that point I was frustrated. This is what #635 was about.

After several pair programming sessions with @Catya3 we found out that the problem is not about `For` or `Map` or any other component. It's deeper. It's about the runtime.
Fan-In Problem
This is how #644 started. TLDR: the runtime operated in terms of outgoing connections. It means that at runtime a connection was `sender -> receiver[]`. In other words, the network was a fan-out map. And for each such connection we had a goroutine.

In Neva a network graph is a many-to-many relation: not only can one sender have many receivers, but one receiver can also have many senders. It means that not just fan-out (split) but also fan-in (join) is possible. And that was the big problem with this design: we could have 2 concurrent goroutines sharing the same receiver.
The simplest case is 2 one-to-one connections like this:

```neva
s1 -> r1
s2 -> r1
```

`s1` might send to `r1` before `s2`, but (if sending happens fast enough) there might be a race, because the goroutines are concurrent. The Go scheduler might activate the `s2 -> r1` goroutine before `s1 -> r1`, and `r1` will receive the messages out of order. By "out of order" I mean an order that is different from the order of sending.
First Attempt To Solve (Single Queue)
Goroutines were the source of the race condition. We need to get rid of them and that will fix the out-of-order delivery, right? Well, yeah, but...
After some thought a decision was made - let all outports send to a single queue. If `s1` sends to the queue it blocks `s2`; `s2` waits for `s1` to finish sending. Then a single goroutine reads messages from the queue and delivers them to the corresponding receivers. This is a relatively simple design, but it has 2 problems:
Problem 1. Intermediate Connections
A Neva program usually includes not just "atomic" components (runtime functions) but also "complementary" ones - components implemented in Nevalang itself. At runtime they are only ports and connections; no runtime function is spawned to represent them. They "do nothing" except pass messages.

Let's say we have this program:
```neva
flow Main(start) (stop) {
    nodes { Printer }
    :start -> printer -> :stop
}

flow Printer(data) (sig) {
    nodes { Println }
    :data -> println -> :sig
}
```
In this example `Printer` is a complementary component and `Println` is an atomic one. The question is - if we don't have a goroutine per connection, then who will receive from `printer:data`? Let's say the runtime itself writes the message to the queue; then we do a lookup, see that `printer:data` is the receiver for `:start`, and write the message there. And... nothing happens. This is an "intermediate connection" - a connection that does not lead to a runtime function receiver. This design can't handle it.
Problem 2. Deadlock
Let's pretend there's no buffering, for simplicity. The simplest case: `s1 -> r1 -> :stop`

1. `s1` sends to the queue; the queue receives and tries to send to `r1`
2. `r1` receives and starts its work
3. `s1` writes the next message to the queue and blocks
4. the queue tries to write the next message to `r1` and blocks until `r1` is ready
5. `r1` finishes its job and tries to send to the queue, which blocks because the queue is itself trying to send to `r1` - deadlock
Then we thought: maybe we can avoid having a single queue for all outports and only enforce it for senders involved in a fan-in pattern? Nope, that won't work. It turns out we can still deadlock if a loop is involved. (I won't dig into the details here; I believe you can find them somewhere in the issues.)

Long story short - every outport must have its own queue (channel). This is the only way (that I see) to avoid deadlocks.
Second Attempt To Solve (Serializable Messages in Fan-In-Centric Algorithm)
The first problem was solved by implementing a graph reduction algorithm - by removing all intermediate connections from a program we get a "reduced" version (smaller in size but functionally equal) where runtime functions pass messages without intermediate hops. This had been a performance-optimisation idea for a long time. Who could have thought we would need it just for correctness? And it worked! Of course, it didn't solve the deadlock problem.
For the second problem no solution could exist but to rewrite the runtime again... This time with these 2 requirements:
- no shared queues - each outport is a separate Go channel (just like before the first rewrite)
- fan-in-centric algorithm - use an incoming connections map instead of an outgoing one, and somehow serialize messages before sending them to the receiver

The idea was this - each time we send a message, we increment an atomic counter and assign its current value to the message. This way each message gets its own unique, chronologically comparable identifier. Then, whenever more than one message is available for delivery, we sort them by this ID and deliver them to the receiver in the same order they were sent.

Before the atomic counter there was an idea to use UNIX timestamps (nanoseconds). It turns out a computer can work so fast that several messages may be sent within a single nanosecond, which means we can't tell which one was sent first (even though there was a clear order).
A design that meets these requirements was implemented and shipped in #670.
The Issue Itself (Out of Order Delivery)
After the merge of #670 the fan-in issue is finally fixed. However, there is still an issue - ironically, one caused by the increased performance. The issue we're talking about can only occur when things happen fast enough, and the old version of the runtime was slow enough that we couldn't observe it.

To understand the issue let's look at this little program. It's a slightly desugared version of the e2e test "for with range and if". I replaced deferred connections with explicit locks - it's just easier to see what is actually happening:
```neva
import { lists }

const lst list<bool> = [true, false]

flow Main(start) (stop) {
    nodes { Iter, For{Printer} }
    :start -> ($lst -> iter -> for -> :stop)
}

flow Printer(data bool) (sig any) {
    nodes { If, Println, l1 Lock, l2 Lock }
    :data -> if
    1 -> l1:data
    0 -> l2:data
    if:then -> l1:sig
    if:else -> l2:sig
    l1 -> println
    l2 -> println
    println -> :sig
}
```
It's clear that we send `true` and then `false`. Does that mean we must see `1` and then `0`? It turns out it doesn't!
Note that we have fan-in here:

```neva
l1 -> println
l2 -> println
```

But that's not the problem. With the new implementation we know for sure that if `l1` sends first, then `println` receives from `l1` first. The problem is - we're not sure that `l1` sends first!
Let's look at the upstream connections:

```neva
if:then -> l1:sig
if:else -> l2:sig
```

We know for sure that `if:then` sends first and `if:else` sends second. Does that mean `l1` receives before `l2`? No! These are parallel, concurrent connections! `l1` might be "busy" even though the send from `if:then` happened first. There could be different reasons for that, e.g. the Go scheduler suspends the runtime function that reads from `l1` and for some reason decides to activate `l2` before `l1`. It's better not to reason about the specifics. That's just fundamental nondeterminism.
It's honestly not clear what to do about this. The situation with fan-in was clearly a bug, but this... I have a feeling that the program itself is incorrect. Of course we could give up on ordering, but that would mean Neva isn't really general-purpose. I think we can do better than that.
Possible Solutions
- Another modification of the runtime - for this we need an idea first, and for now I don't have one. More than that - it looks like the language is fine and the program is incorrect. But how to program this correctly is, again, unclear.
- Modification of the standard library - maybe we need more/different tools to enforce ordering? Maybe some variations of `Lock`?