mirage/mirage-tcpip

`S.UDP.listen` blocks differently on `unix` vs `direct` stack

rand00 opened this issue · 5 comments

I found that when passing a callback to `S.UDP.listen`, the unix stack blocks on the callback before receiving the next UDP datagram, but the direct stack doesn't.

My use case has been to implement TCP-like semantics on top of UDP in the conntest unikernel. As the lifetime of the callback in this case is as long as the 'UDP connection', the callback needs to be run asynchronously and potentially saved elsewhere for managing the state of the thread.
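For concreteness, a minimal sketch of the pattern (assuming the usual `Tcpip.Stack.V4V6` interface; `handle_connection` is a hypothetical stand-in for the long-running handler):

```ocaml
module Main (S : Tcpip.Stack.V4V6) = struct
  (* hypothetical: implements the TCP-like semantics for one "connection" *)
  let handle_connection ~src:_ ~src_port:_ _buf = Lwt.return_unit

  let start stack =
    S.UDP.listen (S.udp stack) ~port:4444
      (fun ~src ~dst:_ ~src_port buf ->
        (* detach, so the callback itself returns immediately on both stacks *)
        Lwt.async (fun () -> handle_connection ~src ~src_port buf);
        Lwt.return_unit);
    S.listen stack
end
```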

This is fine for me in this case (now that I know the semantics), but I was initially confused that the UDP callback wasn't run right away when a datagram was received - this could become a subtle problem to track down for users whose callback takes a bit of time but who expect it to be run instantly on each datagram. Off the top of my head this can influence:

  • performance of code using UDP
  • potential loss of datagrams if some underlying buffer gets filled before the next datagram is read by the client (I'm unaware of the actual underlying semantics)

On the other hand, when callbacks are run concurrently, the client may need to synchronize between the callbacks if they mutate anything.

In any case, documentation of the semantics would be nice, as would having the same behaviour across stacks.

.. and if callbacks are concurrent, it can lead to unbounded memory use, as packets can be received faster than the callbacks finish up.

The callback in the Unix stack (https://github.com/mirage/mirage-tcpip/blob/main/src/stack-unix/udpv4v6_socket.ml#L194) is run synchronously with Lwt_cstruct.recvfrom (so there won't be multiple callbacks executing concurrently) -- while in the MirageOS direct stack, since every mirage-net-* implementation calls "Lwt.async" for the callback of each incoming packet, the UDP callback is asynchronous (i.e. there can be multiple callbacks running concurrently).
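A self-contained Lwt toy (plain lwt.unix, not the actual mirage-tcpip code) that illustrates the difference between the two dispatch styles:

```ocaml
open Lwt.Infix

let callback i =
  Lwt_io.printlf "callback %d: start" i >>= fun () ->
  Lwt_unix.sleep 0.1 >>= fun () ->
  Lwt_io.printlf "callback %d: done" i

(* Unix-stack style: await each callback before the next "datagram" *)
let serial () = Lwt_list.iter_s callback [ 1; 2; 3 ]

(* direct-stack style: detach each callback, so they overlap *)
let concurrent () =
  List.iter (fun i -> Lwt.async (fun () -> callback i)) [ 1; 2; 3 ];
  Lwt_unix.sleep 0.5

let () = Lwt_main.run (serial () >>= concurrent)
```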

I still don't quite understand what the desired semantics should be -- maybe the callback should not involve "io / Lwt.t" at all? Maybe it should be all sync? My intuition is that the "Lwt.async" for each incoming packet is not optimal from a performance and scheduling perspective.

But I guess the user wants to run things within the callback in the Lwt monad? How else would one communicate nicely with the rest of the Lwt code?
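For reference, roughly the two callback shapes under discussion (the first should match the current signature, the second is the hypothetical all-sync variant):

```ocaml
(* current shape: the callback runs in the Lwt monad *)
type lwt_callback =
  src:Ipaddr.t -> dst:Ipaddr.t -> src_port:int -> Cstruct.t -> unit Lwt.t

(* hypothetical all-sync shape: no Lwt in the callback itself *)
type sync_callback =
  src:Ipaddr.t -> dst:Ipaddr.t -> src_port:int -> Cstruct.t -> unit
```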

If Lwt will still be involved, I think the semantics that leads to the most correct user code should be prioritized, which is when the backend blocks on the user callback's thread. The user then still has the possibility of saving a long-running thread elsewhere and doing manual synchronization between these concurrent callbacks if UDP datagrams need to be handled as fast as possible.

Hi @rand00,

> If Lwt will still be involved, I think the semantics that leads to the most correct user code should be prioritized, which is when the backend blocks on the user callback's thread.

Just to make sure I understand: you mean the semantics as observed in the Unix stack, i.e. the callback is executed completely before any new UDP packet is received and processed?

> The user then still has the possibility of saving a long-running thread elsewhere,

I don't quite get what you mean.

> and doing manual synchronization between these concurrent callbacks,

I don't understand what you mean here either -- I thought your argument was in favor of "only execute one callback at a time".

> Just to make sure I understand: you mean the semantics as observed in the Unix stack, i.e. the callback is executed completely before any new UDP packet is received and processed?

Yes - if the user modifies the same state from concurrent callbacks, it can lead to wrong semantics when state changes are interleaved. But maybe this is to be expected; as long as the documentation informs the user, it might be okay.
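E.g. a small sketch of such manual synchronization with an `Lwt_mutex` (the shared counter is just a stand-in):

```ocaml
(* serialize mutations of shared state across concurrent callbacks *)
let lock = Lwt_mutex.create ()
let counter = ref 0

let callback ~src:_ ~dst:_ ~src_port:_ _buf =
  Lwt_mutex.with_lock lock (fun () ->
      (* only one callback touches [counter] at a time *)
      incr counter;
      Lwt.return_unit)
```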

> > The user then still has the possibility of saving a long-running thread elsewhere,

> I don't quite get what you mean.

I mean that the user can choose to either call Lwt.async themselves, or start the Lwt thread and insert it into a data structure to be handled later by some manager thread, while not returning this long-running thread from the callback - roughly like the sketch below.

E.g. I run the long-running callback of conntest with Lwt.async here; in this case there is no need for managing cancellation, as it has a timeout registered on it which depends on receiving more packets.
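And a sketch of the data-structure option (names are hypothetical):

```ocaml
(* hypothetical long-running per-"connection" handler *)
let handle_connection ~src:_ ~src_port:_ _buf = Lwt.return_unit

(* threads keyed by peer, for a manager thread to await / time out later *)
let connections : (Ipaddr.t * int, unit Lwt.t) Hashtbl.t = Hashtbl.create 16

let callback ~src ~dst:_ ~src_port buf =
  let t = handle_connection ~src ~src_port buf in (* starts running now *)
  Hashtbl.replace connections (src, src_port) t;
  Lwt.return_unit (* the callback itself returns immediately *)
```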

> > and doing manual synchronization between these concurrent callbacks,

> I don't understand what you mean here either -- I thought your argument was in favor of "only execute one callback at a time".

My point was that the interface should default to blocking on the user callback - as the user still has the power to make the callbacks concurrent, and then it's their own responsibility to implement correct concurrent semantics.