goodboy/tractor

Spec out the async-ipc protocol

goodboy opened this issue · 1 comments

tractor utilizes a simple multiplexed protocol for conducting inter-process-task-communication (IPTC)?

Each per-process trio task can invoke tasks in other processes and received responses depending on the type of the remote callable. All packets are encoded as msgpack serialized dictionaries which I'll refer to as messages.

How it works

When an actor wants to invoke a remote routine it sends a cmd packet:
{'cmd': (ns, func, kwargs, uid, cid)} of type Dict[str, Tuple[str, str, Dict[str, Any], Tuple[str, str], str]]
Where:

  • ns is the remote module name
  • func is the remote function name
  • kwargs is a dict of keyword arguments to call the function with
  • uid is the unique id of the calling actor
  • cid is the unique id of the call by a specific task

The first response is a function type notifier msg:
{'functype': functype, 'cid': cid} of type Dict[str, str].
Where functype can take one of:

  • 'asyncfunc' for an asynchronous function
  • 'asyncgen for a single direction stream either implemented using an asyn generator function or a @stream decorated async func
  • 'context' for a inter actor, task linked, context. For now see #209.

Depending on the value of functype then the following message(s) are sent back to the caller:

  • 'asyncfunc':
    • a single packet with the remote routine's result {'return', result, 'cid', cid} of type Dict[str, Any]
  • 'asyncgen':
    • a stream of packets with the remote async generator's sequence of results {'yield', value, 'cid', cid} of type Dict[str, Any].
  • 'context':
    • a single 'started' message containing a first value returned from the Context.started() call in the remote task followed by a possible stream of {'yield', value, 'cid', cid} messages if a bidir stream is opened on each side. Again see #209.

A remote task which is streaming over a channel can indicate completion using a ''stop'` message:

If a remote task errors it should capture it's error output (still working on what output) and send it in a message back to its caller:

  • {'error': {tb_str': traceback.format_exc(), 'type_str': type(exc).__name__,}}

A remote actor must have a system in place to cancel tasks spawned from the caller. The system to do this should be invoke-able using the existing protocol defined above and thus no extra "cancel" message should be required (I think).

  • an example is when a local caller cancels its currently consuming stream by calling a Actor._cancel_task() routine in the remote actor. This routine should have knowledge of the rpc system and be capable of looking up the caller's task-id to conduct cancellation.
  • any remote task should should in theory be cancel-able in this way but there is not yet a "cross-actor cancel scope" system in place for generic tasks (this is maybe a todo)
    • likely we'll need our own tractor.CancelScope around calls to Portal.run() see #122

What should be done

  • spec out this protocol and define it somewhere in the code base with proper generic type definitions that can be used to type check related internal ipc apis (see @vodik's comment in #35).
  • work towards full bi-directional async generator support i.e. using .asend() also mentioned in an internal todo.
  • deeply consider how far to move forward with async generators considering the outstanding problems with garbage collection and cleanup as per discussion, Pep 533, and the recommended use of async_generator.aclosing() - maybe an alternative reactive programming approach may end up being better?

Pretty interested in using msgspec for the first draft of this.
I've made jcrist/msgspec#25 to see what we can do about getting nested msgpec.Structs which would be most ideal for defining our internal message set.

This also ties in heavily with #196 since we'll likely want to offer a system for automatically inferring schema from target remote task functions and/or allowing for an explicit declaration system of some sort.

Also of note is how we could also possibly tie this in with capnproto schemas