Higher level (`struct` based) process abstraction

Question

bkolobara opened this issue 3 years ago · 12 comments

Defining and spawning processes are fundamental tasks you do when working with lunatic. Naturally, we want to make the developer experience around them as pleasant as possible. I would like to introduce a new higher-level Rust API for defining processes that is easier to use, provides powerful abstractions (like hot-reloading) and works nicely with Rust's type system. Before going into details of this proposal I would like to take a step back just to explain how the current system works and how we got there.

Lunatic allows you to point to a function in your Rust application and spawn a process from it. This is how Erlang/Elixir works too, and it's a really simple yet powerful tool.

However, lunatic is at the same time a "system" to run any WebAssembly module as a process. This means that it can dynamically load .wasm files written in different languages, spawn processes and send messages to them. Contrary to spawning a process from the currently running module, we can't have any type system guarantees about what messages the spawned process can receive. We may not even have access to the source code of the loaded WebAssembly module.

In the rest of this text I'm solely going to focus on the act of spawning processes from the currently running module where we actually can utilise Rust's type system.

History

From the first days of using Rust with lunatic, I have envisioned that you can just simply spawn processes from functions, the same way you can do it in Erlang/Elixir. A big obstacle here is the fact that Rust is a statically typed language and Erlang/Elixir are dynamically typed. How do you fit a concept of processes and messages into the type system? How could you send different types of messages to a process and have the type system catch errors during compilation?

My first approach just mimicked the channels approach that is used in other Rust libraries and in Go. This means that the message type was bound to the channel and a process could capture many channels on startup. This is not ideal, as it moves away from the single mailbox principle and suddenly you have as many "mailboxes" as you have channels. Also, you can't simultaneously wait on multiple channels, as their return values may be of different types. You would always end up capturing one channel and wrapping all the different message types in a super-type enum. This resulted in me getting rid of the channels approach and just having a single mailbox that is received as an argument of the process entry function. This is basically what we have today.
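
To make the "single mailbox plus super-type enum" shape concrete, here is a minimal sketch that uses only the Rust standard library. It is not lunatic's API: a thread stands in for a process, an `mpsc` channel stands in for the mailbox, and the names `Message`, `run_process` and `demo` are made up for illustration.

```rust
use std::sync::mpsc;
use std::thread;

// One "super-type" enum wraps every message the process understands.
enum Message {
    Add(u32),
    Rename(String),
    Stop,
}

// The process entry function receives its single mailbox as an argument
// and returns its final state once it is told to stop.
fn run_process(mailbox: mpsc::Receiver<Message>) -> (String, u32) {
    let mut name = String::from("counter");
    let mut total = 0;
    for msg in mailbox {
        match msg {
            Message::Add(n) => total += n,
            Message::Rename(new_name) => name = new_name,
            Message::Stop => break,
        }
    }
    (name, total)
}

// Spawns the process (a thread here), sends a few messages and
// collects the final state.
fn demo() -> (String, u32) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || run_process(rx));
    tx.send(Message::Add(2)).unwrap();
    tx.send(Message::Rename("sum".to_owned())).unwrap();
    tx.send(Message::Add(3)).unwrap();
    tx.send(Message::Stop).unwrap();
    handle.join().unwrap()
}
```

Sending anything that is not a `Message` is a compile-time error, which is the guarantee described above. Calling `demo()` from a `main` function is enough to try it out.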

I'm generally satisfied with the current approach and believe that it gives you a simple way to spawn one-off processes and compile time errors if you try to send messages of wrong types to a process (that can't handle them).

Proposal

This "function as a process" approach starts to fall apart once you have more complex processes with complicated behaviours. For example, you can't force a process to return you a message. There is no type level support with the current system to enforce that if a process receives one type of message it should respond with a specific type. Ideally we would like to be able to express these kinds of contracts with the Rust type system.

Erlang/Elixir has a higher level abstraction, the GenServer (generic server). You can implement two kinds of behaviours:

handle_call - You received a request and are supposed to respond with a reply.
handle_cast - Another process sent you a message, but is not waiting for a response.

I would like to bring the same concept to Rust's lunatic library.
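
As a rough illustration of the difference between the two behaviours, the following std-only sketch models a call as a message that carries a reply channel and a cast as a message that does not. Everything here (`Request`, `start_counter`, `call`, `cast`) is a hypothetical stand-in built on threads and `mpsc`, not lunatic or Erlang API.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

// A request either expects a reply (call) or does not (cast).
enum Request {
    // `Call` carries the channel on which the reply should be sent.
    Call(u32, Sender<u32>),
    Cast(u32),
}

// Spawns a counter "server" and returns a handle for sending requests.
// The server stops when every sender has been dropped.
fn start_counter() -> Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        let mut count = 0;
        for req in rx {
            match req {
                Request::Call(n, reply_to) => {
                    count += n;
                    // Ignore the error if the caller has gone away.
                    let _ = reply_to.send(count);
                }
                Request::Cast(n) => count += n,
            }
        }
    });
    tx
}

// Fire and forget: the sender does not wait for anything.
fn cast(server: &Sender<Request>, n: u32) {
    server.send(Request::Cast(n)).unwrap();
}

// Request and response: blocks until the server replies.
fn call(server: &Sender<Request>, n: u32) -> u32 {
    let (reply_tx, reply_rx) = mpsc::channel();
    server.send(Request::Call(n, reply_tx)).unwrap();
    reply_rx.recv().unwrap()
}
```

Note that in this sketch the reply is only a convention: nothing stops the server from dropping `reply_to` without answering. That gap is what a typed `handle_call` is meant to close.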

Example

Another library in the Rust ecosystem has already figured out a good approach to modelling such behaviours inside Rust's type system: Actix. If we represent a process state as a struct we can define different message handlers on it. The new API would look something like this:

// A message
#[derive(Serialize, Deserialize)]
struct Sum(usize, usize);

// The process state
struct Calculator;

impl lunatic::Process for Calculator {
    type Context = Context<Self>;
}

// A handler for `Sum` messages
impl HandleCall<Sum> for Calculator {
    type Result = usize; // <- response type

    fn handle(&mut self, msg: Sum, ctx: &mut Context<Self>) -> Self::Result {
        msg.0 + msg.1
    }
}

fn main() {
    let addr = Calculator.start();
    let res = addr.send(Sum(10, 5));

    match res {
        Ok(result) => println!("SUM: {}", result),
        _ => println!("Communication to the actor has failed"),
    }
}

In this example we get compile time guarantees that all the messages sent & received are of the correct type. We also can force the process to respond with an appropriate type (Self::Result) when a message of a specific type (Sum) is sent. If we left the response out, the code would not even compile.
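
The type-level part of this idea can be tried out without lunatic at all. The sketch below is a simplified, hypothetical version of the trait: it drops the `Context` argument and serialization, and runs the handler in the same thread instead of in a separate process. The `Negate` message and the `call` helper are additions for illustration only.

```rust
// Simplified stand-in for the proposed trait: one impl per message type,
// each with its own reply type.
trait HandleCall<M> {
    type Result;
    fn handle(&mut self, msg: M) -> Self::Result;
}

// Two message types with different reply types.
struct Sum(usize, usize);
struct Negate(i64);

// The process state; it counts how many calls it has handled.
struct Calculator {
    calls: usize,
}

impl HandleCall<Sum> for Calculator {
    type Result = usize; // reply type for `Sum`

    fn handle(&mut self, msg: Sum) -> usize {
        self.calls += 1;
        msg.0 + msg.1
    }
}

impl HandleCall<Negate> for Calculator {
    type Result = i64; // reply type for `Negate`

    fn handle(&mut self, msg: Negate) -> i64 {
        self.calls += 1;
        -msg.0
    }
}

// Generic helper: the message type alone determines the reply type,
// and a message without a matching handler does not compile.
fn call<P, M>(process: &mut P, msg: M) -> <P as HandleCall<M>>::Result
where
    P: HandleCall<M>,
{
    process.handle(msg)
}
```

With this in place, `call(&mut calc, Sum(10, 5))` has type `usize` and `call(&mut calc, Negate(7))` has type `i64`, while `call(&mut calc, "hello")` is rejected by the compiler because `Calculator` has no handler for `&str`.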

Hot reloading

The same way Erlang's GenServer makes it easier to do hot-reloading of Erlang processes, once we have more structure (pun intended) around the processes we can also introduce process lifecycles that make it possible to accomplish hot reloading. We can enforce that the process state implements Serialize + Deserialize and on code changes we just serialize the process state and deserialize it as part of the new implementation.

The Process trait could provide default implementations if the state structure stayed the same, but should also allow developers to define "state transitions" to new versions:

enum Action {
    Reset,    // The process is re-spawned with a new state.
    HotUpdate // Hot-reload the process and try to reuse the previous state. 
}

impl lunatic::Process for State {
    type Context = Context<Self>;

    fn update_behaviour() -> Action { Action::HotUpdate }
    fn update(old_state: Data) -> Result<Self, UpdateError> {
        // old_state contains a serialized version of the previous `Self`.
        // The implementation of this method needs to deserialize it and create a new
        // version of `Self`.
    }
}

This would also make it possible to move processes between machines. If the state can be serialized it can be moved to another node and a process could be bootstrapped there from it.
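
A state transition of this kind could look roughly like the sketch below. It is std-only and makes several assumptions: the hand-written `to_bytes` stands in for `Serialize`, a byte slice stands in for the `Data` type, and `CounterV1`, `CounterV2` and the `BadLength` variant are invented for the example.

```rust
use std::convert::TryFrom;

// Hypothetical old and new versions of a process state.
struct CounterV1 {
    count: u32,
}

struct CounterV2 {
    count: u64,
    resets: u32,
}

#[derive(Debug, PartialEq)]
enum UpdateError {
    // The serialized state had an unexpected length.
    BadLength(usize),
}

impl CounterV1 {
    // Stand-in for `Serialize`: the count as 4 little-endian bytes.
    fn to_bytes(&self) -> Vec<u8> {
        self.count.to_le_bytes().to_vec()
    }
}

impl CounterV2 {
    // Stand-in for the proposed `update`: rebuild the new state from the
    // serialized form of the old one, or fail if the bytes do not fit.
    fn update(old_state: &[u8]) -> Result<Self, UpdateError> {
        let bytes = <[u8; 4]>::try_from(old_state)
            .map_err(|_| UpdateError::BadLength(old_state.len()))?;
        Ok(CounterV2 {
            count: u64::from(u32::from_le_bytes(bytes)),
            resets: 0, // new field, initialised with a default
        })
    }
}
```

The same bytes could just as well be sent to another node and passed to `update` there, which is the process migration case mentioned above.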

None of these features require any changes to the lunatic runtime; they can be implemented completely as a library on top of the existing primitives. I think that this is an important characteristic of lunatic: we can keep the underlying runtime lean, simple and performant, but build really powerful abstractions on top of it.

Summary

I believe that by adding this kind of API we can lean much harder on Rust's type system to enforce correctness. At the same time it nicely mimics Erlang's proven approach.

This is not meant to replace the function based API, it's more of an augmentation. It represents a philosophy on how to structure your state, requests, responses and handle lifecycle (code updates). The function based API still gives you full flexibility to programmatically handle messages and is really convenient when creating processes from small closures (timeouts, etc.).

Answer 1 · 2021-10-07T19:00:57.000Z

Overall I feel OK with the API approach proposed here. But one thing bothers me: when starting a Process you call .start directly on the process, and I think that's bad.

When looking at other actor implementations, we never create a process directly (with the exception of using spawn) but obtain it from some kind of runtime context (ActorSystem in Akka, for example). What we get from the context is a reference to the Actor/Process rather than the value itself. I believe this is a powerful abstraction for implementing a supervision tree, because in that case the supervisor process is the one that manages the life cycle of its child processes; in other words, it should call start/stop on the children. In this way all processes that do not declare a supervisor become part of the root supervisor, which in turn calls start on all its children. In this sense a supervisor created through the user API is just a special type of Actor/Process.

This also mirrors Erlang/Elixir, where we have the concept of an application, which I think would be beneficial to have in our API too. That is, when starting things manually, we declare all of them as a list of processes and start them under the root supervisor. In Elixir it would look like this:

...
children =
      [
        ProcessA,
        ProcessB
      ]

    opts = [strategy: :one_for_one, name: Application.RootSupervisor]
    Supervisor.start_link(children, opts)

I'm not saying that we should handle the supervision tree in this API, but having a Context/Runtime type, treating Processes only as references and delegating their start to the Context/Runtime would make it easier to create a Supervisor API in the future.

Answer 2 · 2021-10-07T19:10:47.000Z

In Actix this is done through System as:

fn main() {
     let mut system = System::new();

     let reference = system.block_on(async { Calculator.start() });

     system.run();
}

In this case it also uses a start function, but this is passed to System, I don't like this Actix approach but as it is done through System I consider it ok.
This is also good for location transparency, as you only get a reference to the Process, so it doesn't matter whether the Process is local or remote, and this is made explicit in the API.

Answer 3 · 2021-10-08T07:48:16.000Z

Yes, I would agree with your comments. Right now the API looks as follows (naming is temporary):

fn main() {

    let my_actor = MyActor { count: 0 };
    let gen_server = GenServer::new(my_actor);
    let process = gen_server.start();

    process.cast("hi".to_owned());
    process.cast(32);

    let _reply = process.call(32).unwrap();
}

#[derive(Serialize, Deserialize)]
struct MyActor {
    count: u32,
}

impl Handle<u32> for MyActor {
    type Reply = u32;

    fn handle_cast(&mut self, msg: u32) {
        self.count += msg;
    }

    fn handle_call(&mut self, msg: u32) -> Self::Reply {
        self.count += msg;
        self.count
    }
}

impl Handle<String> for MyActor {
    type Reply = String;

    fn handle_cast(&mut self, _msg: String) {}

    fn handle_call(&mut self, msg: String) -> Self::Reply {
        msg
    }
}

I think this would work a lot better, and in the future, other GenServer implementations can be created easily.

For the process-based variant the API might look something like this. This is mostly the part I'm trying to figure out now: how to call the proper Handle function on the receiving side.

    let my_actor = MyActor { count: 0 };
    my_actor.spawn(|my_actor: MyActor, mailbox: Mailbox| {
        
    })

Answer 4 · 2021-10-08T07:53:21.000Z

In this case it also uses a start function, but this is passed to System, I don't like this Actix approach but as it is done through System I consider it ok.

In this example actix::System is just a Rust async executor. It is required because Actix can only run inside of an async context and I don't think it's actually related to the actor system at all. The same code could have been written:

#[actix::main]
async fn main() { Calculator.start(); }

In this proposal I mostly focus on defining processes and their interactions with incoming messages, trying to elegantly fit it into Rust's type system.

I believe that you are talking about a bigger abstraction, on how to define systems/applications/supervision trees. I agree that they are important and we already need to start thinking on how to fit process definitions into them.

I also think they fit well into the proposed design. On a lower level you will always need the start/spawn method, because someone needs to spawn the process. Even if you leave it to the supervisor/system, they will be calling the method for you in the background. A supervisor implementation could have an API like:

fn main() {
     let children = [ProcessA::default, ProcessB::default];
     Supervisor::start_link("SomeName", Strategy::OneForOne, children);
}

In this case the supervisor will spawn processes for you, using the default values for the initial state.
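
One way such a supervisor could hold children of different types is to store boxed factories instead of the bare `default` functions. The following is a std-only sketch under that assumption; the `Child` trait, the `Factory` alias, the `factory` helper and the `starts` counter are all hypothetical, and no real processes are spawned.

```rust
// Hypothetical trait for anything a supervisor can start.
trait Child {
    fn name(&self) -> String;
}

#[derive(Default)]
struct ProcessA;

#[derive(Default)]
struct ProcessB;

impl Child for ProcessA {
    fn name(&self) -> String {
        "ProcessA".to_owned()
    }
}

impl Child for ProcessB {
    fn name(&self) -> String {
        "ProcessB".to_owned()
    }
}

// A factory builds a fresh child from its default initial state.
type Factory = Box<dyn Fn() -> Box<dyn Child>>;

fn factory<T: Child + Default + 'static>() -> Factory {
    Box::new(|| Box::new(T::default()) as Box<dyn Child>)
}

struct Supervisor {
    factories: Vec<Factory>,
    running: Vec<Box<dyn Child>>,
    starts: Vec<u32>, // how many times each child has been started
}

impl Supervisor {
    // The supervisor, not the caller, starts every child.
    fn start_link(factories: Vec<Factory>) -> Supervisor {
        let running: Vec<Box<dyn Child>> = factories.iter().map(|make| make()).collect();
        let starts = vec![1; factories.len()];
        Supervisor { factories, running, starts }
    }

    // One-for-one strategy: only the failed child is replaced.
    fn restart(&mut self, index: usize) {
        self.running[index] = (self.factories[index])();
        self.starts[index] += 1;
    }

    fn names(&self) -> Vec<String> {
        self.running.iter().map(|child| child.name()).collect()
    }
}
```

Keeping the factories around is what lets the supervisor restart a child later with the same initial state, without the caller being involved.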

Answer 5 · 2021-10-08T08:08:57.000Z

@jvdwrf This looks already much better. I would just propose a few changes:

1. Turn `GenServer` into a `trait`, then implement it for `MyActor`. It can also be automatically implemented for any type that implements `Serialize + Deserialize`:
   ```rust
   trait GenServer {
       // Default implementation
       fn spawn(&self) -> Process { ... }
   }

   impl<T> GenServer for T where T: Serialize + Deserialize { ... }
   ```
   Then you can just write `my_actor.spawn()`, instead of needing to wrap it into `GenServer`.

2. Have two separate traits for `HandleCast<T>` (only requiring the `handle_cast` implementation) and `HandleCall<T>` (only requiring the `handle_call` implementation). I assume that in the majority of cases you just want one and that the other one is going to be empty.
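
Both suggestions can be combined in a small std-only sketch. It is only an approximation of the idea: `Clone` stands in for `Serialize + Deserialize`, `spawn` wraps the state in a local handle instead of starting a real process, and the `Process<S>` handle with its `cast`/`call` methods is hypothetical.

```rust
// Split traits: an actor implements only the behaviour it needs.
trait HandleCast<M> {
    fn handle_cast(&mut self, msg: M);
}

trait HandleCall<M> {
    type Reply;
    fn handle_call(&mut self, msg: M) -> Self::Reply;
}

// Handle returned by `spawn`; here it simply owns the state.
struct Process<S> {
    state: S,
}

trait GenServer: Sized {
    // Default implementation: no wrapper type needed at the call site.
    fn spawn(self) -> Process<Self> {
        Process { state: self }
    }
}

// Blanket implementation (`Clone` is a stand-in for the serde bounds).
impl<T: Clone> GenServer for T {}

impl<S> Process<S> {
    fn cast<M>(&mut self, msg: M)
    where
        S: HandleCast<M>,
    {
        self.state.handle_cast(msg)
    }

    fn call<M>(&mut self, msg: M) -> <S as HandleCall<M>>::Reply
    where
        S: HandleCall<M>,
    {
        self.state.handle_call(msg)
    }
}

#[derive(Clone)]
struct MyActor {
    count: u32,
    log: Vec<String>,
}

// Strings are only ever cast, so there is no empty `handle_call` to write.
impl HandleCast<String> for MyActor {
    fn handle_cast(&mut self, msg: String) {
        self.log.push(msg);
    }
}

// Numbers are only ever called.
impl HandleCall<u32> for MyActor {
    type Reply = u32;

    fn handle_call(&mut self, msg: u32) -> u32 {
        self.count += msg;
        self.count
    }
}
```

With the split traits, `process.call("hi".to_owned())` fails to compile because `MyActor` has no `HandleCall<String>` implementation, instead of silently hitting an empty handler.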

Answer 6 · 2021-10-08T08:24:38.000Z

I have thought about this, but I am not sure whether it would make the API any better. Right now I want to keep GenServer as a struct, but implement a Spawn trait for any actor. Actors then get a context in which handle functions can be called, and this is what GenServer will use internally. (So anything that implements Handle will have a spawn() function, and GenServers, Supervisors etc. will have start() and start_link() functions.)
I think I will try my approach first; changing the API afterward is not much work.
Yes, I will do this, but for right now the current form is simpler for testing some things out.

Answer 7 · 2021-10-08T11:30:26.000Z

In this case it also uses a start function, but this is passed to System, I don't like this Actix approach but as it is done through System I consider it ok.

In this example actix::System is just a Rust async executor. It is required because Actix can only run inside of an async context and I don't think it's actually related to the actor system at all. The same code could have been written:
#[actix::main]
async fn main() { Calculator.start(); }
In this proposal I mostly focus on defining processes and their interactions with incoming messages, trying to elegantly fit it into Rust's type system.

I believe that you are talking about a bigger abstraction, on how to define systems/applications/supervision trees. I agree that they are important and we already need to start thinking on how to fit process definitions into them.

I also think they fit well into the proposed design. On a lower level you will always need the start/spawn method, because someone needs to spawn the process. Even if you leave it to the supervisor/system, they will be calling the method for you in the background. A supervisor implementation could have an API like:
fn main() {
     let children = [ProcessA::default, ProcessB::default];
     Supervisor::start_link("SomeName", Strategy::OneForOne, children);
}
In this case the supervisor will spawn processes for you, using the default values for the initial state.

Yes, that's what I tried to express

Answer 8 · 2021-10-08T11:47:02.000Z

@jvdwrf This looks already much better. I would just propose a few changes:

1. Turn `GenServer` into a `trait`, then implement it for `MyActor`. It can also be automatically implemented for any type that implements `Serialize + Deserialize`:
   ```rust
   trait GenServer {
       // Default implementation
       fn spawn(&self) -> Process { ... }
   }
   
   impl<T> GenServer for T where T: Serialize + Deserialize { ... }
   ```
   Then you can just write `my_actor.spawn()`, instead of needing to wrap it into `GenServer`.

2. Have two separate traits for `HandleCast<T>` (only requiring the `handle_cast` implementation) and `HandleCall<T>` (only requiring the `handle_call` implementation). I assume that in the majority of cases you just want one and that the other one is going to be empty.

One problem I see in your approach to GenServers is that everything about a GenServer revolves around state management within a message-receiving loop. This is why Erlang/Elixir handler signatures always return a tuple containing, among other things, the current state of the GenServer: the loop then always knows what the current state is and always passes it to the handler calls.
I don't think we can call GenServers something that doesn't follow this pattern about state.

Elixir GenServer example:

defmodule MyGenserver do
  use GenServer

  def start_link(state) do
    GenServer.start_link(__MODULE__, state)
  end
  
  @impl true
  def init(state) do
    {:ok, state}
  end

  @impl true
  def handle_call(:some_message_type, from, state) do
    ...
    response = "Hello"
    {:reply, response, state}
  end

  @impl true
  def handle_cast({:some_message_type, payload}, state) do
    {:noreply, [payload | state]}
  end
end

Note in the example that everything is done around the management of the state. An initial state is passed during startup, then that state is manipulated and again it is returned to the genserver loop. It's not just about handling events coming from the Mailbox but wrapping those events with some state.

Another very interesting project is Ergo, which is not based on Erlang but implements distributed Erlang and all the abstractions we are discussing. Because it is not based on Erlang/Elixir, it can be an interesting source of information when developing our APIs. I strongly recommend taking a look.

Reference:
https://hexdocs.pm/elixir/1.12/GenServer.html

Answer 9 · 2021-10-08T14:14:14.000Z

I don't think we can call GenServers something that doesn't follow this pattern about state.

I'm not a big fan of the name GenServer, it's a bit cryptic and I didn't realise it meant generic server when I used it first time. I think there is some room to improve some of the names and am open to suggestions.

However, I do think this is exactly the pattern of a GenServer. Each of the handle_call/cast methods gets a mutable reference to its state (&mut self) as first argument. This allows you to mutate it during the call. Elixir only has immutable data types and returning the new state as a return value is the only way to change it, but all Rust developers are familiar with this approach of passing in a mutable self as first argument. It's just more idiomatic to the language, but accomplishes the same thing.
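
The equivalence of the two styles can be shown with a small std-only sketch. The function and type names are made up, and the handler appends to a list rather than prepending as the Elixir example does.

```rust
// Elixir style: the handler takes the state by value and returns the new one.
fn handle_cast_returning(state: Vec<u32>, payload: u32) -> Vec<u32> {
    let mut next = state;
    next.push(payload);
    next
}

// Rust style: the handler mutates the state through `&mut self`.
struct Stack {
    items: Vec<u32>,
}

impl Stack {
    fn handle_cast(&mut self, payload: u32) {
        self.items.push(payload);
    }
}

// Server loop for the state-returning style: the loop threads the
// state through every handler call explicitly.
fn run_returning(messages: &[u32]) -> Vec<u32> {
    let mut state = Vec::new();
    for &msg in messages {
        state = handle_cast_returning(state, msg);
    }
    state
}

// Server loop for the `&mut self` style: the loop owns the state and
// lends it to each handler call.
fn run_mutating(messages: &[u32]) -> Vec<u32> {
    let mut server = Stack { items: Vec::new() };
    for &msg in messages {
        server.handle_cast(msg);
    }
    server.items
}
```

In both loops the state lives in the loop and is handed to the handler on every message; the only difference is whether the updated state comes back as a return value or through the mutable borrow.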

Answer 10 · 2021-10-11T14:58:11.000Z

However, I do think this is exactly the pattern of a GenServer. Each of the handle_call/cast methods gets a mutable reference to its state (&mut self) as first argument. This allows you to mutate it during the call. Elixir only has immutable data types and returning the new state as a return value is the only way to change it, but all Rust developers are familiar with this approach of passing in a mutable self as first argument. It's just more idiomatic to the language, but accomplishes the same thing.

I think the most important thing is to maintain the concept that all functions receive the initial state, and that at each iteration of the loop this state is passed to the functions, which in turn update the state passed to them and return it to the GenServer loop so that it can pass the state forward again in all subsequent interactions. I agree with the point about language idioms, but the passing of the state to the loop that drives the GenServer must be explicit, whether in Elixir, in Rust or in any other language.

Answer 11 · 2021-10-11T15:12:48.000Z

I don't think we can call GenServers something that doesn't follow this pattern about state.

I'm not a big fan of the name GenServer, it's a bit cryptic and I didn't realise it meant generic server when I used it first time. I think there is some room to improve some of the names and am open to suggestions.

As for the name being cryptic: I think that within the Erlang and OTP context, and for its time, the name was appropriate. It became established in the community and was passed on as a standard. I don't think there is another name for this pattern in the actor model, as it was not defined by the actor model itself, but most people who have studied the actor model have probably studied some Erlang and should therefore know exactly what a GenServer is. On the other hand, people encountering it for the first time should receive an accurate explanation of what a GenServer is.

The Elixir description for a gen_server is short but accurate:

“A behaviour module for implementing the server of a client-server relation.

A GenServer is a process like any other Elixir process and it can be used to keep state, execute code asynchronously and so on....”

https://hexdocs.pm/elixir/1.12/GenServer.html

In other words, if the user needs client and server semantics and the server side needs to maintain state then GenServer is an excellent abstraction for this purpose.

Answer 12 · 2022-04-25T14:29:26.000Z

Closed by #15!