droyo/styx

Recursive mount-like behaviour for distributed 9p servers

Closed this issue · 13 comments

dmorn commented

Background

I'm writing a piece of software that, given a task description, spawns a server running that task somewhere (usually in a containerised environment: we need to scale up and down easily, and containers make that easier to handle).

I'm modelling the thing like this: each worker that is executing the task remotely is called a "process", which is a 9p server exposing 4 files: ctl, retv, status (which might become log), err. This way the process can just be mounted and inspected.
A server that I call "flexi" provides this service. It is a 9p server too: it exposes a ctl file to which clients write task descriptions (in my case, JSON-encoded payloads).

The idea is that flexi reads the task descriptor, starts the remote process and returns a number to the client, naming the folder that contains the process's mounted fs (pretty much how TCP connections are handled under plan9).

This fs is an example of a flexi fs with a running process:

flexi
├── 0
│   ├── ctl
│   ├── err
│   ├── retv
│   └── status
└── ctl
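
The numbering scheme described above could be sketched roughly as follows. This is only an illustration: `registry`, `newRegistry` and `spawn` are hypothetical names, not flexi's actual code, the task format is assumed to be arbitrary JSON, and starting the remote process is left out entirely.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sync"
)

// registry hands out directory numbers for spawned processes.
// Hypothetical type for illustration; not taken from flexi.
type registry struct {
	mu    sync.Mutex
	next  int
	tasks map[int]json.RawMessage
}

func newRegistry() *registry {
	return &registry{tasks: make(map[int]json.RawMessage)}
}

// spawn records a JSON task description written to ctl and returns the
// number of the directory under which the process would appear.
func (r *registry) spawn(task []byte) (int, error) {
	if !json.Valid(task) {
		return 0, errors.New("task description is not valid JSON")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	id := r.next
	r.next++
	// Copy the payload so the caller may reuse its buffer.
	r.tasks[id] = append(json.RawMessage(nil), task...)
	return id, nil
}

func main() {
	reg := newRegistry()
	id, err := reg.spawn([]byte(`{"cmd": "echo"}`))
	fmt.Println(id, err)
}
```

The number returned by `spawn` is what a client would read back from ctl and then use as the directory name.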

Main

It looks like I cannot recursively mount 9p servers using the osxfuse driver. I'm thinking about making flexi forward 9p messages to the process's server when necessary, in a man-in-the-middle proxy style, to work around the problem. What do you think @droyo? Do you have suggestions/alternative approaches? Thank you for the library btw 😊👍

marzhall commented

Just a thought from the peanut gallery: it somewhat breaks the abstraction, but could you mount the other 9p servers of running jobs in a different directory? Maybe pass flexi some other directory on the machine when it starts up, or have flexi make a temp directory and inform the user of it, so that flexi mounts new jobs in that directory and both flexi and you can look at and control the clients from that folder. It loses the intuitive 'nesting' you'd get from presenting the remote folders in flexi's FS and proxying the communication, but may be simpler. Otherwise, the proxying seems a reasonable approach, at least to me. In fact, it might even be a nice thing to make a library.

dmorn commented

This is indeed what I tested as a first approach! I'm a little bit concerned about leaking umounts, but I do want to keep that as an open option. I'm more into the second one, let's see if we can make a library out of it! 😊 I'm not sure where the proxying should take place though. Would it be a sane approach to use this "proxy" on a per-request basis? Maybe it would be nice to hijack the request to allow the proxy to just relay messages in a decode-encode (and vice-versa) fashion (trimming and restoring paths)!

droyo commented

I think it will be tricky to do with this package being the way it is today. I would use the styxproto package directly, since all of the "help" the styx package tries to do keeping track of fids and sessions could just get in the way.

You'll need to intercept all Tattach and Tauth messages and respond to them directly. Whether or not you want a 1:1 relationship between user -> proxy sessions and proxy -> backend sessions is up to you.

You have to intercept all Twalk messages and strip a prefix from them, and you'll need to intercept all Rwalk messages from the backend and prepend a prefix to them. Since Twalk is the only way to create new Fids (apart from the Tattach and Tauth messages you already intercept), you can also intercept the Twalks to populate a mapping from Fid -> backend session. You'll need to intercept Tclunk and Tremove requests to remove items from the map. You can then create an interface to intercept anything with a Fid and redirect it:

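import (
    "io"

    "aqwari.net/net/styx/styxproto"
)

// Fcall is any 9P message that carries a fid. Embedding styxproto.Msg
// lets a matched value be passed straight to styxproto.Write.
type Fcall interface {
    styxproto.Msg
    Fid() uint32
}

// fidToEncoder maps a client fid to the backend connection that owns it
// (assumed here to be a plain io.Writer).
var fidToEncoder = map[uint32]io.Writer{}

func relay(d *styxproto.Decoder) {
    for d.Next() {
        switch m := d.Msg().(type) {
        case styxproto.Twalk:
            // store newfid in a pending map, commit after you
            // see an Rwalk response from the backend
        case Fcall:
            fid := m.Fid()
            if w, ok := fidToEncoder[fid]; ok {
                styxproto.Write(w, m)
            }
        }
    }
}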
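
The prefix stripping for Twalk described above might look something like this. It is a sketch that works on plain string slices rather than styxproto types; `routeWalk` and the `backends` set are hypothetical names, and it assumes the walk starts at the proxy's root and that each backend is exported under a single top-level directory named after its process number.

```go
package main

import "fmt"

// routeWalk splits the path elements of a walk that starts at the
// proxy's root. The first element selects a backend; the remaining
// elements are what would be forwarded to that backend.
func routeWalk(backends map[string]bool, wname []string) (backend string, rest []string, ok bool) {
	if len(wname) == 0 || !backends[wname[0]] {
		// Not a backend directory: the proxy answers this walk itself.
		return "", nil, false
	}
	return wname[0], wname[1:], true
}

func main() {
	backends := map[string]bool{"0": true}
	b, rest, ok := routeWalk(backends, []string{"0", "status"})
	fmt.Println(b, rest, ok)
}
```

On the way back, the backend's Rwalk carries one qid per forwarded element, so the proxy would have to prepend a qid of its own for the stripped prefix element before replying to the client.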

Another tricky part is that you'll need to keep track of the request tags so that you can route Tflush commands to the appropriate server and so you can match Rwalk and Rerror messages to the appropriate Twalk to know if the new fid is valid.
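
The bookkeeping for tags and fids could be sketched along these lines. It deliberately avoids the styxproto types so that it stands alone; `fidTable` and all of its methods are hypothetical names, and backends are identified by a plain string.

```go
package main

import "fmt"

// request describes one in-flight T-message, keyed by its tag.
type request struct {
	backend string // backend the message was forwarded to
	newfid  uint32 // fid a Twalk is trying to create
	isWalk  bool
}

// fidTable tracks in-flight tags and which backend owns each fid.
type fidTable struct {
	inflight map[uint16]request
	fids     map[uint32]string
}

func newFidTable() *fidTable {
	return &fidTable{
		inflight: make(map[uint16]request),
		fids:     make(map[uint32]string),
	}
}

// sent records a forwarded request so that a later Tflush naming the
// same tag can be routed to the same backend.
func (t *fidTable) sent(tag uint16, backend string) {
	t.inflight[tag] = request{backend: backend}
}

// sentWalk records a forwarded Twalk; newfid stays pending until reply.
func (t *fidTable) sentWalk(tag uint16, newfid uint32, backend string) {
	t.inflight[tag] = request{backend: backend, newfid: newfid, isWalk: true}
}

// flushTarget reports which backend should receive a Tflush for oldtag.
func (t *fidTable) flushTarget(oldtag uint16) (string, bool) {
	r, ok := t.inflight[oldtag]
	return r.backend, ok
}

// reply retires tag. Pass walked=true only for an Rwalk that covered
// every requested element; an Rerror or a partial Rwalk leaves newfid
// unused, so nothing is committed.
func (t *fidTable) reply(tag uint16, walked bool) {
	r, ok := t.inflight[tag]
	if !ok {
		return
	}
	delete(t.inflight, tag)
	if r.isWalk && walked {
		t.fids[r.newfid] = r.backend
	}
}

// clunk forgets a fid after a Tclunk or Tremove.
func (t *fidTable) clunk(fid uint32) { delete(t.fids, fid) }

// lookup returns the backend that owns fid, if any.
func (t *fidTable) lookup(fid uint32) (string, bool) {
	b, ok := t.fids[fid]
	return b, ok
}

func main() {
	tbl := newFidTable()
	tbl.sentWalk(1, 7, "worker-0")
	tbl.reply(1, true)
	fmt.Println(tbl.lookup(7))
}
```

A real proxy would also need locking around the maps if requests and responses are relayed from different goroutines.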

Outside of those corner cases you should be able to act as a dumb pipe, calling styxproto.Decoder.Next on the client-facing connection, then styxproto.Write on the appropriate backend connection to relay messages, and doing the reverse to relay responses.

It sounds difficult enough and generic enough that it might make a good addition to the library.

dmorn commented

Thank you @droyo and @marzhall for the guidance. In the next few days I'll try to make an implementation within the flexi project that we can easily extract and integrate in the library, if we want to. I'll keep you up to date!

okvik commented

I would steer away from layer-breaking proxying unless or until you find that it absolutely is needed for performance, or something — which I doubt you will.

An alternative is to let flexi interface with worker file servers exactly as a regular client would: mount them in its namespace, map the requests it gets from clients to matching file operations on the worker trees, then translate (copy) the results into replies to the clients, which won't notice the difference.

Apart from mapping the walks and handling directory reads to export the tree as you want it, this is trivial to implement, and it'll simply continue working without any change if you happen to change the worker file API; flexi doesn't even need to know anything about it.
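
The path mapping at the heart of this approach could start from a helper like the one below. It is a sketch under assumptions: `resolve` and `workersDir` are hypothetical names, and the worker file servers are assumed to be already mounted in numbered directories below `workersDir`.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"path/filepath"
)

// resolve maps a slash-separated path requested by a client, such as
// "0/status", to a file below workersDir, the directory where the
// worker file servers are assumed to be mounted.
func resolve(workersDir, clientPath string) (string, error) {
	// Rooting the path before cleaning collapses any leading "..",
	// so the result can never escape workersDir.
	clean := path.Clean("/" + clientPath)
	if clean == "/" {
		return "", errors.New("path names the root, not a worker file")
	}
	return filepath.Join(workersDir, filepath.FromSlash(clean)), nil
}

func main() {
	p, err := resolve("/mnt/workers", "0/status")
	fmt.Println(p, err)
}
```

With the local path in hand, flexi would use ordinary file operations (open, read, write, stat) on it and copy the results into its replies.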

Examples of this approach are Plan 9 exportfs(4) and—shamelessly—unionfs(4).

dmorn commented

Hi @okvik, thank you for joining the conversation. I do agree that this issue should be addressed by mounting the remote fs within flexi's fs. mount under plan9 actually converts local 9p messages into RPCs, if I'm not wrong, so that would be the proxy we're talking about. Being practical though, we cannot recursively 9 mount under macOS using the osxfuse driver, as far as I know! So, how do we want to proceed?

okvik commented
dmorn commented

I think this solution is somewhat similar to what @marzhall was suggesting with this comment, right? There is just one thing I don't like about this approach: we mount a separate workers directory outside flexi's own tree. This is where "recursive mount" comes from: I was wondering whether we could do as you suggest, but instead of mounting the remote processes inside a separate directory (workers in this case), mount them straight inside flexi's mountpoint. Do you see my point?

dmorn commented

I mean, under plan9 even the fs root point (/) is obtained from a 9p server (right?): that would mean when I mount something in my namespace, I'm recursively mounting a 9p namespace inside another one. I wanted to make flexi follow this behaviour!

okvik commented
dmorn commented

No, 9p servers are independent entities tasked with providing their and only their file tree(s) which can be incorporated, or mounted into "the namespace".

Thank you @okvik for the clarification 😊 I'm still new to plan9.

dmorn commented

Thank you all for your precious help! I'll implement flexi using the "use another mountpoint for the workers" approach then. If you are interested in the project, you'll find it at https://github.com/jecoz/flexi, just in case 😊

dmorn commented

Hey there, I've got some updates and I think it is relevant to continue the discussion here; feel free to stop me if you don't think so. We (our company) need to drop the fuse driver dependency, as Docker containers need it to be provided by their host environment, which makes it difficult for us to distribute the workers on some cloud providers. Shall we discuss the alternative solutions proposed above a little further?

As I see it, the cleanest approach would be to create a styx client implementation (within this library) that can be used programmatically to forward fs function calls to an io.ReadWriteCloser (which might be a TCP connection or a posted service file in the local namespace, i.e. a unix socket under plan9port). What do you think @marzhall @droyo @okvik?
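
To give a feel for what such a client would write to the io.ReadWriteCloser, here is a sketch that encodes a single 9P2000 Tversion request by hand. It does not use the styx or styxproto packages; `encodeTversion` and `writeTversion` are hypothetical helpers, and a real client would also have to read and check the Rversion reply before sending anything else.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

const (
	msgTversion = 100    // 9P2000 message type for Tversion
	noTag       = 0xFFFF // NOTAG, the tag used for version negotiation
)

// encodeTversion builds size[4] Tversion tag[2] msize[4] version[s],
// with all integers little-endian as 9P2000 requires. It assumes the
// version string is shorter than 64 KiB.
func encodeTversion(msize uint32, version string) []byte {
	size := 4 + 1 + 2 + 4 + 2 + len(version)
	buf := make([]byte, size)
	binary.LittleEndian.PutUint32(buf[0:4], uint32(size))
	buf[4] = msgTversion
	binary.LittleEndian.PutUint16(buf[5:7], noTag)
	binary.LittleEndian.PutUint32(buf[7:11], msize)
	binary.LittleEndian.PutUint16(buf[11:13], uint16(len(version)))
	copy(buf[13:], version)
	return buf
}

// writeTversion sends the encoded request on any io.Writer, for
// example a TCP connection or a unix socket.
func writeTversion(w io.Writer, msize uint32, version string) error {
	_, err := w.Write(encodeTversion(msize, version))
	return err
}

func main() {
	var conn bytes.Buffer // stands in for the real connection
	if err := writeTversion(&conn, 8192, "9P2000"); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("% x\n", conn.Bytes())
}
```

Every other request follows the same size/type/tag framing, which is why a client built on an io.ReadWriteCloser can stay independent of the transport underneath.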

Please also have a look at how the flexi.FS is used in conjunction with the FSHandler, a styx.Handler implementation. I think we might really take advantage of this abstraction within the library itself somehow.