docker/machine

Proposal: Driver Plugins

ehazlett opened this issue · 44 comments

This proposal is to start a discussion (finally) around Machine driver plugins. We have discussed this in passing before, but due to the influx of driver PRs we have moved it up and slated it for 0.5.

Background

Docker Machine offers an interface for creating drivers. This has made driver creation relatively easy, and the response has been great, with drivers for a variety of hypervisors and cloud providers. However, it is proving extremely difficult for us to keep up with reviewing and testing each of these drivers for inclusion in the Machine core. We really want to switch to a more pluggable model, and to polish a few things about the driver model that need to change to ensure a smooth and sustainable future.

Implementation

To reiterate, this is meant to be the start of a discussion; I'm just putting ideas here as starting points.

Plugin Directory

Machine Driver plugins will be stored in a known directory (e.g. ~/.docker/machine/plugins). For the first iteration, Machine will simply list the available files in the directory and use this list as the available plugins.
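A minimal sketch of that discovery step (the exact path and helper names are illustrative, not final):

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// listPlugins treats every file in the plugin directory as an
// available driver plugin and returns its name.
func listPlugins(dir string) ([]string, error) {
    entries, err := os.ReadDir(dir)
    if err != nil {
        return nil, err
    }
    var plugins []string
    for _, e := range entries {
        if !e.IsDir() {
            plugins = append(plugins, e.Name())
        }
    }
    return plugins, nil
}

func main() {
    home, _ := os.UserHomeDir()
    plugins, err := listPlugins(filepath.Join(home, ".docker", "machine", "plugins"))
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    for _, p := range plugins {
        fmt.Println(p)
    }
}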

Binaries

Each plugin will be a binary. This makes it easy to distribute and execute.

Communication

Plugin communication would happen via stdin and stdout as a JSON stream. We will need to design the spec, but it would look something like this:

{
  "action": "start",
  "machine": "dev",
  "args": null
}

This would be sent as stdin to the plugin. The plugin would respond with something like:

{
  "exit_code": 0,
  "errors": [],
  "output": [
    "Machine created successfully"
  ]
}
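As a sketch of the plugin side of that exchange (the field names assume the example shapes above; the spec is not final):

package main

import (
    "encoding/json"
    "os"
)

type Request struct {
    Action  string          `json:"action"`
    Machine string          `json:"machine"`
    Args    json.RawMessage `json:"args"`
}

type Response struct {
    ExitCode int      `json:"exit_code"`
    Errors   []string `json:"errors"`
    Output   []string `json:"output"`
}

func main() {
    dec := json.NewDecoder(os.Stdin)
    enc := json.NewEncoder(os.Stdout)

    var req Request
    // Read one request from stdin; garbage input must fail cleanly.
    if err := dec.Decode(&req); err != nil {
        enc.Encode(Response{ExitCode: 1, Errors: []string{err.Error()}, Output: []string{}})
        os.Exit(1)
    }

    // Dispatch on req.Action (elided) and reply on stdout.
    enc.Encode(Response{
        ExitCode: 0,
        Errors:   []string{},
        Output:   []string{"Machine created successfully"},
    })
}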

Distribution

For the first iteration we would not include a distribution mechanism. We would maintain a list on the wiki of available drivers and their location, maintainer, etc.

To play devil's advocate for a moment, would a full-fledged API (and spec) be better than distributing binaries and relying on people to get the stdin/out communication implemented properly?

To play devil's advocate for a moment, would a full-fledged API (and spec) be better than distributing binaries and relying on people to get the stdin/out communication implemented properly?

Could you elaborate a bit more on how you envision this working?

Something I thought of about driver plugins today: I think that we (the Docker Machine team) should maintain some kind of "template" / "boilerplate" repo for developing a driver plugin. It would include a Makefile, Dockerfile, etc. and instructions on how to develop and integrate your plugin. That way, we could have problems like cross-compilation (and possibly GitHub release automation etc., which might be a pain for others to manage manually) solved out of the box for plugin authors.

Plugin communication would happen via stdin and stdout as a JSON stream.

My gut reaction to this was "why not use unix domain sockets?" but I think I do like stdin/stdout as one of the simplest possible implementations. The key is going to be making sure we handle garbage input gracefully.

For the first iteration we would not include a distribution mechanism.

Hmmm... While I agree with the sentiment, this is going to make it significantly more difficult for new users of Machine to get started than it is now. I'm not sure it'd be that difficult to implement some kind of docker-machine plugin get github.com/exoscale/docker-machine-driver type of experience using the GitHub releases API and a few conventions around binary names, etc...

But, I haven't thought deeply about it yet - there may be some significant roadblocks that I haven't considered ;)

Also - I'm assuming this design means driver plugins can be written in any language, or is there a reason we should restrict support to drivers written in Go?

My gut reaction to this was "why not use unix domain sockets?"

The issue here is mainly Windows. I like Docker's "auto-detect if there's a socket in a certain directory" model, but I don't think UNIX sockets work on Windows, and using TCP sockets introduces all kinds of fun new complications ("How do I get an open port and communicate what that is" etc.).

I'm not sure it'd be that difficult to implement some kind of docker-machine plugin get github.com/exoscale/docker-machine-driver type of experience using the GitHub releases API and a few conventions around binary names, etc...

I have a feeling this will be trickier than it looks, for a variety of reasons. It's a lot of moving parts and we want to get the local experience relatively solid before introducing distribution to the model. Distribution will be much easier to add in than to change or take out, IMO.

Also - I'm assuming this design means driver plugins can be written in any language, or is there a reason we should restrict support to drivers written in Go?

Depending on the model we end up going with, maybe. If we support a pure STDIN/STDOUT model, it seems theoretically feasible. However, all of the wrappers and UX niceness around development will be Go-oriented (e.g. "Just fulfill this interface and the libmachine dependency will handle boilerplate demarshalling etc. for you!").
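To make that concrete, here is a sketch of what "just fulfill this interface" could look like for a plugin author. The method set is an illustrative subset, not the actual libmachine Driver interface:

package driverplugin

// Driver is the contract a plugin author would implement; the wrapper
// handles JSON (de)marshalling and the transport loop for them.
// (Illustrative subset only.)
type Driver interface {
    DriverName() string
    Create() error
    Start() error
    Stop() error
    Remove() error
    GetURL() (string, error)
}

// Serve wires a Driver implementation into the plugin transport,
// dispatching incoming actions to the matching method. The body
// (the stdin/stdout or RPC loop) is elided here.
func Serve(d Driver) error {
    // ... read actions, call d.Start(), d.Stop(), etc., write replies ...
    return nil
}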

but I don't think UNIX sockets work on Windows

Right... Pesky old Windows 😉

using TCP sockets introduces all kinds of fun new complications ("How do I get an open port and communicate what that is" etc.).

Communicating open ports isn't so hard - either the parent process can open the port before execing the child process, and then communicate that as a command line flag, or the child process can report its port on stdout.
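For instance, the second approach is only a few lines in Go (a minimal sketch, with the actual plugin protocol elided):

package main

import (
    "fmt"
    "net"
    "os"
)

func main() {
    // ":0" asks the OS for any free port, sidestepping the
    // "how do I get an open port" question entirely.
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    // The first line of stdout tells the parent where to connect.
    fmt.Println(ln.Addr().String())

    for {
        conn, err := ln.Accept()
        if err != nil {
            return
        }
        // ... speak the plugin protocol over conn (elided) ...
        conn.Close()
    }
}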

However, I do agree that stdout/stdin is the simplest. 😄

we want to get the local experience relatively solid before introducing distribution to the model

Ok, that's fair.

perhaps @jhowardmsft and @ahmetalpbalkan can chime in for additional ideas on the Windows front?

I would like to propose that you look into the architecture of Packer's plugin system.

For docs, see docs - plugins and docs - developing plugins.

To summarise the code/architecture a bit: each plugin is a standalone binary that the core launches as a subprocess and talks to over a local RPC connection, with the plugin serving a Go interface.

The main benefit of designing it this way is that it gives a Go interface to implement against, with static type checking etc., and it also probably makes it much easier to handle messages and debug logging in a uniform and good way.
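For a flavour of that shape, a toy sketch (not Packer's actual code; the DriverRPC name is made up): the plugin process serves the driver over net/rpc on a local port, prints the address, and the core dials in as a client.

package main

import (
    "fmt"
    "net"
    "net/rpc"
)

// DriverRPC is the server-side stub the plugin exposes over the wire.
type DriverRPC struct{}

// Start is one example method; net/rpc requires the (args, reply) shape.
func (d *DriverRPC) Start(machine string, reply *string) error {
    *reply = "started " + machine
    return nil
}

func main() {
    srv := rpc.NewServer()
    if err := srv.Register(&DriverRPC{}); err != nil {
        panic(err)
    }
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        panic(err)
    }
    fmt.Println(ln.Addr()) // the core reads this address and dials in
    for {
        conn, err := ln.Accept()
        if err != nil {
            return
        }
        go srv.ServeConn(conn)
    }
}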

PS. I think it is extremely beneficial to quickly get a simple distribution system up and running. Something as simple as:

docker-machine plugin install parallels/machine-parallels

as soon as there is a usable plugin system.

So far @rickard-von-essen's idea sounds good and can support Windows too (where Unix sockets are missing). I think implementing a wire-protocol RPC would be an extra thing to worry about, whereas we can implement an HTTP server with a JSON API very easily.

In such a setup, what will go wrong in the long run is versioning: as the Driver interface changes, the API/RPC interface will need to keep up.

To address that, we can provide a "stub/base implementation" of a Machine Driver API Server in Go; by keeping the driver interface closely tied to the Go API server interface, we can ensure things like type safety (e.g. catching missing methods) ––and possibly even API versioning like /v1/....
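A rough sketch of what such a stub server could look like; the route layout, types, and names here are hypothetical, not anything from machine itself:

package driverapi

import (
    "encoding/json"
    "net/http"
)

// Driver is the interface a plugin author fills in; missing methods
// become compile-time errors rather than runtime surprises.
type Driver interface {
    Start(machine string) error
}

type server struct{ d Driver }

func (s *server) start(w http.ResponseWriter, r *http.Request) {
    var req struct {
        Machine string `json:"machine"`
    }
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    if err := s.d.Start(req.Machine); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.WriteHeader(http.StatusNoContent)
}

// Serve is the stub/base implementation: plugin authors only supply
// a Driver, never the HTTP plumbing.
func Serve(addr string, d Driver) error {
    mux := http.NewServeMux()
    // The version prefix leaves room to evolve the API later.
    mux.HandleFunc("/v1/start", (&server{d: d}).start)
    return http.ListenAndServe(addr, mux)
}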

+1 for the need for a distribution system. My comment on that is that a path like ~/.docker/machine/plugins is rarely a good place to store binaries (e.g. if you switch to sudo from your user, you'll need to install the plugin again).

Are there any proposals about how the arguments of each driver will be discovered and exposed on machine create?

I also think stdin/stdout is simplest.
Then, for large-scale docker machine deployment, could docker/libchan be an option?

I'd like to echo @timfallmk here: if the issue is a lack of time and resources to do code reviews and manage the volume of contributions, why not ask for volunteers to focus on this part of the project? I'd be happy to review and get PRs into good shape before handing off to the maintainers to merge, and I'm sure there are other folks who would also help out.

This seems like adding complexity to the project in an effort to save work, but it may end up being just as much work (or more) in the end. Right now there is a lot of activity around provider drivers, but I suspect that would tail off relatively quickly. libcloud, which is arguably the most comprehensive cloud driver library, only has 102 compute drivers by my count, which is a lot, but not unmanageable. Trying to coordinate compiled binaries and/or some other plugin system seems unwieldy and more difficult to manage, not less.

From a UX perspective, one of the advantages of docker machine currently is that you can just drop the binary on your box and go from there. Would this mean you'd now always have to enable a plugin for the providers you are using, and then compile every time? That seems confusing and cumbersome for the end user. I'd love to hear from someone using docker downstream, like @ibuildthecloud at Rancher, to get their perspective on it; maybe they don't care.

I guess I don't see an argument that the current model is broken from an architecture standpoint, so why not just continue to refine and define the driver interface?

In any case, I'm happy to contribute to whatever system is ultimately decided on; it's not that much code or time from my perspective, but this seems like not a great solution for the stated problem. If a different system is a requirement, then the suggestion that seems best so far (but probably also the most work) is @rickard-von-essen's, but even then it seems like a lot of complexity for not a huge amount of upside IMO.

@crunchywelch At Rancher we actually want plugins. The problem we have at the moment is that we have to bundle a new docker-machine binary if somebody wants a new driver. Imagine I'm a big enterprise company with a complicated internal provisioning process. These companies have expressed the willingness to write a Docker machine driver that works for their internal cloud. Obviously nobody outside that company cares about that plugin. What we would love is if somebody could install Rancher then register their custom plugin. We could make it so that they could register a new docker-machine binary but then the issue is that our internal QA is now invalidated because they might be on a different machine version.

You are right that packaging and distribution become a problem now. That is solvable, and a nice UX could be invented.

The OpenStack Nova team went through similar issues in that the core team got overloaded with drivers. The problems are just practical: most people who can dedicate a large amount of time to maintaining an open source project are usually doing it as a part of their job, and not a lot of companies see value in dedicating resources to reviewing and merging drivers for other vendors. You either work on the core, or you create a driver. I could be wrong, but that is just what I've seen.

Here's what I propose. First, this totally sucks for all those people who had PRs in that just got closed. I understand why, and I'm sure the docker-machine team feels bad, but it still sucks. Designing a framework like this can take a long time, and inevitably we will get it wrong on the first shot or we will spend forever rolling it out. The fact that all these drivers are held up and not making it into machine directly impacts me, and I'd really like to find the fastest route to move forward and just iterate quickly if it's bad, as opposed to an ivory tower design.

I'd like to build off of @rickard-von-essen's idea and what Docker is already doing. We have a Driver interface today, so let's just create a remote implementation of that driver that does a simple REST (really RPC) interpretation of the API. Then machine can provide the server-side framework that translates the HTTP calls back to the Driver interface, and most of the work already done in the existing PRs should be usable. As @tianon mentioned, we could use net/rpc. I think that's a stopgap, as we probably want the HTTP API to be first class, but it could be useful to just get going.
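A sketch of what that remote implementation could look like (the method set and the DriverRPC naming are illustrative, not machine's actual API):

package remotedriver

import "net/rpc"

// Remote satisfies the driver interface by delegating every call
// over RPC to the plugin process.
type Remote struct {
    client *rpc.Client
    name   string // the machine name, sent with every call
}

func New(addr, name string) (*Remote, error) {
    c, err := rpc.Dial("tcp", addr)
    if err != nil {
        return nil, err
    }
    return &Remote{client: c, name: name}, nil
}

// Start forwards to the plugin's DriverRPC.Start method.
func (r *Remote) Start() error {
    var reply string
    return r.client.Call("DriverRPC.Start", r.name, &reply)
}

// Stop looks the same; every method is a thin forwarding shim, which
// is why most existing driver code should carry over unchanged.
func (r *Remote) Stop() error {
    var reply string
    return r.client.Call("DriverRPC.Stop", r.name, &reply)
}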

As @nathanleclaire mentioned, Machine can also provide a template/boilerplate project so it's pretty close to what we have today (add code to this folder, update godeps, etc.). The binaries produced would just be placed in a ~/.docker-machine/plugins folder, and docker-machine itself would manage the execution of the drivers.

Certain drivers could probably be packaged in the main binary, and maybe we have a process to promote drivers to that status. But the ability to iterate outside of the machine release schedule would be nice.

Thanks @ibuildthecloud, exactly the feedback I was hoping to get! I guess I'm more convinced about the plugin architecture with this in mind; I hadn't considered private infrastructure deployments before, but it makes total sense. I also agree that if we did a net/rpc implementation of the current interface definition, that would be a good first step to see what we love / hate, and a nice way to move toward a more ideal implementation (or who knows, maybe it works great and we keep it).

I also like the idea of having a certain base set of drivers available, as long as ours is included in that set ;) Joking aside, as long as we have a clear set of criteria and a process for service providers to have their drivers reviewed and included in the base set, so as not to play favourites, that sounds great to me and solves the UX issues I was concerned about.

fwiw, I'm not really upset about having our PR closed; we all know this is a moving target, and it happens. I also know the docker team is doing what they think is best for the project long-term and acting with the best of intentions, and I'm glad they opened up this discussion to everyone!

So, what are next steps? Is there a spec we can collaborate on, or a fork we can start a PoC against?

miqui commented

Hi from IRC:

so what can we do to accelerate gh#1626? can we have a machine summit and get people together to quickly hash out a design and plan for execution?

So, what can we do?

We could host a group in NYC if we wanted to do a video conference; not sure how many folks are here in NY.

For those interested, #1626 is an earlier proposal for implementing plugins, and may contain some interesting thoughts / suggestions (also on distribution)

thanks everyone for the feedback. i like @ibuildthecloud's idea of fast iteration -- we will almost certainly get something wrong, and at the least something quick that gets the needed functionality would be a huge plus. i'm cool with doing something along the net/rpc line until we see what we like / don't like. and fwiw, it does suck, but i think it's for the best. it will be so much better to remove us as a bottleneck for these awesome drivers :)

i'm also a huge +1 for a "machine summit" -- perhaps late next week? we have the 0.4 release next week, so right after that would be a great time IMO. if this sounds good, i'll get a hangout scheduled.

Doh! Previous comment should have linked to #500 (thanks for the catch Ankush)

@ehazlett, given these comments and the scope of the project, do you still think this is feasible for a 0.5 release? (Perhaps that's more easily/accurately answered after the hangout.)

@ehazlett sounds good, looking forward to the hangout!

miqui commented

any chance the driver design summit be in San Francisco?

@ehazlett With libcompose (shameless plug for the newly created awesome project libcompose) we have this exact same need for a plugin framework for a CLI. We should really share code. So as soon as you guys are ready, I'm chomping at the bit to develop something here.

+1 for @ibuildthecloud proposal as well, love the thread, thanks!

miqui commented

guys...any new movement? plans?

@miqui I'm not in SF. It would be best to do it remotely, as most people are probably not based in SF.

@ibuildthecloud sharing with libcompose would be awesome. i would love it if you got something together -- it would probably be lightyears ahead of me :)

+1 for a virtual meetup on the topic; @miqui and I have a driver we'd like to propose for HP that fits here.

Just throwing a small stone into the machine, as I came across Heka.
They use Go and Lua as plugin languages, which may help add more plugin support :)

Here is the first (of hopefully monthly) Docker Machine community hangouts: https://plus.google.com/events/cs0iir01kd9ac2df7kv1k2uslts (Aug. 20 @ 3p EDT)

We will be discussing this as a main topic.

miqui commented

@ehazlett thanks!! +1

It appears we will have more people joining than Hangouts supports (awesome!!). We will use BlueJeans instead (it supports up to 100 participants). Thanks!

https://bluejeans.com/399850491

miqui commented

@ehazlett awesome!! +1

As a possible starting point, we could steal terraform's RPC-based plugin model...

@hairyhenderson +1 on starting there. FWIW, I'm waiting on this PR for Packet support in terraform; it was pretty easy to port our machine driver to that:

hashicorp/terraform#2260

For anyone interested in this, my progress on driver plugins can be seen here: nathanleclaire#3

I'd love to start getting some people tinkering with it, so if you're interested in writing a plugin, let me know (nathan@docker.com is best), and I'll also start putting together a little guide.

It is based on the libmachine work, so getting that merged in the next week or so is a high priority for me.


Has there been any talk about Docker/Docker Machine plugin hosting on DockerHub? Or anything like a centralized list of plugins?

Has there been any talk about Docker/Docker Machine plugin hosting on DockerHub? Or anything like a centralized list of plugins?

We've talked about a few things in this regard. In general, what we are doing is:

  1. For the immediate future, the Machine repo (in a Markdown file to be determined) will include a list of available plugins; anyone can fork it and add theirs if desired.
  2. Longer-term, investigating the possibility of running some plugins in containers (which would allow us to use Docker Hub for distribution).

the possibility of running some plugins in containers (which would allow us to use Docker Hub for distribution).

...with the obvious caveat that this could present a catch-22 scenario... ;)

...with the obvious caveat that this could present a catch-22 scenario... ;)

Indeed -- it doesn't make much sense to do it for local virtualization providers for a variety of reasons. They will likely always be binaries or similar.

Now that #1902 has landed, should this be closed?

dmp42 commented

\o/
Close!

🎉