jpetazzo/pipework

Can this be used across VMs?


So say I have VM1 and VM2: VM1 is hosting Apache and VM2 is hosting MySQL, both as containers. Do I just need to add a route between the bridges? I am trying to figure out how to keep containers in their own "VLAN".

If you want to connect containers running on multiple VirtualBox VMs on the same machine

No problem. I would suggest the following:

  • define an "internal network"
  • connect each VM to it (a new eth1 interface should show up in each VM)
  • do something like pipework br1 <containerid> <ipaddr/subnet> in each VM, for each container
  • do brctl addif br1 eth1 in each VM
  • ... that's it, your containers should be able to communicate with each other!
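
Putting those steps together, inside each VM it boils down to roughly this (the 192.168.42.0/24 subnet is just an example; pipework would also create br1 on first use if it doesn't exist yet):

brctl addbr br1
brctl addif br1 eth1                          # eth1 = the "internal network" interface
ifconfig eth1 up
pipework br1 <containerid> 192.168.42.10/24   # one unique address per container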

On servers with Xen, KVM, etc., just adapt the "internal network" concept (it probably exists under a different name), but it will work the same way.

If you want to connect containers running on multiple VMs on different physical machines

It will be slightly different. The best scenario is when you have an extra, available physical eth1 interface on each physical machine, all connected together.

Then, instead of creating an "internal network", you configure each physical machine:

  • brctl addbr br1 to create a bridge
  • brctl addif br1 eth1 to add eth1 interface
  • ifconfig eth1 up to activate the interface

Now for each VM:

  • create a "host only interface"; it will appear as an extra eth1 in the VM, and something special in the host (e.g. vboxnet0 for the first VM, vboxnet1 for the next one...)
  • add the eth1 in the VM to the br1 in the VM, so that containers "sit" on the bridge
  • add the vboxnet0 (or whatever it's called) to the br1 in the physical machine

... Voilà, all containers are now on a shared Ethernet network!
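
Putting the host-side commands together (vboxnet0/vboxnet1 are just the names VirtualBox typically uses; adjust to whatever your hypervisor creates):

# on each physical machine:
brctl addbr br1
brctl addif br1 eth1        # the spare physical interface connecting the machines
ifconfig eth1 up
brctl addif br1 vboxnet0    # one addif per VM's host-only interface
brctl addif br1 vboxnet1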

If you have different groups of containers with different VLANs, it gets more complicated, but I can provide instructions as well.

Let me know if that works for you; I would like to add this to a kind of "advanced use cases" document (since the main README is getting rather huge!)

Thank you.

I've been looking at some similar use cases. In particular, I'm thinking of "worst case" scenarios with minimal control over the hardware, e.g. Amazon EC2, Rackspace, Linode etc. These providers do allow for private networking between VMs, but I'm not sure that they would support large numbers of containers appearing in that IP space.

I would like to use Docker to be able to run identical copies of the same connected set of services (running in containers) in multiple environments - my local desktop, local test environments in which I have control of the hardware, remote test environments in which I do not, and cloud hosting environments for production. I'm totally happy with setting up a homebrew PaaS (until Docker gains more features, or something like Deis or Flynn matures) but the ability to network things together in cases where I don't have control of the hardware is a stumbling block. I could hack something together based on port forwarding and lots of proxies, but that just feels wrong.

Is there a reasonably universal and moderately simple way of creating a VLAN across different cloud hosted VMs? Since Docker is running on each VM, it feels like this is something Docker (or a plugin) could help to manage.

(I'm very much a developer and not a network engineer, so feel free to correct me on anything).

@robknight: unfortunately, there is no "universal and moderately simple" method to connect containers in a public cloud environment with no layer 2 access.

We had that discussion yesterday with @mpetazzoni, and my tentative conclusions were:

  • if you want to build a layer 2 network across VMs, you need to use some kind of encapsulation (tunneling or VPN), which requires O(N²) links to connect all VMs together, since no broadcast/multicast will be available (see the gretap sketch below);
  • if you want to build a layer 3 network, you need an addressing plan, and a routing protocol.
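
To make the layer 2 option a bit more concrete: one possible encapsulation is Ethernet-over-GRE (gretap). A minimal sketch for a single pair of hosts, assuming private addresses 10.0.1.20 and 10.0.1.30 and a br1 bridge on each side (untested, just to show the shape of it):

# on host A (10.0.1.20):
ip link add gretapB type gretap local 10.0.1.20 remote 10.0.1.30
ip link set gretapB up
brctl addif br1 gretapB
# ...and the mirror image on host B; repeat for every pair of hosts,
# hence the O(N²) links.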

Both options are of course technically feasible, but they are far from being simple, unfortunately.

The future availability of TRILL might help with the layer 2 scenario. A container running a routing daemon like quagga might help with the layer 3 scenario. Still, both cases will be very hairy for people without a strong network background :-(

@robknight, looks like we have a very similar use case. I'd be interested in discussing it more with you and figuring out a solution. What I've come up with so far is the following:

  • All containers are passed the IP address of their host as an environment variable.
  • All the ports exposed by containers are mapped 1:1 on the host
    • This means that if you run the same container twice on the same host, the two instances need to run on different ports, just as if you didn't have containers ... so you lose some of the benefits of containers having their own networking namespaces.
    • It also requires that your containers can be told which port(s) the service they contain is supposed to use, and a way for that service's configuration to be created/modified dynamically based on those ports. I pass them as environment variables, and each of my containers has a run script that prepares the configuration before exec()-ing the service (see the sketch after this list).
  • When a container needs access to the service of another container, it needs to be passed the IP address of the host of that target container (and the correct port of course)
  • Services that advertise themselves to ZK for service discovery need to do so using the IP address of their host, which is available in their environment.
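
A minimal sketch of such a run script, with made-up variable names, paths, and a MySQL-flavored example (the exact templating mechanism doesn't matter much):

#!/bin/sh
# HOST_IP and SERVICE_PORT are injected by whoever starts the container.
: ${HOST_IP:?HOST_IP must be set}
: ${SERVICE_PORT:=3306}

# Render the service configuration from a template shipped in the image,
# then replace this script with the actual service process.
sed -e "s/@HOST_IP@/$HOST_IP/" -e "s/@PORT@/$SERVICE_PORT/" \
    /templates/my.cnf.in > /etc/mysql/my.cnf
exec mysqld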

On top of all of this, I am putting together a little bit of orchestration, unfortunately custom-built. I haven't found anything simple enough that also knows how to deal with Docker containers. Basically, this orchestration layer takes a description of an environment (the list of containers I want, which host they run on, which image they use, their environment variables, etc.) and spits out the correct Docker commands to start them.
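
A stripped-down sketch of that idea, with a purely hypothetical one-line-per-container description format (the real thing reads something richer):

# environment.txt: "<host> <image> <port>", one container per line (made-up format)
while read host image port; do
  echo docker run -d \
       -e HOST_IP=$host -e SERVICE_PORT=$port \
       -p $port:$port $image
  # the emitted command is then run against $host's Docker daemon (see below)
done < environment.txt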

For this I have my Docker daemons listening on TCP (not just the Unix socket) so I can control them remotely. You also need to make sure all your hosts ("host" here meaning anything that hosts containers, whether an actual machine or a VM) can talk to each other. If you run multiple VMs on your dev machine, for example, you would need:

  • Docker running inside each of them
  • Have all of them configured for "private networking" (also known as "host-only") and placed in the same subnet. If you use Vagrant, for example, it's as simple as adding:
config.vm.network "private_network", ip: "192.168.1.X"

to each of them, with a different X != 1 (.1 is reserved for the new network interface that will be created on your machine).
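
As for having the Docker daemons listen on TCP, with the version I'm running it's just a matter of starting the daemon with an extra -H flag (the port number is arbitrary, just keep it consistent across hosts):

# keep the local Unix socket and also listen on TCP
docker -d -H unix:///var/run/docker.sock -H tcp://0.0.0.0:4243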

I'm just building this this week and it's (very) slowly coming together. I really wish there were some better orchestration tooling that dealt specifically with Docker containers. The new links feature of 0.6.5 is cool, but still very much useless right now in a multi-host/multi-VM configuration.

HTH

@mpetazzoni it sounds like we're doing something quite similar. I've been working on a very basic orchestration system which uses the Docker API to coordinate the building/pulling of images in order to provide the containers required for certain services. It uses a configuration file which looks something like this for a typical LAMP stack application:

services:
  apache:
    image: registry:5000/my-apache-image
    depends: [php]
    ports: [80]
  php:
    image: registry:5000/my-php-image
    depends: [mysql]
    ports: [9000]
  mysql:
    image: registry:5000/my-mysql-image
    ports: [3306]
hosts:
  - docker1:5422
  - docker2:5422

This means that the 'apache' service needs to know where the 'php' service is, and the 'php' service needs to know where the 'mysql' service is, and this could be injected via environment variables or some similar mechanism. The 'hosts' are Docker hosts (VM or physical, it doesn't matter) - I haven't implemented any logic for splitting the containers across different hosts though, mostly because I'm trying to figure out how the networking would work.

I can think of two ways to do this:

Option 1

Pretty much as you described. Environment vars are inserted into containers to tell them where the containers that they depend on can be found. We can know all of this before deployment, assuming that we know the IP addresses of the hosts and we know which ports the containers will be using on those host machines, so there are no problems with having to deploy containers in a specific order.

I'm a bit unhappy about this solution because the containers will need to have some way of re-writing their configuration files depending on the environment vars. For example, if Apache has some configuration file telling it the IP/port of the PHP server, this will need to be modified. And since each layer of the stack has its own configuration files, you'll need to spend a lot of time writing scripts to modify different types of configuration files (consider that the config that tells PHP where MySQL is could be pretty much anything - arbitrary "settings.php" files which declare such things as PHP variables are commonplace). This is what a Drupal settings file looks like:

$databases = array(
  'foobar' => array(
    'default' => array(
      'username' => 'qwerty',
      'host' => 'databasehost',
      'driver' => 'mysql',
      'database' => 'db123',
      'password' => 'asdfgh'
    )
  ),
);

We could easily change this to use environment variables directly because it's actually executable PHP code, but that wouldn't work for .ini, .yaml, .xml or other types of configuration files, and various application stacks use different files. Sometimes relevant configuration is stored in a database. Sometimes services need to be restarted after configuration files change, or some other post-processing may need to happen (clearing of caches which may include references to the old configuration). I feel that this compromises some of the benefits of 'immutability' with respect to container configuration.
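
For what it's worth, the kind of per-stack glue this implies looks roughly like this (paths and variable names are invented; the point is that every config format needs its own little generator):

#!/bin/sh
# hypothetical entrypoint fragment for an .ini-style config file
cat > /etc/myapp/database.ini <<EOF
[database]
host = ${MYSQL_HOST}
port = ${MYSQL_PORT:-3306}
EOF
exec "$@"   # then run whatever the container was going to run anyway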

What I would prefer is something more like this:

Option 2

With a bit of scaffolding, we could use DNS and some kind of port forwarding/proxying. In my above example, the 'mysql' service would get the hostname 'mysql', and it would have this name in all environments, so the config files which refer to it never need to change. If we're doing everything on a single Docker host, this is pretty easy - we can generate IP addresses for each service, assign them using pipework, and generate an /etc/hosts file which can be inserted into the containers (or we could use dnsmasq on the host, or dnsmasq running in each container). No environment variables, no config rewriting and no forwarded ports. We could even store the hosts file on a data volume that could be shared between the containers, so we don't need to insert anything into the containers themselves.
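
On a single Docker host, that could be as small as something like this (IPs, image names and the file name are just illustrative):

# start the containers, then give each one a fixed address on br1
MYSQL=$(docker run -d registry:5000/my-mysql-image)
PHP=$(docker run -d registry:5000/my-php-image)
pipework br1 $MYSQL 192.168.1.2/24
pipework br1 $PHP 192.168.1.3/24

# generate the name -> address mapping once, to be shared with every container
cat > hosts.generated <<EOF
192.168.1.2 mysql
192.168.1.3 php
EOF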

The problem comes when we have more than one Docker host. My main problem is that my networking-fu is much weaker than my dirty-hack-fu, so my imaginary solution looks something like this:

  1. Figure out which containers we're going to run and assign an arbitrary IP to each one (it probably makes sense to put containers running on different hosts on different subnets, but I'm not sure it actually matters much)
  2. Figure out which hosts these containers will run on (again, this can be arbitrary; let's assume some algorithm which attempts to provide redundancy or load-balancing characteristics)
  3. Generate a hosts file, which would look something like this:
192.168.1.2 mysql # runs on docker1
192.168.1.3 php # runs on docker1
192.168.2.2 apache # runs on docker2
  4. On each host, use pipework to create the interfaces required. Then create additional interfaces for each "remote" container, and assign them to a special container that has some kind of networking magic capable of forwarding this traffic to a counterpart on the remote host, which forwards it on to the correct container.

I appreciate that the phrase 'some kind of networking magic' is doing a lot of the work in this description. I suspect it could be done using nothing more than iptables, if we assume that the containers on our docker hosts are actually reachable via exposed ports. Let's say that docker1 and docker2 are actually running on ec2, and they can talk to each other over a private network - docker1 has the private IP 10.0.1.20 and docker2 has the private IP 10.0.1.30. Now let's say that apache, running on docker2, wants to talk to php on docker1. The IP address for PHP is 192.168.1.3, which is routed to a local container. This container just needs to know that any traffic it receives on 192.168.1.3:9000 needs to be routed to whatever PHP's exposed port on docker1 is, and I think that this can be achieved with basic iptables rules. I just haven't got around to trying it yet, and it's a sufficiently complicated idea that I thought I should try asking around for better ideas first.
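
For what it's worth, the iptables part of that 'networking magic' might be as small as a DNAT rule in the forwarding container on docker2 (49153 below is a stand-in for whatever port Docker actually published for PHP on docker1; completely untested):

# traffic for the "php" service address gets rewritten to docker1's
# private IP and PHP's published port...
iptables -t nat -A PREROUTING -d 192.168.1.3 -p tcp --dport 9000 \
         -j DNAT --to-destination 10.0.1.20:49153
# ...and masqueraded so replies come back through this same container
iptables -t nat -A POSTROUTING -d 10.0.1.20 -p tcp --dport 49153 \
         -j MASQUERADE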

To be honest, the main thing that option 2 gives you is that it reduces the amount of stuff that the ordinary containers need to know about. You can build a container on your local desktop and use the exact same container image in all other environments, without ever modifying anything inside it. It could be totally read-only and doesn't even depend on environment variables, and this appeals to the slightly more autistic part of my personality. All of the complexity is in the networking container. In the long term, the networking container could do some smarter things related to security, isolation, failover or proxying, but I don't really have enough real-world devops knowledge to imagine all of the possible scenarios. However, perhaps I should be a bit more wary of the idea of a magic black box that does some crazy networking stuff because if it ever stopped working it would be very difficult to debug.

Option 1, however, doesn't need the additional container to do network routing stuff, and maybe the cost of inserting environment variables and rewriting config files is lower than I imagine. It also has the advantage of being achievable with technology we already have today, and doesn't require any particular networking knowledge.

I suspect that I'm missing something though, and maybe the whole thing could be simplified somehow. Either way, this is an interesting problem and solving it would open up a lot of very interesting possibilities.

You've pretty much nailed it. I actually have a (strangely) similar YAML description for my environments from which I generate the docker run commands.

About option 1, you're right, there is some overhead to creating an image, in that you need some custom logic for each image that knows how to (re)write the service's configuration based on environment variables. Unfortunately, I don't think you have much choice about this in certain situations. Take the example of ZooKeeper being part of your deployed environment. When doing development, you'll most likely run a single-node instance with one container. In production, though, you'll be running a multi-node cluster. The difference is that ZooKeeper's configuration in each situation is actually different, and needs to know the host:port of the other nodes. Because of this, you can't have fully read-only/never-modified container images. The same holds true for a few other cases.
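
For the ZooKeeper case, the container's run script ends up doing something like this (ZK_SERVERS and the paths are made up; a single-node dev environment simply passes one entry):

#!/bin/sh
# e.g. ZK_SERVERS="zk1:2888:3888,zk2:2888:3888,zk3:2888:3888"
i=1
for server in $(echo "$ZK_SERVERS" | tr ',' ' '); do
  echo "server.$i=$server" >> /opt/zookeeper/conf/zoo.cfg
  i=$((i+1))
done
exec /opt/zookeeper/bin/zkServer.sh start-foreground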

The overhead of doing this, I found, is actually not that bad. Especially for YAML-based configurations, for example, where you can just build a Python dict and yaml.dump() the heck out of it. For others, some simple templating works really well too.

About option 2, I feel like it's overall a more complex solution. Between generating IP addresses, figuring out the routing between hosts, etc., it feels harder to bring up and to debug when it doesn't work. My network-fu is clearly very weak too, so maybe I just have a hard time picturing how the pieces would fit together. But like you, I'm not too fond of the fact that all communication then relies on that magic black box, which in a way becomes a single point of failure.

Alright; I might eventually provide additional docs (and side-kick containers) to help with option 2 (i.e. building a network across multiple Docker hosts), but I think we can close this issue :-)

People willing to use DNS for discovery might want to look at SkyDNS: https://github.com/skynetservices/skydns

Let me know if you have further questions or remarks!