jpetazzo/pipework

Containers can lose their DHCP lease when dhclient is used

Closed issue · 6 comments

As a result of 0f59dc4, dhclient is killed immediately after container start-up, which means that when the DHCP lease expires, the container will never renew it, opening up the IP for reassignment (and removing dynamically created DNS entries, if any). I'm not sure why only dhclient is being killed and the other DHCP clients aren't. Is it possible to not kill dhclient and instead solve the original contributor's problem some other way?

@rtkrruvinskiy We just hit exactly this problem, although in our case it showed up in a much less obvious form.

We use containers for testing and simulating a large-scale distributed system. Some of the nodes run on platforms that cannot be simulated in containers, so our tests have to use both containers and VMs, and they must see each other on the same subnet without any NAT in between.

We use pipework to create macvlan interfaces and drop them into the container netns. Previously we relied on pipework's DHCP allocation. As a result, IPs were never renewed and the DHCP server considered the leases expired. Many DHCP servers have a safety check: they ping a supposedly free address before offering it in a DHCPOFFER. So the server receives a DHCPDISCOVER from a client, picks an IP that it wrongly believes is free, pings it, gets a reply, fails with an error ("Abandoning IP ....") and, as a result, ignores the DHCPDISCOVER completely. From the client's side it looks as if there is no DHCP server, so the client enters exponential backoff, retries, and with some probability fails again.

Result: the maximum DHCP allocation time grows exponentially with the ratio of allocated addresses to the subnet size. Some clients get an address instantly, but others hit the problem, and at some point DHCP allocation may fail altogether.
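A rough way to see why the wait blows up as the pool fills. This is a toy model, not a description of any particular DHCP server or client. It assumes that each DHCPDISCOVER is ignored independently with a constant probability $q$ (the share of "believed free" addresses that still answer pings), and that the client's $n$-th retry waits $t_0 2^{n-1}$ with no cap or jitter. The symbols $q$, $t_0$, $k$ and $T_k$ are introduced here purely for illustration.

$$
P(\text{first } k \text{ DISCOVERs ignored}) = q^k,
\qquad
T_k = \sum_{n=1}^{k} t_0\, 2^{n-1} = t_0\,(2^k - 1)
$$

$$
E[T] = \sum_{k \ge 0} (1-q)\, q^k\, T_k = \frac{t_0\, q}{1 - 2q}
\qquad (q < \tfrac{1}{2})
$$

Under these assumptions the wait after $k$ ignored DISCOVERs doubles with every failure, and the expected wait diverges as $q$ approaches $1/2$. Real clients randomize and cap their backoff, so actual numbers will differ.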

Temporary solution: increase the size of the subnet.

Permanent solution we ended up with:
Use pipework with macvlan interfaces without IP allocation, e.g. "pipework eth2 -i eth2 0/0", and rely on the container ENTRYPOINT to run dhclient and to release the lease (which also stops the dhclient daemon) when the entrypoint exits.
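For anyone who wants to try the same approach, here is a minimal sketch of such an entrypoint. It is not our actual script: the interface name, file paths, timeout and function names are assumptions for illustration, and it assumes ISC dhclient (`-pf`, `-lf`, `-r`). It waits for pipework to attach the interface, gets a lease, runs the real command, and releases the lease when that command exits or the container is stopped.

```shell
#!/bin/sh
# Hypothetical entrypoint: hold a DHCP lease for the lifetime of the wrapped
# command. All names, paths and defaults below are assumptions.

IFACE="${IFACE:-eth2}"
DHCLIENT="${DHCLIENT:-dhclient}"
SYS_NET="${SYS_NET:-/sys/class/net}"
PID_FILE="${PID_FILE:-/var/run/dhclient.$IFACE.pid}"
LEASE_FILE="${LEASE_FILE:-/var/lib/dhcp/dhclient.$IFACE.leases}"
WAIT_SECS="${WAIT_SECS:-30}"

# pipework attaches the interface after the container has started, so poll
# until it appears, giving up after roughly WAIT_SECS seconds.
wait_for_iface() {
    tries=0
    while [ ! -e "$SYS_NET/$IFACE" ]; do
        tries=$((tries + 1))
        [ "$tries" -gt "$WAIT_SECS" ] && return 1
        sleep 1
    done
    return 0
}

# dhclient daemonizes once it has a lease and keeps renewing it.
acquire_lease() {
    "$DHCLIENT" -pf "$PID_FILE" -lf "$LEASE_FILE" "$IFACE"
}

# -r releases the lease and stops the running client.
release_lease() {
    "$DHCLIENT" -r -pf "$PID_FILE" -lf "$LEASE_FILE" "$IFACE"
}

run_with_lease() {
    wait_for_iface || return 1
    acquire_lease || return 1

    # Run the real command in the background so that a stop signal sent to
    # this script can be forwarded to it.
    "$@" &
    child=$!
    trap 'kill -TERM "$child" 2>/dev/null' TERM INT

    wait "$child"
    status=$?
    # A trapped signal interrupts wait; wait again so the child can finish.
    if kill -0 "$child" 2>/dev/null; then
        wait "$child"
        status=$?
    fi
    trap - TERM INT

    release_lease
    return "$status"
}

# When used as the ENTRYPOINT, the CMD arrives as arguments.
if [ "$#" -gt 0 ]; then
    run_with_lease "$@"
    exit $?
fi
```

Because dhclient is started inside the container, it lives in the container's netns and pidns and cannot outlive it.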

My solution to this problem has been to change my local pipework not to kill dhclient and to have each dhclient instance use a separate lease file to avoid corruption.
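As a sketch of what that looks like (not the actual patch; the directory defaults, file naming scheme and function names are assumptions, and ISC dhclient's `-pf`/`-lf` options are assumed): network namespaces share the host filesystem, so without `-lf` every dhclient instance would write to the same default lease file.

```shell
#!/bin/sh
# Hypothetical helper: start one long-running dhclient per namespace, each
# with its own lease and pid file. Paths and naming are assumptions.

LEASE_DIR="${LEASE_DIR:-/var/lib/dhcp}"
PID_DIR="${PID_DIR:-/var/run}"

# Build file names that are unique per (namespace, interface) pair.
lease_file() { printf '%s/dhclient.%s.%s.leases\n' "$LEASE_DIR" "$1" "$2"; }
pid_file()   { printf '%s/dhclient.%s.%s.pid\n' "$PID_DIR" "$1" "$2"; }

# Usage: start_dhclient <netns-name> <interface>
start_dhclient() {
    ns=$1
    iface=$2
    # dhclient is left running so that it can renew the lease later.
    ip netns exec "$ns" dhclient \
        -pf "$(pid_file "$ns" "$iface")" \
        -lf "$(lease_file "$ns" "$iface")" \
        "$iface"
}
```

For example, `start_dhclient ns1 eth1` and `start_dhclient ns2 eth1` would then keep their lease state in separate files.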

Interesting. What happens when you remove the container? For us, removal fails with "device is busy", and the dhclient daemons keep running after the container dies. With the entrypoint approach we essentially bind dhclient not only to the netns but also to the pidns. Do you release the lease somehow?

(In reply to Ray Ruvinskiy's comment above, dated Saturday, December 20, 2014.)

For my use case, the containers never need to be shut down individually. They're actually not full-fledged containers, merely netns namespaces. These namespaces run in a VirtualBox VM. To change the configuration, I reload the VM. (There are Chef recipes that set up these namespaces for me when the VM reloads.) This is obviously not a general solution, but it lines up with our use case.

A more general solution would be to simply kill dhclient as part of container shutdown. I don't know what container abstraction you're using, but my guess is a wrapper that does this can be created for anything.
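A minimal sketch of such a wrapper for plain netns namespaces, under stated assumptions: ISC dhclient, one pid file and lease file per namespace and interface, and hypothetical paths and function names. It uses `dhclient -r` rather than a bare kill so that the lease is also released.

```shell
#!/bin/sh
# Hypothetical teardown wrapper for a netns-based "container": stop its
# dhclient first, then delete the namespace. Paths are assumptions.

LEASE_DIR="${LEASE_DIR:-/var/lib/dhcp}"
PID_DIR="${PID_DIR:-/var/run}"

# Usage: release_lease <netns-name> <interface>
# -r releases the lease and stops the dhclient recorded in the pid file.
release_lease() {
    ns=$1
    iface=$2
    ip netns exec "$ns" dhclient -r \
        -pf "$PID_DIR/dhclient.$ns.$iface.pid" \
        -lf "$LEASE_DIR/dhclient.$ns.$iface.leases" \
        "$iface"
}

# Usage: teardown_netns <netns-name> <interface>
teardown_netns() {
    ns=$1
    iface=$2
    # Best effort: still remove the namespace if the release fails.
    release_lease "$ns" "$iface" ||
        echo "warning: could not release lease for $ns/$iface" >&2
    ip netns delete "$ns"
}
```

The same idea should carry over to other container abstractions by replacing the `ip netns` calls with that tool's own exec and remove commands.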

I see, makes sense. We use Docker.

(In reply to Ray Ruvinskiy's comment above, 2014-12-20 13:38 GMT-08:00.)

The new DHCP mechanism should fix this, so I'm closing this issue. Thanks!