docker/compose

Restart/Reconnect containers connected via 'network_mode: service' automatically when main service is restarted

DavHau opened this issue · 31 comments

Is your feature request related to a problem? Please describe.
When running the following docker-compose.yml:

version: "3.7"

services:
  
  mother:
    image: alpine
    command: "sleep 999999"
    restart: always

  child:
    image: alpine
    command: "sleep 888888"
    network_mode: "service:mother"

If the mother container is restarted for any reason (crash / manual restart), the child container loses its network forever.

$ docker-compose restart mother
$ docker-compose exec child ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever

The child container is fully disconnected from the world. It will not reattach to mother's network. It will be unable to communicate with other containers and the internet. Does it make any sense at all to continue running the child container in this state?

Describe the solution you'd like
Whenever a service is restarted which has other services connected to it via 'network_mode: service', then reconnect those other services or restart them if reconnecting is technically unfeasible.

Describe alternatives you've considered
A workaround using a healthcheck and autoheal is described here: #6329 (comment)

In discussions of other issues related to 'network_mode: service', it is suggested to use a user-defined network instead. But as far as I know, there are container compositions which require 'network_mode: service', for example when putting multiple containers behind a VPN. Please correct me if I'm wrong.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

AFAIK this is still an issue.

stale commented

This issue has been automatically marked as not stale anymore due to the recent activity.

When you use "service" network_mode (i.e. sharing a network namespace between containers), losing connectivity on restart is really the expected behaviour. It is comparable to using the "host" network and having the node shut down and the service restarted elsewhere on the cluster.

Such usage only makes sense for highly coupled containers (typically: containers in a Kubernetes Pod), but not for services communicating together in a reliable way. Automatically restarting the dependent service would help you hide the networking constraints of your architecture, but this is just cheating. Better to get your architecture to embrace the risk of a dependent service being restarted or scaled up/down. For this purpose, use your compose file to define an explicit network connecting services together.

@ndeloof

when you use "service" network_mode (i.e. sharing network namespace between containers), losing connectivity on restart is really the expected behaviour.

Intuitively, I would not call this expected behaviour. Here are some real-world examples: in my home network, if my connection depends on a cable being plugged into my machine and I unplug this cable and plug it back in, I expect my machine to reconnect. Or if my connection depends on some other machine, e.g. my router, and I restart that machine, I expect my network to be back up once it has restarted.

Comparable to using "host" network and getting the node shut down and service restarted elsewhere on cluster.

I agree with this comparison, as it demonstrates how useless this kind of behaviour is. This is why you would never configure your cluster to run a service without a vital resource being present. And therefore I think it would be a good idea to also stop doing that in docker compose. When using "service" network_mode, the services are highly coupled, so that one cannot live without the other. In a mother/child configuration the child is strongly dependent on mother, and it never makes sense to have the child running without a mother. There is no good reason not to also stop the child if mother is gone or cannot reunite with the child after being restarted.

For this purpose, use your compose file to define an explicit network connecting services together.

As I already stated in the original issue, there are some container configurations where creating an explicit network is not sufficient and you instead have to share the network adapter itself, for example when forcing any kind of container to connect via a VPN container. Therefore your suggestion doesn't solve the problem.

stale commented

This issue has been automatically closed because it had not recent activity during the stale period.

Amazing this issue still exists

I agree with @DavHau. There should be at least an option to make this behaviour possible.

Now that compose is transitioning to v2, maybe it would be worth checking this issue again? @ndeloof

There are quite a lot of use cases where automatically restarting the child's network stack is useful, as explained above. As of now, child containers are deprived of all network connectivity once the container providing the network stack dies.

The healthcheck workaround provided in the first comment is a rather brutal and completely ineffective approach, since child containers do not need to be restarted: the only thing failing is their network stack, not the service itself nor whatever the container provides. Once the mother (or network-providing container) is restarted, the network stack of the child containers should be restarted/updated as well, thus avoiding brutal and potentially taxing and long reloads for important services.

To make things worse I've seen quite a lot of images using healthchecks that only do probes internally. If some container offers some service at localhost:8000, chances are it's just using plain curl -f localhost:8000, which won't fail even if the container providing the network stack fails. This wouldn't be too much of an issue if they did something like curl -f localhost:8000 && curl -f google.com, but I for one don't support the idea of restarting completely fine-working containers just because their network stack malfunctioned for a brief moment.
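
For anyone who does go the healthcheck route anyway, the two-probe idea above could be wrapped in a small script. This is only a sketch under assumptions: `LOCAL_URL`, `EXTERNAL_URL`, and `CURL` are made-up names, the URLs are placeholders, and the image is assumed to ship `curl`.

```shell
#!/bin/sh
# Hypothetical healthcheck helper; all names and URLs below are placeholders.
LOCAL_URL="${LOCAL_URL:-http://localhost:8000}"
EXTERNAL_URL="${EXTERNAL_URL:-https://example.com}"
CURL="${CURL:-curl}"   # overridable so the logic can be exercised without a network

check_health() {
  # Probe the service itself first, then a host outside the shared namespace.
  if "$CURL" -fsS --max-time 5 -o /dev/null "$LOCAL_URL" &&
     "$CURL" -fsS --max-time 5 -o /dev/null "$EXTERNAL_URL"; then
    return 0
  fi
  # Docker healthchecks use 1 for "unhealthy"; exit code 2 is reserved.
  return 1
}
```

To use it as a healthcheck command, the script would end with a call to `check_health`. The second probe is what makes the check fail when the network-providing container goes away, at the cost of marking the container unhealthy during any brief outage.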

If child depends on the mother service, as defined here by network_mode (but it could also be by any other shared namespace, as well as an explicit depends_on), it would make sense to me that restarting the mother service also restarts all the dependent services. (Pull requests are welcome on v2 :P)

That being said, to connect services together you had better define a network to be shared between services. The only scenario I can imagine requiring a shared network namespace is one of the services accessing the other as localhost, without the ability for you to change this behavior.

The only scenario I can imagine requiring a shared network namespace is one of the services accessing the other as localhost, without the ability for you to change this behavior.

opening this issue again as the described situation is the one I am in

For another use case for a feature like this, check out the gluetun project (https://github.com/qdm12/gluetun).

It's a VPN container that routes all the network traffic in the namespace through a VPN tunnel. So, any containers connected via the network_mode:container-name have their network traffic routed through the tunnel. This is great for applications which do not support proxy routing at the application level.

A feature that allows the child network to be reconnected automatically if the parent is restarted would be fantastic for those containers that depend on gluetun for a secure connection to somewhere else.

I want to push this.

In my situation I use a VPN container and connect several other containers via network_mode: "container:mycontainer".
Sometimes I have to restart the VPN to change the server, or just for maintenance. After that, I have to manually restart all the child containers. I know that I can write everything in the same compose file, but then I lose flexibility.

A good behavior would be an option like:
restart: on_network
The child container would then restart if it loses the network connection. As a next step, this check should be done at configurable intervals, to prevent countless container restarts.

Kind Regards

Could we keep this issue open?

melyux commented

Can someone reopen this issue?

gionag commented

+1

network_mode implies an explicit depends_on between services, and as such the "mother" service does already restart the depending services:

$ cat compose.yaml 
services:
  mother:
    image: nginx
  app:
    image: nginx
    network_mode: "service:mother"

$ docker compose up -d
[+] Building 0.0s (0/0)                                    docker:desktop-linux
[+] Running 3/3
 ✔ Network chose_default     Created                                       0.0s 
 ✔ Container chose-mother-1  Started                                       0.0s 
 ✔ Container chose-app-1     Started                                       0.0s 
$ docker compose restart mother
[+] Restarting 2/2
 ✔ Container chose-mother-1  Started                                       0.3s 
 ✔ Container chose-app-1     Started                                       0.0s 

if this is not what you get, please open a new issue with details on your configuration

gionag commented

Just tested: in my setup, restarting the mother doesn't trigger a restart of the child...

@gionag did you try my example? Which version of compose are you running?

As far as I understand the reasoning of the others, they might mean that in case of a container crash (restart: always) or something similar (like a manual docker container restart), the children aren't restarted. A restart only happens with the explicit compose restart command.
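
For the crash / manual restart case, which happens outside of Compose, one possible stopgap is a small watcher script. This is only a sketch, not something Compose or the engine provides: it listens to `docker events` for the parent container starting again and then restarts the children. `PARENT`, `CHILDREN`, and `DOCKER` are made-up variable names, and the container names are placeholders (Compose generates names along the lines of `<project>-mother-1`).

```shell
#!/bin/sh
# Hypothetical watcher; names below are placeholders, adjust to your containers.
PARENT="${PARENT:-mother}"      # container that owns the network namespace
CHILDREN="${CHILDREN:-child}"   # space-separated containers sharing that namespace
DOCKER="${DOCKER:-docker}"      # overridable so the loop can be dry-run

restart_children() {
  for c in $CHILDREN; do
    # stdin is redirected so the command cannot consume the event stream
    "$DOCKER" restart "$c" </dev/null
  done
}

watch_parent() {
  # A "start" event is emitted after a crash recovery as well as after a manual restart.
  "$DOCKER" events \
    --filter "container=$PARENT" \
    --filter "event=start" \
    --format '{{.Actor.Attributes.name}}' |
  while read -r _name; do
    restart_children
  done
}
```

Calling `watch_parent` at the end of the script starts the loop; it blocks until the event stream ends. As far as I understand, this only helps while the parent keeps the same container ID. If the parent is recreated, the children have to be recreated too, for example with `docker compose up -d`.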

network_mode implies an explicit depends_on between services, and as such the "mother" service does already restart the depending services:

$ cat compose.yaml 
services:
  mother:
    image: nginx
  app:
    image: nginx
    network_mode: "service:mother"

$ docker compose up -d
[+] Building 0.0s (0/0)                                    docker:desktop-linux
[+] Running 3/3
 ✔ Network chose_default     Created                                       0.0s 
 ✔ Container chose-mother-1  Started                                       0.0s 
 ✔ Container chose-app-1     Started                                       0.0s 
$ docker compose restart mother
[+] Restarting 2/2
 ✔ Container chose-mother-1  Started                                       0.3s 
 ✔ Container chose-app-1     Started                                       0.0s 

if this is not what you get, please open a new issue with details on your configuration

Tested it, and if I need to, say, update the mother container with a newer image, add some environment variable, or recreate the container (with the same name), I need to attach the mother network again (I'm using Portainer).

Obviously this only applies when Compose recreates the mother container. Any other scenario where the user recreates the container, or the container restarts after a crash, isn't managed by Compose.

Obviously this only applies when Compose recreates the mother container. Any other scenario where the user recreates the container, or the container restarts after a crash, isn't managed by Compose.

Hence the bug. If we think this doesn't belong in compose, then a bug in the docker runtime should probably track it?

@ndeloof since you're part of the docker organization, where do you think this should be tracked? At the end of the day I think we all want to see this bug/feature fixed/implemented.

I think part of the disconnect here is on what the purpose of docker-compose is. If I'm interpreting your comment correctly, compose is only intended to reconcile what's actually running in the docker runtime when it's directly invoked.

Others potentially expect the conditions & restrictions that are specified in a compose file to be used to continuously reconcile the state the container runtime is in. E.g. if something causes the docker runtime to put a container out of its intended state, then compose kicks in and reconciles the changes.

Maybe what we're asking here for is runtime continuous dependencies between different containers, vs at-the-time-of-command dependencies between the container definitions.

If we think this doesn't belong in compose, then a bug in the docker runtime should probably track it?

Definitely not under compose scope as long as events don't take place under its control.
I also don't think this should be reported to the docker runtime: as you replace a resource, invalidating those which depend on it, it is your responsibility to manage the reconciliation. This is what compose offers when you use up to recreate a container.
Is there any reason you want to do this on your own?

Would love this as well. I have a VPN container and all dependents on it lose network if this container is restarted/crashed etc.

@Fossil01 engine is not aware of relation between services declared in compose, so it can't manage such a "cascade" restart.

It probably should be aware of such things.

@melyux this should be discussed on github.com/moby/moby
my 2 cents: engine already manages restart policy "on failure", maybe it could also manage shared-namespace source being restarted