dockersamples/docker-swarm-visualizer

Stuck in state "Pending"

JonasKs opened this issue · 18 comments

Hello,

I have just created a docker swarm from scratch, and I am not able to run the visualizer, since its stuck in pending.

$ docker service ls && docker service ps viz
ID                  NAME                MODE                REPLICAS            IMAGE                             PORTS
ylpdkfqjv0o8        viz                 replicated          0/1                 dockersamples/visualizer:latest   *:9090->8080/tcp
ID                  NAME                IMAGE                             NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
j7z2l72rlf46        viz.1               dockersamples/visualizer:latest                       Running             Pending 4 minutes ago                       

Docker version:

$ docker version
Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:54 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:54 2017
 OS/Arch:      linux/amd64
 Experimental: false

apt-cache policy docker-engine
docker-engine:
  Installed: 17.05.0~ce-0~ubuntu-xenial
  Candidate: 17.05.0~ce-0~ubuntu-xenial

Same setup on 17.03 works, so I suspect the 17.05 is not supported yet?

are you publishing the correct port?

Same setup work on same setup, but with older docker swarm, so yeah, I believe I do. As shown in docker service ls:
*:9090->8080/tcp

I think this is a Swarm problem, sometimes I have had to delete the node and created it again so it can start a pending service.

@JonasKs you said on an older docker swarm? You do you mean docker in swarm mode or the older docker swarm pre-Docker 1.12 version? The visualizer won't work on the older pre-1.12 version only Swarm mode.

@ManoMarks Docker version 17.03 worked for me, but not 17.05. I can provide a full docker version on working setup tomorrow.

Created it by doing this:

ubuntu@manager-project2:~$ docker service create \
>   --name=viz \
>   --publish=9090:8080/tcp \
>   --constraint=node.role==manager \
>   --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
>   dockersamples/visualizer
mcej4y4vluvtstdv86lbistni
Since --detach=false was not specified, tasks will be created in the background.
In a future release, --detach=false will become the default.
ubuntu@manager-project2:~$ docker service ls 
ID                  NAME                MODE                REPLICAS            IMAGE                             PORTS
mcej4y4vluvt        viz                 replicated          0/1                 dockersamples/visualizer:latest   *:9090->8080/tcp
ubuntu@manager-project2:~$ docker service ps viz
ID                  NAME                IMAGE                             NODE                DESIRED STATE       CURRENT STATE                ERROR               PORTS
pgvuxlqd4008        viz.1               dockersamples/visualizer:latest                       Running             Pending about a minute ago                       

Now its only been pending for about a minute, but last one was pending for 24 hours before I stopped it.

EDIT: Gave a more complete description of each step.

@ManoMarks :

ubuntu@dswarm00:~$ docker version
Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 11:02:43 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 11:02:43 2017
 OS/Arch:      linux/amd64
 Experimental: false

Is from the working setup.

I have this problem too.

I deploy visualizer according to docker get started tutorial and it will be stuck in pending:

docker stack services getstartedlab

ID                  NAME                       MODE                REPLICAS            IMAGE                              PORTS
53074ayqiv8s        getstartedlab_web          replicated          3/3                 zhangjinjie/getstartedlab:latest   *:80->80/tcp
m21qgmsf7bmk        getstartedlab_visualizer   replicated          0/1                 dockersamples/visualizer:stable    *:8080->8080/tcp

my docker version:

docker version

Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Tue Mar 28 00:40:02 2017
 OS/Arch:      darwin/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Fri Mar 24 00:00:50 2017
 OS/Arch:      linux/amd64
 Experimental: true

I know the reason that visualizer will use npm to install dependencies which will spend a lot of time and look like pending but actually works normally.

All,

I am also having services being stuck for more than 15 mins when they should last from seconds to a couple of mins. It started when I upgraded to

docker-ce | 17.06.0ce-0ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages (which I added the edge).

I would create a service and the 'service ls' command would show it as '0/1' for a long time, which contradicted what the 'node ps' and 'service ps' was showing which was 'running'. I would go to the node it was supposed to be running on but 'docker ps' and 'docker ps -a' showed nothing, like it never was created. I thought maybe some hidden process is happening. So I waited and waited, beyond the amount it should take. It's very confusing to have the 'service ps' and 'node ps' indicating it's running when you find nothing on the node running.

This happened on 3 different services and I never had issues with (using mongo, redis and internal one images). Sometimes when I did the 'service rm' or 'docker system prune' on the node it would help but it would start acting up again. I tried changing out the --detach=true and --detach=false, it didn't matter.

I came across this open ticket and it described exactly what was happening to me.

I tried to downgrade to the stable by removing the edge. Same problem. I tried to downgrade to 17.03.2 and same problem. I decided to go down to the version indicated by JonasKS that worked. Bingo, everything was behaving like it originally was.

Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:01:32 2017
OS/Arch: linux/amd64

Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:01:32 2017
OS/Arch: linux/amd64
Experimental: false

@RAKedz which issue did you come across that described the problem?

I was using --batch=false during the 'service creation' and I could watch the output of the build from Jenkins and I guess it was going through a verification process of the task. First it was pretty quick during this process. Then I did some updates and kicked off another build and this time that verification process was taking a very long time. I killed the build because I didn't want to bog downing the logging. Then I did some reading on --batch and from my understanding you can stop it and it would still continue in a background state depending on how you wanted to use it. I tried the --batch=true and it would stop all that logging verification and the build finished. I noticed the service would stay in 0/1 and after a while it would be 1/1. This told me it was doing the verification in the background and it would eventually finish.

The services I build are pretty quick and 5 mins is the max time. But I began seeing it going past 15 mins like it was stuck in the verification process. I would do 'service ps' and it would indicate the service was running, though 'service ls' displayed 0/1. I then did a 'node ps ', the node the 'service ps' said it was running on. The 'node ps' displayed it was running. I would then go to the node and do a 'docker ps' and nothing showed. I would do another 'docker ps -a' and nothing either. Somewhere something is stuck. Then I removed the service and tried again. It would sometimes work, sometimes didn't. This was going on for about a couple of days to which I was getting frustrated and decided to search if anyone else had a similar issue.

I then decided to downgrade the docker version until I got to the one that worked as I mentioned earlier.

You said you ran into an open ticket, where was that ticket?

This ticket and then another one similar https://github.com/moby/moby/issues/33807 , which referred on how difficult it is to know what is happening during the process. I commented on this one (#68) because it was more similar to my issue.

Here is a pic of his progress:

overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Waiting 1 seconds to verify that tasks are stable...

For me I would get a 0/1 in progress and it would go on and on and on and on scrolling the page for almost 15 mins for a job that maxes at 5 mins. My Jenkins job only keeps a certain number so I don't have it anymore to paste here. I can't duplicate it because I downgraded to a Docker version that doesn't give me the issue, and it's a version that doesn't use the --detach on the 'service create'.

I'm unable to replicate and it looks like this is not an issue with this app but a moby issue. So I'm closing this issue.

tldr;
solution (use the stable tag):
docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager \
--mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer:stable

I was able to reliably reproduce this.
Host:
Ubuntu 16.04
Docker installed via these instructions: https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/

ruby-2.4.2   ~$ docker version
Client:
 Version:       17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:11:19 2017
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:        Wed Dec 27 20:09:53 2017
  OS/Arch:      linux/amd64
  Experimental: false
ruby-2.4.2   ~$ uname -a
Linux hostname 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
ruby-2.4.2   ~$ lsb_release -cs                                                                                                              
xenial
ruby-2.4.2   ~$ id
uid=1000(jesse) gid=1000(jesse) groups=1000(jesse),...,133(docker)

Guests (6 node swarm, 3 managers, 3 workers)

docker@manager1:~$ uname -a
Linux manager1 4.4.108-boot2docker #1 SMP Wed Dec 27 23:32:59 UTC 2017 x86_64 GNU/Linux
docker@manager1:~$ docker version
Client:
 Version:       17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:05:38 2017
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:        Wed Dec 27 20:12:29 2017
  OS/Arch:      linux/amd64
  Experimental: false
docker@manager1:~$ docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager \
--mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer
##### NOTE: there's some cut and paste here, but this is the basic 
#####  sequence, though the IDs probably don't all line up.  Pasting here for
##### completeness, but be wary :-)
docker@manager1:~$ docker container ls
CONTAINER ID        IMAGE                             COMMAND             CREATED             STATUS                             PORTS               NAMES
0c81bce9114c        dockersamples/visualizer:latest   "npm start"         47 seconds ago      Up 42 seconds (health: starting)   8080/tcp            viz.1.cwy1nu7u5x1dvl87paayiq6dt
docker@manager1:~$ docker container ls
CONTAINER ID        IMAGE                             COMMAND             CREATED              STATUS                          PORTS               NAMES
0c81bce9114c        dockersamples/visualizer:latest   "npm start"         About a minute ago   Up About a minute (unhealthy)   8080/tcp            viz.1.cwy1nu7u5x1dvl87paayiq6dt
docker@manager1:~$ docker service ps viz
ID                  NAME                IMAGE                             NODE                DESIRED STATE       CURRENT STATE                     ERROR               PORTS
rgu6gk0z0wcj        viz.1               dockersamples/visualizer:latest   manager1            Running             Starting less than a second ago                       
yoinijp8rj6m         \_ viz.1           dockersamples/visualizer:latest   manager1            Shutdown            Complete 10 minutes ago                               
xv8j7ctxgbbe         \_ viz.1           dockersamples/visualizer:latest   manager1            Shutdown            Complete 12 minutes ago                               
vdr699auedno         \_ viz.1           dockersamples/visualizer:latest   manager1            Shutdown            Complete 19 minutes ago                               
ytchsm7d8hvz         \_ viz.1           dockersamples/visualizer:latest   manager2            Shutdown            Complete 24 minutes ago
#### Some service logs in case that is useful, again cut and paste, be wary
#### this is the basic repeating sequence in the logs...
docker@manager1:~$ docker service logs viz
viz.1.vdr699auedno@manager1    | npm info lifecycle swarmVisualizer@0.0.1~poststart: swarmVisualizer@0.0.1
viz.1.vdr699auedno@manager1    | npm info ok 
viz.1.q1w3tryhqpvt@manager1    | npm info it worked if it ends with ok
viz.1.q1w3tryhqpvt@manager1    | npm info using npm@5.3.0
viz.1.q1w3tryhqpvt@manager1    | npm info using node@v8.2.1
viz.1.q1w3tryhqpvt@manager1    | npm info lifecycle swarmVisualizer@0.0.1~prestart: swarmVisualizer@0.0.1
viz.1.q1w3tryhqpvt@manager1    | npm info lifecycle swarmVisualizer@0.0.1~start: swarmVisualizer@0.0.1
viz.1.q1w3tryhqpvt@manager1    | 
viz.1.q1w3tryhqpvt@manager1    | > swarmVisualizer@0.0.1 start /app
viz.1.q1w3tryhqpvt@manager1    | > node server.js
viz.1.q1w3tryhqpvt@manager1    | 
viz.1.q1w3tryhqpvt@manager1    | npm info lifecycle swarmVisualizer@0.0.1~poststart: swarmVisualizer@0.0.1
viz.1.q1w3tryhqpvt@manager1    | npm info ok 
viz.1.q6cbfgnjs1ud@manager2    | npm info it worked if it ends with ok
viz.1.q6cbfgnjs1ud@manager2    | npm info using npm@5.3.0
viz.1.q6cbfgnjs1ud@manager2    | npm info using node@v8.2.1
viz.1.q6cbfgnjs1ud@manager2    | npm info lifecycle swarmVisualizer@0.0.1~prestart: swarmVisualizer@0.0.1
viz.1.q6cbfgnjs1ud@manager2    | npm info lifecycle swarmVisualizer@0.0.1~start: swarmVisualizer@0.0.1
viz.1.q6cbfgnjs1ud@manager2    | 
viz.1.q6cbfgnjs1ud@manager2    | > swarmVisualizer@0.0.1 start /app
viz.1.q6cbfgnjs1ud@manager2    | > node server.js

Image info

docker@manager1:~$ docker container ls
CONTAINER ID        IMAGE                             COMMAND             CREATED             STATUS              PORTS               NAMES
973a12246699        dockersamples/visualizer:stable   "npm start"         29 minutes ago      Up 29 minutes       8080/tcp            viz.1.15bltdtfzjddzvwf3i2pmrlxt
docker@manager1:~$ docker container inspect --format '{{ .Config.Image }}' 973a12246699
dockersamples/visualizer:stable@sha256:bc680132f772cb44062795c514570db2f0b6f91063bc3afa2386edaaa0ef0b20
docker@manager3:~$ docker container ls
### This is the one that keeps failing.
CONTAINER ID        IMAGE                             COMMAND             CREATED             STATUS                             PORTS               NAMES
f1b4f79e62e2        dockersamples/visualizer:latest   "npm start"         28 seconds ago      Up 28 seconds (health: starting)   8080/tcp            vizBad.1.sx33nd0t5kmzq0o14dmd86xjl
docker@manager3:~$ docker container inspect --format '{{ .Config.Image }}' f1b4f79e62e2
dockersamples/visualizer:latest@sha256:868b83d5fb2aff628f6dfaeb081b0e0124cdaf06354bf252d92d878b75193f7e

or maybe this is better?

docker@manager2:~$ docker images
REPOSITORY                 TAG                 IMAGE ID            CREATED             SIZE
dockersamples/visualizer   latest              03aa6f3e43b4        3 weeks ago         153MB
dockersamples/visualizer   stable              8dbf7c60cf88        5 months ago        148MB

Yea the problem is the newer node-based healthcheck @ManoMarks I've had the same issue when it's a service, the service eventually fails. The HC always exits 1 so it won't cause issues in docker run, but you'll see it marked unhealthy eventually. As a service, it will recreate itself after 30s or so, which may not be noticed until someone looks at docker events or service ps history.

I'll dig in this weekend and see how to fix healthcheck.

service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager \
--mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer
docker service ps viz
ID                  NAME                IMAGE                             NODE                    DESIRED STATE       CURRENT STATE                 ERROR               PORTS
ooyk54naneyj        viz.1               dockersamples/visualizer:latest   linuxkit-025000000001   Running             Starting 8 seconds ago
nglubswi5cf5         \_ viz.1           dockersamples/visualizer:latest   linuxkit-025000000001   Shutdown            Complete 13 seconds ago
301a9jeiutcy         \_ viz.1           dockersamples/visualizer:latest   linuxkit-025000000001   Shutdown            Complete about a minute ago
kf577tbbc7wc         \_ viz.1           dockersamples/visualizer:latest   linuxkit-025000000001   Shutdown            Complete 3 minutes ago
lnxkrv8i8sza         \_ viz.1           dockersamples/visualizer:latest   linuxkit-025000000001   Shutdown            Complete 31 minutes ago
docker events
2017-12-29T15:59:23.306820374-05:00 container exec_create: /bin/sh -c node healthcheck.js || exit 1 942ee4deecc85f26d2183ea462d88fd221bf363b181cb34801910653d8e0617c (com.docker.swarm.node.id=jpbc36i0269bw5pi9bcfd5eha, com.docker.swarm.service.id=q4b55ufsaz0sxye0yhx4m9ntw, com.docker.swarm.service.name=viz, com.docker.swarm.task=, com.docker.swarm.task.id=ik21gw5zfvm6kg3znc5s3vyxl, com.docker.swarm.task.name=viz.1.ik21gw5zfvm6kg3znc5s3vyxl, image=dockersamples/visualizer:latest@sha256:868b83d5fb2aff628f6dfaeb081b0e0124cdaf06354bf252d92d878b75193f7e, name=viz.1.ik21gw5zfvm6kg3znc5s3vyxl)
2017-12-29T15:59:23.307154557-05:00 container exec_start: /bin/sh -c node healthcheck.js || exit 1 942ee4deecc85f26d2183ea462d88fd221bf363b181cb34801910653d8e0617c (com.docker.swarm.node.id=jpbc36i0269bw5pi9bcfd5eha, com.docker.swarm.service.id=q4b55ufsaz0sxye0yhx4m9ntw, com.docker.swarm.service.name=viz, com.docker.swarm.task=, com.docker.swarm.task.id=ik21gw5zfvm6kg3znc5s3vyxl, com.docker.swarm.task.name=viz.1.ik21gw5zfvm6kg3znc5s3vyxl, image=dockersamples/visualizer:latest@sha256:868b83d5fb2aff628f6dfaeb081b0e0124cdaf06354bf252d92d878b75193f7e, name=viz.1.ik21gw5zfvm6kg3znc5s3vyxl)

OK inspect was showing that node couldn't find healthcheck.js for some reason:

{
                        "Start": "2017-12-31T21:23:14.536991044Z",
                        "End": "2017-12-31T21:23:14.849570942Z",
                        "ExitCode": 1,
                        "Output": "module.js:487\n    throw err;\n    ^\n\nError: Cannot find module '/healthcheck.js'\n    at Function.Module._resolveFilename (module.js:485:15)\n    at Function.Module._load (module.js:437:25)\n    at Function.Module.runMain (module.js:605:10)\n    at startup (bootstrap_node.js:158:16)\n    at bootstrap_node.js:575:3\n"
                    }

Putting in a relative path didn't help, but hardcoded full path worked. PR coming.

@BretFisher Got the same issue: docker inspect showing Cannot find module '/healthcheck.js' on a service. I use
dockersamples/visualizer@sha256:868b83d5fb2aff628f6dfaeb081b0e0124cdaf06354bf252d92d878b75193f7e