Stuck in state "Pending"
JonasKs opened this issue · 18 comments
Hello,
I have just created a Docker swarm from scratch, and I am not able to run the visualizer, since it's stuck in Pending.
$ docker service ls && docker service ps viz
ID NAME MODE REPLICAS IMAGE PORTS
ylpdkfqjv0o8 viz replicated 0/1 dockersamples/visualizer:latest *:9090->8080/tcp
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
j7z2l72rlf46 viz.1 dockersamples/visualizer:latest Running Pending 4 minutes ago
Docker version:
$ docker version
Client:
Version: 17.05.0-ce
API version: 1.29
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:10:54 2017
OS/Arch: linux/amd64
Server:
Version: 17.05.0-ce
API version: 1.29 (minimum version 1.12)
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:10:54 2017
OS/Arch: linux/amd64
Experimental: false
apt-cache policy docker-engine
docker-engine:
Installed: 17.05.0~ce-0~ubuntu-xenial
Candidate: 17.05.0~ce-0~ubuntu-xenial
The same setup on 17.03 works, so I suspect 17.05 is not supported yet?
Are you publishing the correct port?
The same setup works with an older Docker swarm, so yes, I believe I do. As shown in docker service ls:
*:9090->8080/tcp
I think this is a Swarm problem; sometimes I have had to delete the node and create it again so it can start a pending service.
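For anyone hitting this, the scheduler's reason for a Pending task can usually be surfaced directly. A general diagnostic sketch, using the `viz` service name from this report (requires a live swarm, so adapt as needed):

```shell
# Show the full, untruncated error/message column for each task;
# this is where the scheduler explains why a task cannot be placed
# (e.g. "no suitable node").
docker service ps --no-trunc viz

# Verify at least one node can satisfy the placement constraint
# (the visualizer is typically constrained to node.role == manager).
docker node ls

# Inspect the newest task directly for its status message.
docker inspect --format '{{ .Status.Message }}' "$(docker service ps -q viz | head -n1)"
```

If the message names a constraint or resource problem, the issue is scheduling; if the task instead cycles through Starting/Shutdown, look at healthchecks.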
@JonasKs you said on an older docker swarm? Do you mean Docker in swarm mode, or the older standalone Docker Swarm from before Docker 1.12? The visualizer won't work on the older pre-1.12 version, only in swarm mode.
@ManoMarks Docker version 17.03 worked for me, but not 17.05. I can provide full docker version output from the working setup tomorrow.
I created it by running:
ubuntu@manager-project2:~$ docker service create \
> --name=viz \
> --publish=9090:8080/tcp \
> --constraint=node.role==manager \
> --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
> dockersamples/visualizer
mcej4y4vluvtstdv86lbistni
Since --detach=false was not specified, tasks will be created in the background.
In a future release, --detach=false will become the default.
ubuntu@manager-project2:~$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
mcej4y4vluvt viz replicated 0/1 dockersamples/visualizer:latest *:9090->8080/tcp
ubuntu@manager-project2:~$ docker service ps viz
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
pgvuxlqd4008 viz.1 dockersamples/visualizer:latest Running Pending about a minute ago
Now it's only been pending for about a minute, but the last one was pending for 24 hours before I stopped it.
EDIT: Gave a more complete description of each step.
ubuntu@dswarm00:~$ docker version
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 60ccb22
Built: Thu Feb 23 11:02:43 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 60ccb22
Built: Thu Feb 23 11:02:43 2017
OS/Arch: linux/amd64
Experimental: false
This is from the working setup.
I have this problem too.
I deployed the visualizer according to the Docker Get Started tutorial and it is stuck in Pending:
docker stack services getstartedlab
ID NAME MODE REPLICAS IMAGE PORTS
53074ayqiv8s getstartedlab_web replicated 3/3 zhangjinjie/getstartedlab:latest *:80->80/tcp
m21qgmsf7bmk getstartedlab_visualizer replicated 0/1 dockersamples/visualizer:stable *:8080->8080/tcp
my docker version:
docker version
Client:
Version: 17.03.1-ce
API version: 1.27
Go version: go1.7.5
Git commit: c6d412e
Built: Tue Mar 28 00:40:02 2017
OS/Arch: darwin/amd64
Server:
Version: 17.03.1-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: c6d412e
Built: Fri Mar 24 00:00:50 2017
OS/Arch: linux/amd64
Experimental: true
I think the reason is that the visualizer uses npm to install dependencies, which takes a long time and makes the task look like it's Pending, but it actually works normally.
All,
I am also having services stuck for more than 15 minutes when they should take from seconds to a couple of minutes. It started when I upgraded to
docker-ce | 17.06.0ce-0ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages (after I added the edge channel).
I would create a service and 'service ls' would show it as '0/1' for a long time, which contradicted 'node ps' and 'service ps', which both showed 'running'. I would go to the node it was supposed to be running on, but 'docker ps' and 'docker ps -a' showed nothing, as if it had never been created. I thought maybe some hidden process was happening, so I waited well beyond the time it should take. It's very confusing to have 'service ps' and 'node ps' indicating it's running when you find nothing actually running on the node.
This happened on 3 different services I had never had issues with (mongo, redis, and an internal image). Sometimes doing 'service rm' or 'docker system prune' on the node would help, but it would start acting up again. I tried switching between --detach=true and --detach=false; it didn't matter.
I came across this open ticket and it described exactly what was happening to me.
I tried downgrading to stable by removing the edge channel. Same problem. I tried downgrading to 17.03.2: same problem. I then went down to the version JonasKs indicated worked. Bingo, everything behaved like it originally did.
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:01:32 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:01:32 2017
OS/Arch: linux/amd64
Experimental: false
I was using --detach=false during 'service create' and I could watch the output of the build from Jenkins; I guess it was going through a verification process for the task. At first this process was pretty quick. Then I made some updates, kicked off another build, and this time the verification took a very long time. I killed the build because I didn't want to bog down the logging. Then I read up on --detach, and my understanding is that you can stop watching and the operation still continues in the background. I tried --detach=true, which stopped all the verification logging, and the build finished. I noticed the service would stay at 0/1 and after a while become 1/1. This told me it was doing the verification in the background and would eventually finish.
The services I build are pretty quick; 5 minutes is the max. But I began seeing it go past 15 minutes, as if stuck in the verification process. 'service ps' would indicate the service was running, though 'service ls' displayed 0/1. I then ran 'node ps ' against the node that 'service ps' said it was running on, and it also displayed running. I would then go to that node and run 'docker ps': nothing. 'docker ps -a': nothing either. Somewhere something was stuck. I removed the service and tried again; it would sometimes work, sometimes not. This went on for a couple of days, at which point I got frustrated and decided to search for anyone with a similar issue.
I then downgraded the Docker version until I reached the one that worked, as mentioned earlier.
You said you ran into an open ticket; where was that ticket?
This ticket, and then a similar one, https://github.com/moby/moby/issues/33807, which discussed how difficult it is to know what is happening during the process. I commented on this one (#68) because it was more similar to my issue.
Here is a picture of that reporter's progress output:
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Waiting 1 seconds to verify that tasks are stable...
For me it would sit at 0/1 in progress and go on and on, scrolling the page, for almost 15 minutes for a job that maxes out at 5. My Jenkins job only keeps a certain number of builds, so I no longer have the log to paste here. I can't duplicate it because I downgraded to a Docker version that doesn't have the issue, and that version doesn't use --detach on 'service create'.
I'm unable to replicate this, and it looks like it's not an issue with this app but a moby issue, so I'm closing this issue.
tl;dr solution (use the stable tag):
docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager \
--mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer:stable
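A quick way to confirm the workaround converged, using the `viz` service name from the command above (plain docker CLI; requires a live swarm):

```shell
# REPLICAS should reach 1/1; with the :latest tag the failing
# healthcheck kept restarting the task, so it never converged.
docker service ls --filter name=viz

# The task history should show one Running task rather than a
# trail of Shutdown/Complete entries.
docker service ps viz
```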
I was able to reliably reproduce this.
Host:
Ubuntu 16.04
Docker installed via these instructions: https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/
ruby-2.4.2 ~$ docker version
Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:11:19 2017
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:09:53 2017
OS/Arch: linux/amd64
Experimental: false
ruby-2.4.2 ~$ uname -a
Linux hostname 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
ruby-2.4.2 ~$ lsb_release -cs
xenial
ruby-2.4.2 ~$ id
uid=1000(jesse) gid=1000(jesse) groups=1000(jesse),...,133(docker)
Guests (6 node swarm, 3 managers, 3 workers)
docker@manager1:~$ uname -a
Linux manager1 4.4.108-boot2docker #1 SMP Wed Dec 27 23:32:59 UTC 2017 x86_64 GNU/Linux
docker@manager1:~$ docker version
Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:05:38 2017
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:12:29 2017
OS/Arch: linux/amd64
Experimental: false
docker@manager1:~$ docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager \
--mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer
##### NOTE: there's some cut and paste here, but this is the basic
##### sequence, though the IDs probably don't all line up. Pasting here for
##### completeness, but be wary :-)
docker@manager1:~$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0c81bce9114c dockersamples/visualizer:latest "npm start" 47 seconds ago Up 42 seconds (health: starting) 8080/tcp viz.1.cwy1nu7u5x1dvl87paayiq6dt
docker@manager1:~$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0c81bce9114c dockersamples/visualizer:latest "npm start" About a minute ago Up About a minute (unhealthy) 8080/tcp viz.1.cwy1nu7u5x1dvl87paayiq6dt
docker@manager1:~$ docker service ps viz
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
rgu6gk0z0wcj viz.1 dockersamples/visualizer:latest manager1 Running Starting less than a second ago
yoinijp8rj6m \_ viz.1 dockersamples/visualizer:latest manager1 Shutdown Complete 10 minutes ago
xv8j7ctxgbbe \_ viz.1 dockersamples/visualizer:latest manager1 Shutdown Complete 12 minutes ago
vdr699auedno \_ viz.1 dockersamples/visualizer:latest manager1 Shutdown Complete 19 minutes ago
ytchsm7d8hvz \_ viz.1 dockersamples/visualizer:latest manager2 Shutdown Complete 24 minutes ago
#### Some service logs in case that is useful, again cut and paste, be wary
#### this is the basic repeating sequence in the logs...
docker@manager1:~$ docker service logs viz
viz.1.vdr699auedno@manager1 | npm info lifecycle swarmVisualizer@0.0.1~poststart: swarmVisualizer@0.0.1
viz.1.vdr699auedno@manager1 | npm info ok
viz.1.q1w3tryhqpvt@manager1 | npm info it worked if it ends with ok
viz.1.q1w3tryhqpvt@manager1 | npm info using npm@5.3.0
viz.1.q1w3tryhqpvt@manager1 | npm info using node@v8.2.1
viz.1.q1w3tryhqpvt@manager1 | npm info lifecycle swarmVisualizer@0.0.1~prestart: swarmVisualizer@0.0.1
viz.1.q1w3tryhqpvt@manager1 | npm info lifecycle swarmVisualizer@0.0.1~start: swarmVisualizer@0.0.1
viz.1.q1w3tryhqpvt@manager1 |
viz.1.q1w3tryhqpvt@manager1 | > swarmVisualizer@0.0.1 start /app
viz.1.q1w3tryhqpvt@manager1 | > node server.js
viz.1.q1w3tryhqpvt@manager1 |
viz.1.q1w3tryhqpvt@manager1 | npm info lifecycle swarmVisualizer@0.0.1~poststart: swarmVisualizer@0.0.1
viz.1.q1w3tryhqpvt@manager1 | npm info ok
viz.1.q6cbfgnjs1ud@manager2 | npm info it worked if it ends with ok
viz.1.q6cbfgnjs1ud@manager2 | npm info using npm@5.3.0
viz.1.q6cbfgnjs1ud@manager2 | npm info using node@v8.2.1
viz.1.q6cbfgnjs1ud@manager2 | npm info lifecycle swarmVisualizer@0.0.1~prestart: swarmVisualizer@0.0.1
viz.1.q6cbfgnjs1ud@manager2 | npm info lifecycle swarmVisualizer@0.0.1~start: swarmVisualizer@0.0.1
viz.1.q6cbfgnjs1ud@manager2 |
viz.1.q6cbfgnjs1ud@manager2 | > swarmVisualizer@0.0.1 start /app
viz.1.q6cbfgnjs1ud@manager2 | > node server.js
Image info
docker@manager1:~$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
973a12246699 dockersamples/visualizer:stable "npm start" 29 minutes ago Up 29 minutes 8080/tcp viz.1.15bltdtfzjddzvwf3i2pmrlxt
docker@manager1:~$ docker container inspect --format '{{ .Config.Image }}' 973a12246699
dockersamples/visualizer:stable@sha256:bc680132f772cb44062795c514570db2f0b6f91063bc3afa2386edaaa0ef0b20
docker@manager3:~$ docker container ls
### This is the one that keeps failing.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f1b4f79e62e2 dockersamples/visualizer:latest "npm start" 28 seconds ago Up 28 seconds (health: starting) 8080/tcp vizBad.1.sx33nd0t5kmzq0o14dmd86xjl
docker@manager3:~$ docker container inspect --format '{{ .Config.Image }}' f1b4f79e62e2
dockersamples/visualizer:latest@sha256:868b83d5fb2aff628f6dfaeb081b0e0124cdaf06354bf252d92d878b75193f7e
Or maybe this is better?
docker@manager2:~$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
dockersamples/visualizer latest 03aa6f3e43b4 3 weeks ago 153MB
dockersamples/visualizer stable 8dbf7c60cf88 5 months ago 148MB
Yeah, the problem is the newer Node-based healthcheck, @ManoMarks. I've had the same issue: when it runs as a service, the service eventually fails. The healthcheck always exits 1, so it won't cause issues under docker run, though you'll see the container marked unhealthy eventually. As a service, it recreates itself after 30s or so, which may not be noticed until someone looks at docker events or the service ps history.
I'll dig in this weekend and see how to fix the healthcheck.
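The behavior described above can be checked outside Swarm by running the image directly and watching the health state flip; a sketch (the container name `viz-test` is illustrative, and it needs a live daemon):

```shell
# Run the :latest image as a plain container.
docker run -d --name viz-test \
  -v /var/run/docker.sock:/var/run/docker.sock \
  dockersamples/visualizer:latest

# STATUS moves from "(health: starting)" to "(unhealthy)" once the
# healthcheck exhausts its retries; docker run leaves the container
# running, but Swarm reschedules the task instead.
sleep 90
docker ps --filter name=viz-test --format '{{ .Status }}'

# Clean up.
docker rm -f viz-test
```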
docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager \
--mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer
docker service ps viz
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ooyk54naneyj viz.1 dockersamples/visualizer:latest linuxkit-025000000001 Running Starting 8 seconds ago
nglubswi5cf5 \_ viz.1 dockersamples/visualizer:latest linuxkit-025000000001 Shutdown Complete 13 seconds ago
301a9jeiutcy \_ viz.1 dockersamples/visualizer:latest linuxkit-025000000001 Shutdown Complete about a minute ago
kf577tbbc7wc \_ viz.1 dockersamples/visualizer:latest linuxkit-025000000001 Shutdown Complete 3 minutes ago
lnxkrv8i8sza \_ viz.1 dockersamples/visualizer:latest linuxkit-025000000001 Shutdown Complete 31 minutes ago
docker events
2017-12-29T15:59:23.306820374-05:00 container exec_create: /bin/sh -c node healthcheck.js || exit 1 942ee4deecc85f26d2183ea462d88fd221bf363b181cb34801910653d8e0617c (com.docker.swarm.node.id=jpbc36i0269bw5pi9bcfd5eha, com.docker.swarm.service.id=q4b55ufsaz0sxye0yhx4m9ntw, com.docker.swarm.service.name=viz, com.docker.swarm.task=, com.docker.swarm.task.id=ik21gw5zfvm6kg3znc5s3vyxl, com.docker.swarm.task.name=viz.1.ik21gw5zfvm6kg3znc5s3vyxl, image=dockersamples/visualizer:latest@sha256:868b83d5fb2aff628f6dfaeb081b0e0124cdaf06354bf252d92d878b75193f7e, name=viz.1.ik21gw5zfvm6kg3znc5s3vyxl)
2017-12-29T15:59:23.307154557-05:00 container exec_start: /bin/sh -c node healthcheck.js || exit 1 942ee4deecc85f26d2183ea462d88fd221bf363b181cb34801910653d8e0617c (com.docker.swarm.node.id=jpbc36i0269bw5pi9bcfd5eha, com.docker.swarm.service.id=q4b55ufsaz0sxye0yhx4m9ntw, com.docker.swarm.service.name=viz, com.docker.swarm.task=, com.docker.swarm.task.id=ik21gw5zfvm6kg3znc5s3vyxl, com.docker.swarm.task.name=viz.1.ik21gw5zfvm6kg3znc5s3vyxl, image=dockersamples/visualizer:latest@sha256:868b83d5fb2aff628f6dfaeb081b0e0124cdaf06354bf252d92d878b75193f7e, name=viz.1.ik21gw5zfvm6kg3znc5s3vyxl)
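The raw docker events stream is noisy; standard event filters can narrow it to just the healthcheck activity for this service (filters OR within a key and AND across keys):

```shell
# Only exec and health-state events for containers belonging
# to the viz service.
docker events \
  --filter event=exec_create \
  --filter event=exec_start \
  --filter event=health_status \
  --filter label=com.docker.swarm.service.name=viz
```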
OK, inspect was showing that node couldn't find healthcheck.js for some reason:
{
"Start": "2017-12-31T21:23:14.536991044Z",
"End": "2017-12-31T21:23:14.849570942Z",
"ExitCode": 1,
"Output": "module.js:487\n throw err;\n ^\n\nError: Cannot find module '/healthcheck.js'\n at Function.Module._resolveFilename (module.js:485:15)\n at Function.Module._load (module.js:437:25)\n at Function.Module.runMain (module.js:605:10)\n at startup (bootstrap_node.js:158:16)\n at bootstrap_node.js:575:3\n"
}
Putting in a relative path didn't help, but a hardcoded full path worked. PR coming.
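The failing check can also be confirmed from the container's health log, or reproduced by hand; a sketch where `<container-id>` is a placeholder, and `/app/healthcheck.js` assumes the file lives alongside server.js in /app, as the service logs above suggest:

```shell
# Exit code and captured output of recent healthcheck attempts.
docker inspect --format '{{ json .State.Health.Log }}' <container-id>

# Run the check manually; with the relative path, node ended up
# resolving '/healthcheck.js' and failed with "Cannot find module".
docker exec <container-id> node /app/healthcheck.js
```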
@BretFisher I got the same issue: docker inspect shows Cannot find module '/healthcheck.js' on a service. I use
dockersamples/visualizer@sha256:868b83d5fb2aff628f6dfaeb081b0e0124cdaf06354bf252d92d878b75193f7e