dockersamples/docker-swarm-visualizer

Visualizer displays wrong info if previous tasks failed

kobenauf opened this issue · 6 comments

Description

If I deploy a stack that fails or is rejected, Visualizer displays node labels as [object Object] and service names as "undefined". (See screenshot.)

If I run docker stack ps my-service I see:

DESIRED STATE       CURRENT STATE           ERROR
Running             Failed 19 minutes ago   "starting container failed: OC…"

Because the job had failed, I had also run docker stack rm my-service.

I suspect Visualizer is seeing these old failed tasks rather than filtering to only those whose current state is Running. Note that the desired state is still Running (even though the actual current state is Failed), so maybe Visualizer is looking at the desired state by mistake?
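If that's what's happening, the mismatch is visible from the CLI. As far as I know, the tasks API only supports server-side filtering on desired state, not current state, so a consumer that never checks the current state will pick up old failed tasks. A quick check against the same stack:

```
# Server-side filtering is on *desired* state only; there is no
# current-state filter, so a consumer must check the current state itself.
docker stack ps --filter "desired-state=running" my-service

# Show both states side by side to spot the mismatch:
docker stack ps --format "{{.Name}}: desired={{.DesiredState}} current={{.CurrentState}}" my-service
```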

Steps to reproduce the issue, if relevant:

  1. Somehow deploy a stack that is rejected or fails. (Sorry, I don't know how to force this on purpose; one possible way is sketched after this list.)
  2. Observe results in Visualizer.
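One way to force a failing deployment on purpose (a sketch; failtest is a throwaway service name, and pointing the command at a nonexistent binary produces the same "starting container failed: OC…" OCI runtime error as above):

```
# The command path does not exist in the image, so every task fails at
# container start and swarm keeps retrying, leaving failed tasks behind.
docker service create --name failtest --detach busybox /no/such/binary
docker service ps failtest    # shows Failed tasks with the OCI runtime error
docker service rm failtest
```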

Describe the results you received:
[screenshot: node labels shown as [object Object], service names shown as "undefined"]

Describe the results you expected:
I expected Visualizer to continue displaying data correctly as it normally does.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:      18.05.0-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   f150324
 Built:        Wed May  9 22:16:13 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.05.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   f150324
  Built:        Wed May  9 22:14:23 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

Containers: 14
 Running: 2
 Paused: 0
 Stopped: 12
Images: 596
Server Version: 18.05.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 601
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 3qlfxy2gdse810z07h13f8jdu
 Is Manager: true
 ClusterID: d7kfhfklhe93g59p0l1utgoy1
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 1
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 198.199.98.120
 Manager Addresses:
  159.65.195.193:2377
  198.199.98.120:2377
  209.97.152.211:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-23-generic
Operating System: Ubuntu 18.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 985.5MiB
Name: m7-sf-1
ID: MTL3:66XA:Z2ZR:TMRA:HVX4:NNAQ:JGQI:FNFQ:D4TT:K6RE:J4KM:3Q4X
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 provider=digitalocean
Experimental: false
Insecure Registries:
 m7.code1.io:5000
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, Docker for Mac, Docker for Windows, VirtualBox, physical, etc.):
Digital Ocean cluster

Incidentally, I was able to "recover" from this by demoting and then promoting each of the three swarm managers, one at a time (docker node demote xxx followed by docker node promote xxx).
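For reference, that recovery amounts to the following (hypothetical node names; cycle one manager at a time so the swarm never loses quorum, and run each pair from a node that is still a manager):

```
docker node demote manager-1 && docker node promote manager-1
docker node demote manager-2 && docker node promote manager-2
docker node demote manager-3 && docker node promote manager-3
```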

The weird thing is that even though I had run docker stack rm my-service, I could still see the failed tasks when running docker stack ps my-service. Normally, removing a stack gives you nothing found in stack: my-service.
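One possibly related setting: swarm retains records of finished tasks according to the Task History Retention Limit (shown under Orchestration in the docker info output above), and those retained records are what API consumers like Visualizer enumerate. Whether or not that explains tasks outliving a stack rm, the limit is adjustable:

```
# Keep at most one historical task per slot (a sketch; pick a value
# that suits your debugging needs).
docker swarm update --task-history-limit 1
```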

Anyway, whether this is a docker bug or expected somehow, Visualizer does not handle the condition well.

I ran into the same issue.

Does anyone have an answer for this issue? I have the same problem and don't know how to fix it. Thanks.

+1 Swarm on GCP

[screenshot: docker-visualizer-undefined]

Deploy command

docker stack deploy -c docker-stack.yml mystack --with-registry-auth
docker-stack.yml

version: '3.3'

volumes:
  postgres_data: {}
  portainer_data: {}

networks:
  overlay:

services:
  backend:
    image: <image>
    depends_on:
      - postgres
    volumes:
      - ./backend:/app
    command: /gunicorn.sh
    entrypoint: /entrypoint.sh
    env_file: .env
    networks:
      - overlay
    deploy:
      replicas: 2
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role == manager]

  postgres:
    image: postgres:10-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - overlay
    deploy:
      placement:
        constraints: [node.role == manager]

  nginx:
    image: <image>
    ports:
      - "80:80"
    depends_on:
      - backend
    networks:
      - overlay
    volumes:
      - ./backend/media/:/media/
      - ./backend/staticfiles/:/staticfiles/
      - ./nginx/prod.conf:/etc/nginx/nginx.conf:ro
    deploy:
        mode: global

  backups:
    image: prodrigestivill/postgres-backup-local
    depends_on:
      - postgres
    volumes:
      - /tmp/backups/:/backups/

  portainer:
    image: portainer/portainer
    ports:
      - "9000:9000"
    command: -H unix:///var/run/docker.sock
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - portainer_data:/data

  visualizer:
    image: dockersamples/visualizer
    deploy:
      placement:
        constraints: [node.role == manager]

docker version

Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov  7 00:48:46 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:16:44 2018
  OS/Arch:          linux/amd64
  Experimental:     false

docker info

```
Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 38
Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: x7fuf7bfhw974jc44xrzjj7bj
 Is Manager: true
 ClusterID: s6ferlvl5lx37kbtmap50unc1
 Managers: 2
 Nodes: 4
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.140.0.2
 Manager Addresses:
  10.140.0.2:2377
  10.140.0.4:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.0-8-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 14.69GiB
Name: fsm-1
ID: ZOLH:FFFI:DZKU:AHPS:FM5S:ZG3U:4O65:XIZ6:NG7O:7RB7:7MBT:TA2Q
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: froggyservice
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
```

@kobenauf is on to something. I was able to fix the issue by demoting and promoting only the manager node where visualizer was running. I think this works because it forces the visualizer to move to another manager.

I suspect this can also be fixed by updating the visualizer service and moving it to another manager node:

docker service update --constraint-add 'node.hostname==differentHost' stackname_visualizer
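If that works, the temporary constraint can be removed again once the task has been rescheduled (same hypothetical hostname):

docker service update --constraint-rm 'node.hostname==differentHost' stackname_visualizer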

You can also resolve the issue by restarting the Docker service. On Linux I ran the following, and once Docker restarted, the visualizer was fixed as well.

sudo service docker restart
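On systemd-based distributions the equivalent is:

sudo systemctl restart docker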