haxorof/ansible-role-docker-ce

CentOS 8: Issues when trying to install plugins

janorn opened this issue · 9 comments

Version Information

Ansible: 2.9.14
Role: 3.1.1

Steps to Reproduce

Install Docker with the following variable set:

docker_plugins:
  - type: authz
    alias: opa-docker-authz
    name: openpolicyagent/opa-docker-authz-v2:0.4
    args: opa_args="-policy-file /opa/policies/authz.rego"

Expected Behavior

Installation completes and Docker is running.

Actual Behavior

TASK [haxorof.docker_ce : Start Docker daemon] *********************************
task path: /tmp/packer-provisioner-ansible-local/5fa40e28-8833-1f16-ef18-0e1b3a93b684/roles/haxorof.docker_ce/tasks/configure-docker/configure-docker-plugins.yml:4
skipping: [127.0.0.1] => {"changed": false, "skip_reason": "Conditional result was False"}

TASK [haxorof.docker_ce : Wait for Docker daemon to started] *******************
task path: /tmp/packer-provisioner-ansible-local/5fa40e28-8833-1f16-ef18-0e1b3a93b684/roles/haxorof.docker_ce/tasks/configure-docker/configure-docker-plugins.yml:14
FAILED - RETRYING: Wait for Docker daemon to started (10 retries left).
FAILED - RETRYING: Wait for Docker daemon to started (9 retries left).
FAILED - RETRYING: Wait for Docker daemon to started (8 retries left).
FAILED - RETRYING: Wait for Docker daemon to started (7 retries left).
FAILED - RETRYING: Wait for Docker daemon to started (6 retries left).
FAILED - RETRYING: Wait for Docker daemon to started (5 retries left).
FAILED - RETRYING: Wait for Docker daemon to started (4 retries left).
FAILED - RETRYING: Wait for Docker daemon to started (3 retries left).
FAILED - RETRYING: Wait for Docker daemon to started (2 retries left).
FAILED - RETRYING: Wait for Docker daemon to started (1 retries left).
fatal: [127.0.0.1]: FAILED! => {"attempts": 10, "changed": false, "cmd": "docker info", "delta": "0:00:00.092221", "end": "2020-11-05 14:45:00.318418", "msg": "non-zero return code", "rc": 1, "start": "2020-11-05 14:45:00.226197", "stderr": "errors pretty printing info", "stderr_lines": ["errors pretty printing info"], "stdout": "Client:\n Debug Mode: false\n\nServer:\nERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?", "stdout_lines": ["Client:", " Debug Mode: false", "", "Server:", "ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"]}

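This seems to be caused by the following: on this host the service fact is registered as "docker.service", not "docker", so the "Start Docker daemon" condition evaluates to false and the task is skipped: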

TASK [Debug] ***********************************************************************************************************
ok: [127.0.0.1] => {
    "ansible_facts.services['docker.service']": {
        "name": "docker.service",
        "source": "systemd",
        "state": "running",
        "status": "enabled"
    }
}

References

- name: Start Docker daemon
  become: yes
  service:
    name: docker
    state: started
  when:
    - ansible_facts.services['docker'] is defined
    - ansible_facts.services['docker'].state is defined
    - ansible_facts.services['docker'].state != "running"

I can't see any harm in simply dropping the when condition. Docker needs to be running to install plugins, and the service module is idempotent, so if Docker is already running nothing will happen.
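A minimal sketch of the proposed change, assuming the rest of the task stays as it is in the role: the same "Start Docker daemon" task with the `when` condition removed. Because `state: started` only acts when the service is not running, the task reports no change on hosts where Docker is already up.

```yaml
# Proposed version of the task in
# tasks/configure-docker/configure-docker-plugins.yml, without any `when`.
# `state: started` starts the service only if it is not already running,
# so it is safe to run unconditionally before the plugin tasks.
- name: Start Docker daemon
  become: yes
  service:
    name: docker
    state: started
```

Dropping the condition also removes the dependency on how `ansible_facts.services` names the unit, which is the root of this bug.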

Did you do something similar to what I did in my test here? https://github.com/haxorof/ansible-role-docker-ce/blob/master/tests/experimental/cis/test_cis.yml

Installing the authz plugin is a bit complicated and needs special logic, for example when you install Docker for the first time: I need to start Docker first without the authz configuration so that the plugin can be downloaded via the Docker daemon. After that, I can configure it and do a restart.

I can try running this test again this weekend and see if that still works.

I have been running with this plugin all the time, but it seems I haven't run into this on my current systems because they already have Docker running. The new check actually doesn't work properly in systemd environments, because the keys in ansible_facts.services carry an added ".service" suffix.
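If the state check were kept, one way to tolerate both naming styles is to look the entry up under either key. This is only a sketch; `_docker_svc` is a hypothetical variable name, not something the role defines.

```yaml
# Hypothetical helper tasks: read the Docker service entry regardless of
# whether service_facts keys it as 'docker' or 'docker.service'.
- name: Gather service facts
  service_facts:

- name: Pick whichever service fact entry exists for Docker
  set_fact:
    # Prefer 'docker.service' (systemd), fall back to 'docker', else empty dict.
    _docker_svc: "{{ ansible_facts.services['docker.service'] | default(ansible_facts.services['docker'] | default({})) }}"

- name: Show the Docker service state
  debug:
    msg: "Docker state: {{ _docker_svc.state | default('unknown') }}"
```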

Thanks! I will look into it this weekend then.

Just as a note on this issue: changes related to #127 caused this bug.

Why do we need this code:

when: (
    ansible_facts.services['docker'] is defined and
    ansible_facts.services['docker'].state is defined and
    ansible_facts.services['docker'].state != "running"
  ) or
  (
    ansible_facts.services['docker.service'] is defined and
    ansible_facts.services['docker.service'].state is defined and
    ansible_facts.services['docker.service'].state != "running"
  )

When we need the Docker daemon to be running to install plugins anyway?

This code is only called if there are plugins to be installed:

- name: Install and configure Docker plugins
  include_tasks: configure-docker/configure-docker-plugins.yml
  when: docker_plugins | length > 0

Since the Ansible service module is idempotent, nothing will happen if the service is already running.

started/stopped are idempotent actions that will not run commands unless necessary.
restarted will always bounce the service.
reloaded will always reload.
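A small demo playbook illustrating that idempotence, assuming Docker is already installed on the target hosts: the second `state: started` task registers its result and asserts that nothing changed.

```yaml
# Demo only: assumes Docker is installed on every host in the inventory.
- hosts: all
  become: yes
  tasks:
    - name: Ensure Docker is started (may start it)
      service:
        name: docker
        state: started

    - name: Ensure Docker is started again
      service:
        name: docker
        state: started
      register: _second_start

    # The service is already running, so the second call must be a no-op.
    - name: Verify the second start did nothing
      assert:
        that:
          - not _second_start.changed
```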

The issue with the plugins is that a Docker daemon must be running with a working configuration before the plugins can be installed. In addition, different Linux distributions behave differently (or did in the past) when the service is only enabled, not started:

- name: Enable Docker service
  become: true
  service:
    name: docker
    enabled: yes
  notify: restart docker
  register: _docker_service

When you first install the authz plugin, you are not allowed to have its configuration in /etc/docker/daemon.json the first time the Docker daemon is started, because the plugin must be installed before it can be configured. That is what the extra include related to Docker plugins does:

  1. It ensures the daemon is started if it is not already, since some distributions start it when it is only told to be enabled and others do not.
  2. Then it runs without the plugin configuration so that the plugins (for example authz) can be installed.
  3. Then it reconfigures the daemon with the authz plugin and notifies the handler to restart it.
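The three steps above can be sketched as a standalone playbook. This is a simplified illustration, not the role's actual task file: it assumes a single authz plugin (taken from the example in this issue), assumes `/etc/docker/daemon.json` does not yet reference it, and uses hypothetical variable names.

```yaml
# Simplified sketch of the plugin bootstrap order; NOT the role's real tasks.
- hosts: all
  become: yes
  vars:
    plugin_alias: opa-docker-authz                          # hypothetical var
    plugin_name: openpolicyagent/opa-docker-authz-v2:0.4    # hypothetical var
  tasks:
    # Step 1: make sure the daemon runs with its plugin-free configuration.
    - name: Start Docker daemon
      service:
        name: docker
        state: started

    # Step 2: install the plugin through the running daemon, unless present.
    - name: Check whether the plugin is already installed
      command: docker plugin inspect {{ plugin_alias }}
      register: _plugin_inspect
      changed_when: false
      failed_when: false

    - name: Install the authz plugin
      command: >-
        docker plugin install --grant-all-permissions
        --alias {{ plugin_alias }} {{ plugin_name }}
        opa_args="-policy-file /opa/policies/authz.rego"
      when: _plugin_inspect.rc != 0

    # Step 3: only now reference the plugin in daemon.json, then restart.
    # Note: this overwrites daemon.json; a real setup would merge settings.
    - name: Enable the authz plugin in daemon.json
      copy:
        dest: /etc/docker/daemon.json
        content: "{{ {'authorization-plugins': [plugin_alias]} | to_nice_json }}"
      notify: restart docker

  handlers:
    - name: restart docker
      service:
        name: docker
        state: restarted
```

The ordering is the point: if step 3 ran before step 2, the daemon would refer to an authorization plugin it cannot load.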

At the time I wrote this, I could not find a better way than this reconfigure-and-restart sequence, because the Docker daemon itself is needed to download plugins.

The when clauses only restrict the service task to run when the service is not already running. However, the service module is idempotent and will not restart a service that is running, so it would be safe to drop the when clauses. The behavior should be exactly the same.

Hence we really don't need the when clause: if we have come this far, we want the daemon running so that the following tasks can install the plugins.

Or am I missing something in this particular task file?

Might be that you are not missing something 😄
I cannot remember exactly why I did it like that. If there was a reason, it now seems a bit odd when you point it out. I am going to write a ticket to review the logic around plugins, and the discussion can continue there.