RedHat-EMEA-SSA-Team/hetzner-ocp4

Playbook fails on CentOS 8 Stream - ERROR! couldn't resolve module/action 'virt_net'.

rbo opened this issue · 27 comments

rbo commented
ansible-playbook ./ansible/setup.yml
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new standard, use callbacks_enabled instead. This feature will be removed
 from ansible-core in version 2.15. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
[DEPRECATION WARNING]: "include" is deprecated, use include_tasks/import_tasks instead. This feature will be removed in version 2.16. Deprecation warnings
 can be disabled by setting deprecation_warnings=False in ansible.cfg.
ERROR! couldn't resolve module/action 'virt_net'. This often indicates a misspelling, missing collection, or incorrect module path.

The error appears to be in '/root/hetzner-ocp4/ansible/roles/openshift-4-cluster/tasks/create-network.yml': line 98, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- name: Define network {{ cluster_name }}
  ^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes. Always quote template expression brackets when they
start a value. For instance:

    with_items:
      - {{ foo }}

Should be written as:

    with_items:
      - "{{ foo }}"
rbo commented

Installed several Ansible Galaxy collections:

ansible-galaxy collection install community.libvirt
ansible-galaxy collection install community.crypto
ansible-galaxy collection install community.general
ansible-galaxy collection install community.aws
ansible-galaxy collection install google.cloud
ansible-galaxy collection install community.azure
ansible-galaxy collection install kubernetes.core

Now fails with:

TASK [openshift-4-cluster : Define network ocp4] *******************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "The `libvirt` module is not importable. Check the requirements."}

rbo commented

Ansible cannot load the libvirt Python module by default:


[root@ocp4 hetzner-ocp4]# ansible localhost -m virt -a command=list_vms -e 'ansible_python_interpreter=/usr/bin/python3'
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new standard, use callbacks_enabled instead. This
feature will be removed from ansible-core in version 2.15. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.
[WARNING]: Skipping callback plugin 'profile_tasks', unable to load
localhost | SUCCESS => {
    "changed": false,
    "list_vms": []
}
[root@ocp4 hetzner-ocp4]# ansible localhost -m virt -a command=list_vms
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new standard, use callbacks_enabled instead. This
feature will be removed from ansible-core in version 2.15. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.
[WARNING]: Skipping callback plugin 'profile_tasks', unable to load
localhost | FAILED! => {
    "changed": false,
    "msg": "The `libvirt` module is not importable. Check the requirements."
}
[root@ocp4 hetzner-ocp4]#
rbo commented

Ansible uses /usr/bin/python3.8 rather than the system default, and only the system Python 3.6 can import libvirt:

[root@ocp4 hetzner-ocp4]# /usr/bin/python3.8 -c "import libvirt"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'libvirt'
[root@ocp4 hetzner-ocp4]# /usr/bin/python3.6 -c "import libvirt"
[root@ocp4 hetzner-ocp4]#
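The interpreter mismatch can be checked before touching cluster.yml. The sketch below is a minimal, hypothetical helper (the names can_import and check_interpreters are made up for illustration) that asks each candidate interpreter to run `import libvirt`, which is what the manual python3.8/python3.6 commands above do. The candidate paths are the ones seen in this thread and are an assumption for other hosts.

```python
import subprocess
import sys


def can_import(interpreter: str, module: str) -> bool:
    """Return True if `interpreter -c "import <module>"` exits with status 0."""
    try:
        result = subprocess.run(
            [interpreter, "-c", f"import {module}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # Interpreter path does not exist or is not executable on this host.
        return False
    return result.returncode == 0


def check_interpreters(candidates, module="libvirt"):
    """Map each candidate interpreter path to whether it can import `module`."""
    return {path: can_import(path, module) for path in candidates}


# Interpreter paths seen in this thread (assumption: adjust for your host),
# plus the interpreter running this script.
CANDIDATES = [
    "/usr/bin/python3",
    "/usr/bin/python3.6",
    "/usr/bin/python3.8",
    "/usr/libexec/platform-python",
    sys.executable,
]

if __name__ == "__main__":
    for path, ok in check_interpreters(CANDIDATES).items():
        print(f"{path}: libvirt {'ok' if ok else 'missing'}")
```

Whichever path reports "ok" is a candidate value for ansible_python_interpreter.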
rbo commented

Hot fix

Install the collections:
ansible-galaxy collection install community.libvirt
ansible-galaxy collection install community.crypto
ansible-galaxy collection install community.general
ansible-galaxy collection install community.aws
ansible-galaxy collection install google.cloud
ansible-galaxy collection install community.azure
ansible-galaxy collection install kubernetes.core
Then set ansible_python_interpreter by adding the following to your cluster.yml:

# Hot fix for https://github.com/RedHat-EMEA-SSA-Team/hetzner-ocp4/issues/205
ansible_python_interpreter: /usr/libexec/platform-python

rbo commented

Next problem:

TASK [openshift-4-cluster : Select cluster & user] **************************************
fatal: [localhost]: FAILED! => {"msg": "You need to install \"jmespath\" prior to running json_query filter"}

I have found a workaround using Jinja's selectattr filter instead of JMESPath's json_query filter, as follows:

- name: Select cluster & user
  set_fact:
    cluster: "{{ kubeconfig.clusters | selectattr('name','equalto','ocp4') | map(attribute='cluster') | first }}"
    user: "{{ kubeconfig.users | selectattr('name','equalto','admin') | map(attribute='user') |  first }}"

Next problem:

TASK [openshift-4-cluster : Create infra-registry pv] ***************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "kubernetes >= 12.0.0 is required"}

Fixed by:

# pip3 install -I kubernetes openshift

# pip3 list | grep kubernetes
kubernetes (22.6.0)

After that, the cluster was up and running.
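Because the pip install can land in a different interpreter than the one Ansible uses, it can help to check the version from that interpreter directly. A minimal sketch, assuming the kubernetes client exposes a `__version__` attribute (it does in the releases I know of, but treat that as an assumption) and using hypothetical helper names. It avoids importlib.metadata so it also runs on the Python 3.6 platform-python.

```python
import importlib
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '22.6.0' into (22, 6, 0), keeping the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(module_name: str) -> Optional[str]:
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def meets_minimum(version: Optional[str], minimum: str) -> bool:
    """True if `version` is at least `minimum`; a missing version never qualifies."""
    if version is None:
        return False
    have, need = parse_version(version), parse_version(minimum)
    # Pad with zeros so that "12" compares equal to "12.0.0".
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    found = installed_version("kubernetes")
    verdict = "ok" if meets_minimum(found, "12.0.0") else "missing or too old"
    print(f"kubernetes {found}: {verdict}")
```

Run it with the same interpreter that ansible_python_interpreter points to; the 12.0.0 threshold comes from the error message above.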

rbo commented

@snoussi thanks, looks good.

I am also investigating an Ansible execution environment: https://github.com/RedHat-EMEA-SSA-Team/hetzner-ocp4/tree/ansible-ee. The first test looks good:

$ podman run --rm -ti  --security-opt label=disable -v /run/libvirt/:/run/libvirt/ \
  -v /var/run/libvirt/:/var/run/libvirt/ \
  quay.io/redhat-emea-ssa-team/hetzner-ocp4-ansible-ee:devel bash
bash-4.4# virsh list
 Id   Name             State
--------------------------------
 2    demo-master-0    running
 3    demo-master-1    running
 4    demo-master-2    running
 5    demo-compute-0   running
 6    demo-compute-1   running

bash-4.4# ansible localhost -m virt -a command=list_vms
[WARNING]: No inventory was parsed, only implicit localhost is available
localhost | SUCCESS => {
    "changed": false,
    "list_vms": [
        "demo-master-0",
        "demo-master-1",
        "demo-compute-1",
        "demo-compute-0",
        "demo-master-2"
    ]
}
bash-4.4#

The long-term goal might be to use ansible-navigator too. Let's see...

@snoussi
Thanks a lot for the solution.
It works on my Hetzner server as well. Cool stuff. 👍

It also fails on the firewalld module, which is missing from ansible-core:

TASK [openshift-4-cluster : Include OS specific part] ********************************************************************************************************
[DEPRECATION WARNING]: "include" is deprecated, use include_tasks/import_tasks/import_playbook instead. This feature will be removed in version 2.16.
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
fatal: [localhost]: FAILED! => {"reason": "couldn't resolve module/action 'firewalld'. This often indicates a misspelling, missing collection, or incorrect module path.\n\nThe error appears to be in '/home/manu/hetzner-ocp4/ansible/roles/openshift-4-cluster/tasks/prepare-host-CentOS-8.yml': line 38, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Allow NFS traffic from VM's to Host\n  ^ here\n"}

This was fixed with:

ansible-galaxy collection install ansible.posix

The selectattr workaround is not completely correct, as it hardcodes the cluster name.
The correct version should look like this:

cluster: "{{ kubeconfig.clusters | selectattr('name','equalto',cluster_name) | map(attribute='cluster') | first }}"
user: "{{ kubeconfig.users | selectattr('name','equalto','admin') | map(attribute='user') | first }}"
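To see what the filter chain does, here is a rough Python equivalent of `selectattr('name','equalto',x) | map(attribute=...) | first`, run against a hypothetical, already-parsed kubeconfig (the cluster name "demo", server URL and token are made up). select_first is an illustrative name, not part of the playbooks, and it returns None where Jinja would produce an undefined value.

```python
from typing import Any, Dict, List, Optional


def select_first(entries: List[Dict[str, Any]], name: str, attribute: str) -> Optional[Any]:
    """Mirror `entries | selectattr('name', 'equalto', name)
    | map(attribute=attribute) | first`; returns None when nothing matches."""
    for entry in entries:
        if entry.get("name") == name:
            return entry[attribute]
    return None


# Hypothetical kubeconfig, already parsed from YAML into Python objects.
kubeconfig = {
    "clusters": [
        {"name": "demo", "cluster": {"server": "https://api.demo.example.com:6443"}},
    ],
    "users": [
        {"name": "admin", "user": {"token": "dummy-token"}},
    ],
}

cluster_name = "demo"
cluster = select_first(kubeconfig["clusters"], cluster_name, "cluster")
user = select_first(kubeconfig["users"], "admin", "user")
```

Looking up the hardcoded name "ocp4" in this kubeconfig returns None, which is why the hardcoded variant only works for clusters that happen to be called ocp4.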
rbo commented

The problem is fixed by the switch to an Ansible execution environment (#207) in devel, which will be merged into master with #212.

Please check out the devel tree.

In short:

  1. Install ansible-navigator
  2. Checkout devel branch
  3. Run playbooks: ansible-navigator run -m stdout ./ansible/setup.yml
rbo commented

Please give me feedback on whether the Ansible execution environment works for you!

I tried the Ansible execution environment using the latest devel branch. Two remarks:

  • We need to add an extra flag if the playbook is run on the Hetzner server itself as the root user (which seems to be the recommended option according to the README?):
ansible-navigator run -m stdout  ./ansible/setup.yml --connection=local

Otherwise we get the error:

TASK [Gathering Facts] *********************************************************                                                                             
fatal: [host]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: root@localhost: Permission denied (publickey).", "unreachable": true}
  • Later on, firewalld fails with:
TASK [openshift-4-cluster : Enable & Start firewalld] **************************
fatal: [host]: FAILED! => {"changed": false, "msg": "Could not find the requested service firewalld: host"}
rbo commented

@EmmanuelKasper thanks for testing. Did you configure SSH as described in the documentation? https://github.com/RedHat-EMEA-SSA-Team/hetzner-ocp4/tree/devel#initialize-tools

The --connection=local flag is interesting, but I changed the whole behaviour so that it works over an SSH connection. This means the ansible-ee connects to the host, which is basically localhost: https://github.com/RedHat-EMEA-SSA-Team/hetzner-ocp4/blob/devel/inventory/hosts.yaml

With this change, it is technically possible to run the playbooks against a remote host.

Hope this helps; if not, feel free to ping me directly.

I also have this issue on RHEL 8.6.

In the process of testing these workarounds on RHEL 8.6. So far so good.

Yes, I can confirm that after downgrading the Ansible components and applying the fixes here, my RHEL 8.6 box delivers my test OCP clusters without any issues at all. So it is not just a CentOS Stream issue.

rbo commented

@tomazb thanks, yes, we also have a couple of problems on RHEL 8; that's why I introduced the Ansible execution environment to hetzner-ocp4. I expect to merge it into master in the next couple of days/weeks.

I was able to install a cluster on a RHEL 8.6 box using the latest devel branch with some additional modifications to the virt-module usage. Using the Ansible execution environment (EE) works perfectly for me now, but only as a standard user. Running the EE as root fails.

rbo commented

It looks like this is related to AAP 2.2: #220

On my freshly installed RHEL 8.6 with AAP 2.1, it works very well.

On a fresh CentOS 8 Stream installation at Hetzner, I had to do the following steps:

ansible-galaxy collection install community.libvirt
ansible-galaxy collection install community.crypto
ansible-galaxy collection install community.general
ansible-galaxy collection install community.aws
ansible-galaxy collection install google.cloud
ansible-galaxy collection install community.azure
ansible-galaxy collection install kubernetes.core

# add to cluster.yml
ansible_python_interpreter: /usr/libexec/platform-python

pip3 install -I kubernetes openshift
pip3 install boto3

#hetzner-ocp4/ansible/roles/openshift-4-cluster/tasks/build-k8s-vars.yml
# in the "Select cluster & user" task
cluster: "{{ kubeconfig.clusters | selectattr('name','equalto',cluster_name) | map(attribute='cluster') | first }}"
user: "{{ kubeconfig.users | selectattr('name','equalto','admin') | map(attribute='user') | first }}"

Next problem:

TASK [openshift-4-cluster : Select cluster & user] **************************************
fatal: [localhost]: FAILED! => {"msg": "You need to install \"jmespath\" prior to running json_query filter"}

To fix this, I installed the missing library with the following command:

/usr/bin/pip3.8 install -I jmespath

Many issues come from mixing different Python versions/users when installing modules.
I solved this by forcing the Python interpreter for the execution and making sure all modules are installed for the right user (jmespath, libvirt-python, which the libvirt Ansible modules require, etc.).
Migrating to an EE mitigates this for the 'local' modules, but libvirt-python must be enabled on the VM host itself, not in the EE.
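A minimal preflight sketch of that idea: run it with the interpreter that ansible_python_interpreter points to (for example /usr/libexec/platform-python) and it lists the modules that interpreter cannot find. The module list is an assumption collected from the errors in this thread and may differ for your version of the playbooks; note that these are import names, so pip's libvirt-python shows up as libvirt.

```python
import importlib.util
import sys
from typing import List

# Import names of Python modules that failed at some point in this thread
# (assumption: adjust to your playbook version).
REQUIRED = ["libvirt", "jmespath", "kubernetes", "openshift", "boto3"]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print(f"{sys.executable}: missing {missing or 'nothing'}")
```

find_spec only locates the module without importing it, so the check stays cheap even for heavy packages such as kubernetes.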

@suulperi your build-k8s-vars changes also fixed setting up NFS.

rbo commented

@suulperi have you tried the Ansible EE in the devel tree? #205 (comment)

@rbo No, I didn't. I was way too busy with a demo session. I will try it as soon as possible.

rbo commented

The Ansible execution environment & ansible-navigator changes are merged into master with PR #212.

The issue is solved by the new ansible-navigator-based solution. New usage:

  • Install ansible-navigator & configure SSH
  • Run playbooks: ansible-navigator run -m stdout ./ansible/setup.yml