IDR/deployment

Reconsider systematic usage of update_cache in Ansible roles

Noticed during the deployment of the software changes from #429 to prod122

As part of the migration of the OME Ansible roles to support RHEL9 started ~12 months ago, all usages of the built-in yum Ansible module have been replaced with the built-in dnf module. The update_cache parameter has been set to true across the board.

A consequence of this decision is a systematic and significant increase in deployment time. As a minimal example, I executed the idr-read-only.yml playbook against the test123 deployment in three consecutive runs.

With the current Ansible roles defined in ansible/requirements.yml, the playbook ran to completion in 6:02.90, 8:03.52 and 12:00.08.

I modified the Ansible roles downloaded locally via Galaxy to disable the cache update:

find vendor -type f -exec  sed -e "s/update_cache: true/update_cache: false/g" -i '' {} \;

With these changes, the playbook ran to completion in 2:35.17, 2:20.43 and 2:29.84 respectively.

As the measurements above show, refreshing the cache on every DNF operation massively degrades the execution time of our playbooks. While IDR is the most exposed because of its frequent deployments, this will affect anyone using the OME Ansible infrastructure on RHEL 9, including the UoD production deployments /cc @pwalczysko

Unless there is a rationale for keeping the update_cache parameter or for making it configurable, my suggestion would be to remove it and release all the Ansible roles.

/cc @jburel @khaledk2

Since we're about to update all our roles, I think this could be a good opportunity to revisit and improve this behavior as well.
dnf has built-in logic to expire its metadata cache automatically. By default, this expiration is set to 48 hours, meaning that if the cache is older than two days, dnf will refresh it automatically (metadata_expire, https://dnf.readthedocs.io/en/latest/conf_ref.html). Therefore, it's generally safe to omit update_cache: true when installing packages on RHEL 9 using Ansible.
In contrast, apt on Ubuntu does not have automatic cache expiration. Its metadata remains valid indefinitely unless explicitly updated. However, the Ansible apt module provides a cache_valid_time parameter (https://docs.ansible.com/ansible/latest/collections/ansible/builtin/apt_module.html#parameter-cache_valid_time), which allows the user to define how long the cache should be considered fresh. For example, setting cache_valid_time: 86400 ensures the cache is updated if it’s older than one day, offering a balance between performance and freshness.

Discussed this just now with @khaledk2. It boils down to:

In the RockyLinux/RHEL case: -> do not use update_cache: true

In the Ubuntu case: -> use update_cache: true and cache_valid_time: 86400.
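
For illustration, the two cases above could look roughly like this in a role's tasks file. This is only a sketch: the task names and the example-package name are placeholders, and the when conditions assume facts are gathered so that ansible_os_family is available. The actual roles may structure this differently.

```yaml
# Sketch only: task names and "example-package" are placeholders,
# not taken from any existing OME role.

# RockyLinux/RHEL: no update_cache, rely on dnf's own metadata_expire.
- name: Install package on RockyLinux/RHEL
  ansible.builtin.dnf:
    name: example-package
    state: present
  when: ansible_os_family == "RedHat"

# Ubuntu: refresh the apt cache only if it is older than one day.
- name: Install package on Ubuntu
  ansible.builtin.apt:
    name: example-package
    state: present
    update_cache: true
    cache_valid_time: 86400
  when: ansible_os_family == "Debian"
```

With this shape, repeated dnf tasks in a single playbook run no longer refresh the metadata each time, and repeated apt tasks refresh it at most once per day.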

We are going through the list of roles again, implementing ^^^

👍 this seems like a very reasonable path forward