ceph/ceph-ansible

Calculated value for `osd memory target` too high for deployments with multiple OSDs per device

janhorstmann opened this issue

Bug Report

What happened:
After upgrading to Pacific, `osd memory target` was set to a much higher value, resulting in recurring out-of-memory kills of OSDs.

Cause:
Commit 225ae38ee2f74165e7d265817597fe451df3e919 changed the calculation of `num_osds`, which is used to derive a sensible value for `osd memory target`. The new formula uses Ansible's `difference` filter, which, according to the docs, returns a list of unique elements.
Thus, on deployments with multiple OSDs per device, where the same device should be counted once per OSD, the value of `num_osds` is too small and the available memory per OSD is overestimated. For example, with two OSDs on each of two data devices, `num_osds` drops from 4 to 2 and the computed per-OSD memory target doubles.

Apart from that, DB devices are now also counted towards `num_osds`. The deduplication effect can be reproduced in isolation, as shown below.
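
A minimal sketch of the deduplication effect (the variable names and device lists are made up for illustration and are not the actual ceph-ansible formula):

```yaml
# Sketch only: hypothetical variables, not the ceph-ansible code.
- hosts: localhost
  gather_facts: false
  vars:
    # Two OSDs on /dev/sdb, one on /dev/sdc -> 3 OSDs in total
    data_devices: ['/dev/sdb', '/dev/sdb', '/dev/sdc']
    excluded_devices: []
  tasks:
    - name: difference silently collapses the duplicate /dev/sdb
      ansible.builtin.debug:
        msg: "{{ data_devices | difference(excluded_devices) | length }} of {{ data_devices | length }} OSDs counted"
```

Running this prints `2 of 3 OSDs counted`: the duplicate entry is dropped even though `excluded_devices` is empty.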

Workarounds:
Set a fixed value for `osd memory target` in `ceph_conf_overrides`.
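
A minimal sketch of such an override (the file placement and the 4 GiB value are only examples; size the target to the actual hardware):

```yaml
# e.g. group_vars/osds.yml
ceph_conf_overrides:
  osd:
    # Fixed 4 GiB per OSD daemon; adjust to available memory
    osd_memory_target: 4294967296
```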

Environment:

  • Ansible version (e.g. ansible-playbook --version):
    ansible-playbook 2.10.17
    config file = None
    configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
    ansible python module location = /usr/local/lib/python3.8/dist-packages/ansible
    executable location = /usr/local/bin/ansible-playbook
    python version = 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0]
    
  • ceph-ansible version (e.g. git head or tag or stable branch): stable-6.0 (same calculation in stable-7.0 and main, but unverified)
  • Ceph version (e.g. ceph -v): ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)