Calculated value for `osd memory target` too high for deployments with multiple OSDs per device
janhorstmann opened this issue · 3 comments
**Bug Report**

**What happened**:

`osd memory target` was set to a much higher value after upgrading to pacific, resulting in recurring out-of-memory kills of OSDs.
**Cause**:

Commit 225ae38ee2f74165e7d265817597fe451df3e919 changed the calculation of `num_osds`, which is used to calculate a sensible value for `osd memory target`. The new formula uses Ansible's `difference` filter, which according to the docs returns a list with unique elements. Thus on deployments with multiple OSDs per device, where the same device should be counted multiple times, the value for `num_osds` is too small and the available memory per OSD is overestimated. Apart from that, DB devices are now also counted into `num_osds`.
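The undercount can be illustrated with plain Python set semantics, assuming the `difference` filter returns unique elements as its docs state (a sketch only; the device names and the `excluded` list are hypothetical, not the exact variables used by ceph-ansible):

```python
# Hypothetical inventory: two collocated OSDs share /dev/nvme0n1,
# a third OSD lives on /dev/nvme1n1.
devices = ["/dev/nvme0n1", "/dev/nvme0n1", "/dev/nvme1n1"]
excluded = []  # placeholder for devices subtracted by the new formula

# Expected: one entry per OSD, duplicates included.
expected_num_osds = len(devices)  # 3

# Unique set difference, analogous to Ansible's `difference` filter:
# the duplicate /dev/nvme0n1 entry collapses, so one OSD is lost.
unique_num_osds = len(set(devices) - set(excluded))  # 2

print(expected_num_osds, unique_num_osds)  # prints "3 2"
```

With `num_osds` off by one here, the per-OSD share of memory is computed over 2 OSDs instead of 3, inflating `osd memory target` by 50% in this example.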
**Workarounds**:

Set a fixed value for `osd memory target` in `ceph_conf_overrides`.
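A minimal sketch of that workaround (the file location and the 4 GiB value are placeholders; size the target for your own hardware):

```yaml
# e.g. in group_vars/all.yml (placeholder location)
ceph_conf_overrides:
  osd:
    osd_memory_target: 4294967296  # 4 GiB in bytes, hypothetical value
```

Pinning the value this way bypasses the `num_osds`-based autocalculation entirely.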
**Environment**:

- Ansible version (e.g. `ansible-playbook --version`):
  ```
  ansible-playbook 2.10.17
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.8/dist-packages/ansible
  executable location = /usr/local/bin/ansible-playbook
  python version = 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0]
  ```
- ceph-ansible version (e.g. git head or tag or stable branch): `stable-6.0` (same calculation in `stable-7.0` and `main`, but unverified)
- Ceph version (e.g. `ceph -v`): `ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)`