Can't deploy version 15.2.12 on Yocto OS - rocksdb: NotFound: db/ - _read_fsid unparsable uuid
insatomcat opened this issue · 5 comments
What happened:
I'm trying to use ceph-ansible (stable-5.0) to deploy a Ceph cluster on servers running a Yocto OS (ceph version 15.2.12, the version packaged in "honister": https://layers.openembedded.org/layerindex/recipe/192188/).
The OS already contains the Ceph binaries.
The playbook works fine up to the OSD creation part, where it fails with an error I could not find much about:
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid 0886ca96-9af5-4381-8f17-7924b1ccf5fd --setuser ceph --setgroup ceph
stderr: 2022-03-01T08:03:56.942+0000 740d98abfd00 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
stderr: 2022-03-01T08:03:56.984+0000 740d98abfd00 -1 rocksdb: NotFound: db/: No such file or directory
stderr: 2022-03-01T08:03:56.984+0000 740d98abfd00 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _open_db erroring opening db:
stderr: 2022-03-01T08:03:57.456+0000 740d98abfd00 -1 bluestore(/var/lib/ceph/osd/ceph-2/) mkfs failed, (5) Input/output error
stderr: 2022-03-01T08:03:57.456+0000 740d98abfd00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
stderr: 2022-03-01T08:03:57.457+0000 740d98abfd00 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-2/: (5) Input/output error
--> Was unable to complete a new OSD, will rollback changes
What you expected to happen:
With a previous Yocto version (dunfell, ceph 15.2.0), I did not have this problem.
This might be a Ceph issue, but it could also be a problem with the Yocto integration. I'm trying to understand what is happening so that I can find the root cause (maybe a link with rocksdb?).
How to reproduce it (minimal and precise):
- create a Yocto honister image including ceph (the default version will be 15.2.12)
- deploy a Ceph cluster (mon, osd, mgr on all nodes, 3 in my setup) with ceph-ansible (branch stable-5.0); a sketch of the kind of configuration used is shown after this list
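To give an idea of the setup, here is a minimal sketch of the kind of inventory and group_vars this corresponds to with ceph-ansible stable-5.0 (the hostnames, interface and network values are illustrative assumptions, not the attached configuration):

# inventory - hostnames are placeholders
[mons]
node1
node2
node3

[mgrs]
node1
node2
node3

[osds]
node1
node2
node3

# group_vars/all.yml - values are assumptions for illustration
ceph_origin: distro          # Ceph binaries are already shipped in the Yocto image
monitor_interface: eth0
public_network: "192.168.1.0/24"
osd_objectstore: bluestore
dashboard_enabled: false

# group_vars/osds.yml
devices:
  - /dev/vdb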
Share your group_vars files, inventory and full ceph-ansible log
Environment:
- OS (e.g. from /etc/os-release): Yocto honister
- Kernel (e.g. uname -a): 5.15.14-rt27-mainline-rt SMP PREEMPT_RT
- Docker version if applicable (e.g. docker version): N/A
- Ansible version (e.g. ansible-playbook --version): ansible-playbook 2.9.6
- ceph-ansible version (e.g. git head or tag or stable branch): stable-5.0
- Ceph version (e.g. ceph -v): 15.2.12 (ceph version 128-NOTFOUND (8f69994803975eda09ba6fbec77701982c33af34) octopus (rc))
ansible.log
ceph_group_vars.tar.gz
ceph-ansible-site.yaml.gz
Thanks in advance!
At first glance, (5) Input/output error usually means the device is faulty, could you check that?
This is a fully virtual environment: Ceph is given "/dev/vdb" to create the OSD, and those disks are brand new qcow2 files (created with qemu-img create -f qcow2 vm1-osd.qcow2 30G).
I think we can rule out a hardware problem...
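For completeness, the disks were created and attached roughly like this (a sketch: the VM name, image path and libvirt attach step are assumptions about my setup, only the qemu-img command above is exact):

# create a fresh 30G qcow2 image per OSD
qemu-img create -f qcow2 vm1-osd.qcow2 30G

# attach it to the VM as /dev/vdb (assuming a libvirt-managed VM named vm1)
virsh attach-disk vm1 /var/lib/libvirt/images/vm1-osd.qcow2 vdb --driver qemu --subdriver qcow2 --persistent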
Thanks.
Can you show the output of ls -l /var/lib/ceph/osd/ceph-2/ ?
Also, do you have the full ceph-volume.log?
I created a brand new Yocto qcow2 image (to be sure there is no corruption), and after the playbook failed I tried running the command manually on one node. This is the result:
root@hypervisor1-aure:~# vgremove ceph-49e1dc09-cc76-4b74-b03f-7074e56d55ae
Do you really want to remove volume group "ceph-49e1dc09-cc76-4b74-b03f-7074e56d55ae" containing 1 logical volumes? [y/n]: y
Do you really want to remove active logical volume ceph-49e1dc09-cc76-4b74-b03f-7074e56d55ae/osd-block-5a6fcb4c-9416-4968-9986-de52c531b3b1? [y/n]: y
Logical volume "osd-block-5a6fcb4c-9416-4968-9986-de52c531b3b1" successfully removed
Volume group "ceph-49e1dc09-cc76-4b74-b03f-7074e56d55ae" successfully removed
root@hypervisor1-aure:~# ceph-volume --cluster ceph lvm batch --bluestore --yes /dev/vdb
--> DEPRECATION NOTICE
--> You are using the legacy automatic disk sorting behavior
--> The Pacific release will change the default to --no-auto
--> passed data devices: 1 physical, 0 LVM
--> relative data size: 1.0
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new c38afdbd-8bc4-48a2-86c4-531103a9565c
Running command: /usr/sbin/vgcreate --force --yes ceph-e4bb2b09-a0d4-4c86-a999-2b0cff143ea0 /dev/vdb
stdout: Volume group "ceph-e4bb2b09-a0d4-4c86-a999-2b0cff143ea0" successfully created
Running command: /usr/sbin/lvcreate --yes -l 7679 -n osd-block-c38afdbd-8bc4-48a2-86c4-531103a9565c ceph-e4bb2b09-a0d4-4c86-a999-2b0cff143ea0
stdout: Wiping ceph_bluestore signature on /dev/ceph-e4bb2b09-a0d4-4c86-a999-2b0cff143ea0/osd-block-c38afdbd-8bc4-48a2-86c4-531103a9565c.
stdout: Logical volume "osd-block-c38afdbd-8bc4-48a2-86c4-531103a9565c" created.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
--> Executable selinuxenabled not in PATH: /usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
Running command: /bin/chown -h ceph:ceph /dev/ceph-e4bb2b09-a0d4-4c86-a999-2b0cff143ea0/osd-block-c38afdbd-8bc4-48a2-86c4-531103a9565c
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/ln -s /dev/ceph-e4bb2b09-a0d4-4c86-a999-2b0cff143ea0/osd-block-c38afdbd-8bc4-48a2-86c4-531103a9565c /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
stderr: got monmap epoch 1
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-0/keyring --create-keyring --name osd.0 --add-key AQAn1h9iME4+IRAAVo0ARiLaZntkfoTSwXEZiA==
stdout: creating /var/lib/ceph/osd/ceph-0/keyring
added entity osd.0 auth(key=AQAn1h9iME4+IRAAVo0ARiLaZntkfoTSwXEZiA==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid c38afdbd-8bc4-48a2-86c4-531103a9565c --setuser ceph --setgroup ceph
stderr: 2022-03-02T20:40:09.202+0000 72602867ed00 -1 bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
stderr: 2022-03-02T20:40:09.266+0000 72602867ed00 -1 rocksdb: NotFound: db/: No such file or directory
stderr: 2022-03-02T20:40:09.266+0000 72602867ed00 -1 bluestore(/var/lib/ceph/osd/ceph-0/) _open_db erroring opening db:
stderr: 2022-03-02T20:40:09.716+0000 72602867ed00 -1 bluestore(/var/lib/ceph/osd/ceph-0/) mkfs failed, (5) Input/output error
stderr: 2022-03-02T20:40:09.716+0000 72602867ed00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
stderr: 2022-03-02T20:40:09.716+0000 72602867ed00 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0/: (5) Input/output error
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
stderr: purged osd.0
--> RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid c38afdbd-8bc4-48a2-86c4-531103a9565c --setuser ceph --setgroup ceph
root@hypervisor1-aure:~# ls -l /var/lib/ceph/osd/ceph-0/
total 12
-rw-r--r-- 1 ceph ceph 514 Mar 2 20:40 activate.monmap
lrwxrwxrwx 1 ceph ceph 93 Mar 2 20:40 block -> /dev/ceph-e4bb2b09-a0d4-4c86-a999-2b0cff143ea0/osd-block-c38afdbd-8bc4-48a2-86c4-531103a9565c
-rw-r--r-- 1 ceph ceph 0 Mar 2 20:40 fsid
-rw------- 1 ceph ceph 56 Mar 2 20:40 keyring
-rw------- 1 ceph ceph 10 Mar 2 20:40 type
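As a side note, instead of removing the volume group by hand between manual attempts, ceph-volume's own zap command should do the cleanup (a sketch, assuming the data dir and device shown above):

# unmount the tmpfs data dir left behind by the failed attempt
umount /var/lib/ceph/osd/ceph-0

# destroy the LV/VG on the device and wipe signatures before retrying
ceph-volume lvm zap --destroy /dev/vdb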
This is the ceph-volume.log:
Hope this helps...
Thanks
OK, my problem was this bug: https://tracker.ceph.com/issues/49815
It occurs when using a rocksdb version >= 6.15 (Yocto honister uses v6.20, Yocto dunfell uses v6.6).
The fix was only backported to Octopus starting with v15.2.14 (https://tracker.ceph.com/issues/49981), hence my problem with ceph v15.2.12 + rocksdb 6.20.
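For anyone stuck on honister with ceph 15.2.12, one possible workaround (a sketch, not tested here; the patch filename is a placeholder for the commit referenced in the tracker) is to backport the upstream fix onto the recipe with a bbappend in your own layer:

# ceph_%.bbappend (honister override syntax)
FILESEXTRAPATHS:prepend := "${THISDIR}/${PN}:"
SRC_URI += "file://0001-backport-rocksdb-db-path-fix.patch"

Alternatively, bumping the ceph recipe to 15.2.14 or later avoids the backport entirely.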
Thanks.