ceph/ceph-ansible

Mirroring feature issues unknown state of image on receiver

badfiles opened this issue · 5 comments

Bug Report

The rbd-mirror feature is not working as expected. Configuration and status on the replica:

vars for replica:

ceph_rbd_mirror_configure: yes
ceph_rbd_mirror_pool: libvirt
ceph_rbd_mirror_mode: image

ceph_rbd_mirror_remote_cluster: hv56
ceph_rbd_mirror_remote_user: client.rbd-mirror-peer
ceph_rbd_mirror_remote_mon_hosts: 192.168.*.115,192.168.*.116
ceph_rbd_mirror_remote_key: ######

# rbd mirror pool status libvirt --verbose
health: WARNING                 <--- WHY?
daemon health: OK
image health: WARNING      <--- WHY?
images: 1 total
    1 unknown                         <--- WHY?

DAEMONS
service 2088106:
  instance_id: 2089282
  client_id: hv3
  hostname: hv3
  version: 18.2.2
  leader: true
  health: OK


IMAGES
rev:
  global_id:   cbc64dc8-8de3-42ca-a3e3-64f252cf8b28
  state:       up+replaying
  description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0,"entries_per_second":0.0,"non_primary_position":{"entry_tid":3,"object_number":3,"tag_tid":1},"primary_position":{"entry_tid":3,"object_number":3,"tag_tid":1}}
  service:     hv3 on hv3
  last_update: 2024-04-24 08:13:52

On the source I get:

vars for source:

ceph_rbd_mirror_configure: yes
ceph_rbd_mirror_pool: libvirt
ceph_rbd_mirror_mode: image

ceph_rbd_mirror_remote_cluster: hv34
ceph_rbd_mirror_remote_user: client.rbd-mirror-peer
ceph_rbd_mirror_remote_mon_hosts: 192.168.*.113,192.168.*.114
ceph_rbd_mirror_remote_key: ******

# rbd mirror image enable libvirt/rev
Mirroring enabled

# rbd --cluster ceph mirror pool status libvirt --verbose
health: WARNING                 <--- WHY?
daemon health: WARNING   <--- WHY?
image health: OK
images: 1 total
    1 replaying

DAEMONS
service 52551:
  instance_id: 52557
  client_id: hv5
  hostname: hv5
  version: 18.2.2
  leader: false
  health: OK                        <--- SIC


IMAGES
rev:
  global_id:   cbc64dc8-8de3-42ca-a3e3-64f252cf8b28

Although mirroring seems to work, I have no idea how to get rid of the warnings.

While trying to fix this, I found a small error. According to the Ceph docs, client.rbd-mirror-peer should have these caps:

$ ceph auth get-or-create client.rbd-mirror-peer mon 'profile rbd-mirror-peer' osd 'profile rbd'

So I changed the role like this:

--- a/roles/ceph-rbd-mirror/tasks/configure_mirroring.yml
+++ b/roles/ceph-rbd-mirror/tasks/configure_mirroring.yml
@@ -41,7 +41,7 @@
         user: client.admin
         user_key: "/etc/ceph/{{ cluster }}.client.admin.keyring"
         caps:
-          mon: "profile rbd-mirror"
+          mon: "profile {{ (item.name == ceph_rbd_mirror_local_user) | ternary('rbd-mirror-peer', 'rbd-mirror') }}"

But it did not help. I also tried giving full permissions (allow *) to all involved users on all spaces, with no change to the status.
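To confirm which caps a user actually holds after a change like the one above, the keyring-style output of `ceph auth get` can be filtered. This is only a sketch: `show_caps` and `has_peer_caps` are hypothetical helper names, and the input format (lines such as `caps mon = "profile rbd-mirror-peer"`) is assumed to match what `ceph auth get` prints on your release.

```shell
# Reads `ceph auth get <entity>` output on stdin and prints the mon and osd caps,
# one per line, as "<service>: <caps>".
# Assumed input format (keyring style):  caps mon = "profile rbd-mirror-peer"
show_caps() {
    awk '$1 == "caps" && ($2 == "mon" || $2 == "osd") {
        svc = $2
        sub(/^[^=]*=[ \t]*/, "")   # drop everything up to and including "= "
        gsub(/"/, "")              # strip the surrounding quotes
        print svc ": " $0
    }'
}

# Succeeds only if stdin carries exactly the caps quoted from the Ceph docs
# in this report: mon "profile rbd-mirror-peer" and osd "profile rbd".
has_peer_caps() {
    caps=$(show_caps)
    printf '%s\n' "$caps" | grep -qx 'mon: profile rbd-mirror-peer' &&
    printf '%s\n' "$caps" | grep -qx 'osd: profile rbd'
}
```

Typical use would be `ceph auth get client.rbd-mirror-peer | show_caps` on each cluster, so the two sides can be compared by eye.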

What you expected to happen:

No unknown/warning states

After I mark an image on the receiver as primary, it starts syncing with itself rather than with the remote (I don't see it on the remote at all), and yet its status is still unknown.

I get:

# rbd mirror image enable libvirt/test
Mirroring enabled

# rbd mirror pool status libvirt --verbose
health: WARNING
daemon health: OK
image health: WARNING
images: 2 total
    2 unknown                              <---

DAEMONS
service 2088106:
  instance_id: 2089351
  client_id: hv3
  hostname: hv3
  version: 18.2.2
  leader: true
  health: OK


IMAGES
rev:
  global_id:   cbc64dc8-8de3-42ca-a3e3-64f252cf8b28
  state:       up+replaying
  description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0,"entries_per_second":0.0,"non_primary_position":{"entry_tid":3,"object_number":3,"tag_tid":1},"primary_position":{"entry_tid":3,"object_number":3,"tag_tid":1}}
  last_update: 2024-04-24 11:08:52

test:
  global_id:   6e824979-6367-41e9-a43a-59bf7f0e7fc3
  state:       up+stopped
  description: local image is primary
  service:     hv3 on hv3
  last_update: 2024-04-24 11:09:06
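When the summary counts and the per-image entries disagree, as in the output above, it can help to reduce the verbose status to one line per image. The filter below is a sketch, not part of ceph-ansible: `image_states` is a hypothetical helper name, and it assumes the text layout shown in this thread (an `IMAGES` header, then unindented `name:` lines followed by indented `state:` lines).

```shell
# Reads `rbd mirror pool status <pool> --verbose` output on stdin and prints
# "<image> <state>" for every entry in the IMAGES section.
image_states() {
    awk '
        /^IMAGES/                   { in_images = 1; next }
        # An unindented line ending in ":" starts a new image entry.
        in_images && /^[^ \t].*:$/  { name = substr($0, 1, length($0) - 1) }
        in_images && $1 == "state:" { print name, $2 }
    '
}
```

Running `rbd mirror pool status libvirt --verbose | image_states` on both clusters gives two short lists that are easy to compare.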

The problem was that one of the peering sites was configured as tx-only, while the other one was in rx-tx mode.

I have no idea how this happened or where in the ceph-ansible code this peer key (direction) is set.

# rbd mirror pool info libvirt --all
Mode: image
Site Name: hv56

Peer Sites: 

UUID: c806f310-5102-45d3-8463-f6a04dba28f4
Name: hv34
Mirror UUID: 8dda2c1f-9a36-43ad-8e12-e0d7f28d016e
Direction: tx-only
Client: client.rbd-mirror-peer
Mon Host: 192.168.*.114,192.168.*.115
...

Fixed manually:

# rbd mirror pool peer set libvirt c806f310-5102-45d3-8463-f6a04dba28f4 direction rx-tx
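To spot this condition without reading the full peer listing, the `rbd mirror pool info --all` output can be filtered for peers whose direction is not rx-tx. This is a minimal sketch: `peers_not_rx_tx` is a hypothetical helper name, and it assumes the `UUID:`, `Name:` and `Direction:` lines appear in the order shown above.

```shell
# Reads `rbd mirror pool info <pool> --all` output on stdin and prints
# "<peer uuid> <peer name> <direction>" for every peer that is not rx-tx.
peers_not_rx_tx() {
    awk '
        $1 == "UUID:"      { uuid = $2 }   # "Mirror UUID:" is skipped, its $1 is "Mirror"
        $1 == "Name:"      { name = $2 }   # "Site Name:" is skipped, its $1 is "Site"
        $1 == "Direction:" && $2 != "rx-tx" { print uuid, name, $2 }
    '
}
```

For example, `rbd mirror pool info libvirt --all | peers_not_rx_tx` run on each cluster prints nothing when every peer is already rx-tx. Any UUID it does print can be passed to the `rbd mirror pool peer set ... direction rx-tx` command used above.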

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

Any updates on the tx-only/rx-tx issue?

What about this bug:

Although mirroring seem to work, I have no idea how to remove warnings;

Trying to fix this I found a small error, according to ceph docs

client.rbd-mirror-peer should have caps:

$ ceph auth get-or-create client.rbd-mirror-peer mon 'profile rbd-mirror-peer' osd 'profile rbd'