canonical/microcloud

Cluster of 3 machines, only one can start VM

benoitjpnet opened this issue · 10 comments

I have just created a new cluster of 3 machines.
I created a VM on each LXD server.

However, only one of them is able to start its VM. On the others, starting the VM fails with:

lxc start u2
Error: Failed setting up disk device "root": Failed to open "/etc/ceph/ceph.client.admin.keyring": open /etc/ceph/ceph.client.admin.keyring: no such file or directory

For some reason, containers are fine and only VMs are affected. This is odd, since containers also use Ceph.

root@mc10:~# find / -iname ceph.client.admin.keyring
/var/snap/microceph/707/conf/ceph.client.admin.keyring
root@mc10:~# 

It is indeed missing on the 2 other servers:

root@mc11:~# find / -iname ceph.client.admin.keyring
root@mc11:~# 

Sounds like a MicroCeph issue. @UtkarshBhatthere @sabaini, it looks like the conf directory is missing on some systems. Got any ideas here?

For starters, the error message says Failed to open /etc/ceph/... while it should be /var/snap/microceph/current/...

This is just a quirk of LXD: inside its snap confinement, /etc/ceph is a symlink to /var/snap/microceph/current/conf, so that Ceph works with both MicroCeph and a normal host install.

However it seems that on mc11, there is no keyring at all:

root@mc11:~# find / -iname ceph.client.admin.keyring
root@mc11:~# 

@benoitjpnet Could you please post the result of the following 2 commands on mc11:

# Checks to see if microceph and lxd have connected properly.
snap connections lxd

# Checks to see if the symlink has been properly set up inside the snap confinement for LXD.
snap run --shell lxd -c "aa-exec -p unconfined ls -l /etc/ceph"

root@mc11:~# snap connections lxd
Interface           Plug                Slot                 Notes
content[ceph-conf]  lxd:ceph-conf       microceph:ceph-conf  -
lxd                 microcloud:lxd      lxd:lxd              -
lxd-support         lxd:lxd-support     :lxd-support         -
network             lxd:network         :network             -
network-bind        lxd:network-bind    :network-bind        -
system-observe      lxd:system-observe  :system-observe      -
root@mc11:~# snap run --shell lxd -c "aa-exec -p unconfined ls -l /etc/ceph"
lrwxrwxrwx 1 root root 33 Nov 30 11:02 /etc/ceph -> /var/snap/microceph/current/conf/
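The symlink itself is in place, so the failure points at the keyring being absent from the MicroCeph conf directory it targets. A minimal sketch for checking that directory on a node, assuming a POSIX shell; `check_keyring` is a hypothetical helper (not part of LXD or MicroCeph), and its default path is taken from the symlink target shown above:

```shell
#!/bin/sh
# Hypothetical helper, not part of LXD or MicroCeph.
# Reports whether the Ceph admin keyring exists in a conf directory.
# Defaults to the MicroCeph conf directory that LXD's /etc/ceph points to.
check_keyring() {
  conf_dir="${1:-/var/snap/microceph/current/conf}"
  keyring="$conf_dir/ceph.client.admin.keyring"
  if [ -f "$keyring" ]; then
    echo "present: $keyring"
  else
    echo "missing: $keyring"
    return 1
  fi
}
```

Given the `find` results in this thread, running `check_keyring` would be expected to print `present` on mc10 and `missing` (with a non-zero exit status) on mc11.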

The keyring is missing on mc11 and mc12. It is present only on mc10, the node where I initialized the cluster. Note that I am able to reproduce the issue with a fresh install.

root@mc10:~# find / -iname ceph.client.admin.keyring
/var/snap/microceph/707/conf/ceph.client.admin.keyring
root@mc11:~# find / -iname ceph.client.admin.keyring
root@mc11:~# 
root@mc12:~# find / -iname ceph.client.admin.keyring
root@mc12:~# 
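To check all cluster members in one go instead of logging in to each, a sketch that loops over the hosts with SSH. It assumes passwordless SSH from the machine running it to every member, which the report does not mention; `check_cluster_keyrings` and the `KEYRING` variable are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: checks each host for the MicroCeph admin keyring.
# Assumes passwordless SSH to every host; override KEYRING to check another path.
KEYRING="${KEYRING:-/var/snap/microceph/current/conf/ceph.client.admin.keyring}"

check_cluster_keyrings() {
  missing=0
  for host in "$@"; do
    # An unreachable host makes ssh fail, so it is also reported as missing.
    if ssh "$host" test -f "$KEYRING"; then
      echo "$host: present"
    else
      echo "$host: missing"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero if any host lacks the keyring.
  [ "$missing" -eq 0 ]
}
```

For example, `check_cluster_keyrings mc10 mc11 mc12` would, based on the output above, be expected to report mc10 as present and mc11 and mc12 as missing.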

@UtkarshBhatthere, are you happy for me to assign this issue to you?

I have reproduced this, will check into it.

@sabaini @lmlg tagging you to keep you in the loop.

I see the issue is not assigned. May I ask whether this is still being tracked?