Cannot delete instances - image has watchers - not removing
Hello
When trying to delete instances on MicroCloud, the instances fail to delete with the following error:
Error: Failed deleting instance "private-repo-lds-3" in project "REDACTED_PROJECT_NAME": Error deleting storage volume: Failed to delete volume: Failed to run: rbd --id admin --cluster ceph --pool lxd_remote rm virtual-machine_REDACTED_PROJECT_NAME_private-repo-lds-3.block: exit status 16 (2024-04-29T11:02:41.760+0000 7fe2f4898640 -1 librbd::image::PreRemoveRequest: 0x5563e888b7b0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.)
The issue occurred after deploying a set of 14 VMs with the following config: https://pastebin.canonical.com/p/DmfDtKc6cz/
The VMs were deployed on Friday and left running over the weekend. When we then tried to destroy them, deletion failed with the above error.
Workaround
- Find the QEMU processes: sudo ps aux | grep qemu
- Identify the process for your VMs
- Kill it: sudo kill ${PID}
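The manual "identify the process" step can be narrowed with a small filter. This is a sketch, not an LXD tool: `find_qemu_pids` is a hypothetical helper, and it assumes the instance name appears somewhere on the QEMU command line (confirm this against your own `ps aux` output first). It is a plain substring match, so a name like `private-repo-lds-3` would also match `private-repo-lds-30`; review the PIDs before killing anything.

```shell
#!/bin/sh
# find_qemu_pids NAME: read `ps aux`-style lines on stdin and print the PID
# (column 2) of each QEMU process whose command line contains NAME.
# Hypothetical helper; assumes the instance name appears on the QEMU
# command line. Substring match, so review the result before killing.
find_qemu_pids() {
  # $11 is the first word of COMMAND in `ps aux` output; skip lines whose
  # command is awk or grep so the search itself is never reported.
  awk -v vm="$1" '$11 !~ /(awk|grep)$/ && index($0, "qemu") && index($0, vm) { print $2 }'
}
```

Usage: `sudo ps aux | find_qemu_pids private-repo-lds-3`, then `sudo kill <PID>` for each PID you have confirmed belongs to the crashed VM.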
Peter
The VMs were found to have crashed; killing the QEMU processes released the RBD volumes and allowed them to be deleted.
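Before retrying the deletion, it may help to confirm that the watchers are actually gone. `rbd status` lists the clients watching an image; the sketch below assumes its output contains one `watcher=...` line per client (and `Watchers: none` when there are none). `count_watchers` is a hypothetical helper, not part of Ceph.

```shell
#!/bin/sh
# count_watchers: count "watcher=" entries in `rbd status` output read on
# stdin. Assumes one "watcher=..." line per client (hypothetical helper).
count_watchers() {
  # grep -c prints 0 but exits 1 when nothing matches; keep status 0.
  grep -c 'watcher=' || true
}
```

Usage, reusing the pool and image from the error above: `rbd --id admin --cluster ceph --pool lxd_remote status virtual-machine_REDACTED_PROJECT_NAME_private-repo-lds-3.block | count_watchers`. A result of `0` suggests the image can be removed; otherwise, per the error text, wait about 30s for a crashed client to time out and check again.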
The steps to reproduce this:
- Create VMs as described
- Via netplan, add a new VLAN (e.g. VLAN ID 55) on the bond that LXD uses for its services
- Apply the VLAN changes
- Allow the cluster to settle
- Attempt to delete the VMs
These steps are my best reconstruction of what happened in the environment since I last used it. I still need to validate them and confirm a minimal reproducer.
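For the VLAN step in the reproduction attempt, a minimal netplan file could look like the one written below. This is a sketch under assumptions: the bond name `bond0`, the file name, and the helper `write_vlan_config` are hypothetical; only VLAN ID 55 comes from the steps above. The file only declares the VLAN on top of the bond and does not assign addresses.

```shell
#!/bin/sh
# write_vlan_config FILE BOND VLAN_ID: write a minimal netplan file that
# declares VLAN VLAN_ID on top of BOND (hypothetical helper).
write_vlan_config() {
  cat > "$1" <<EOF
network:
  version: 2
  vlans:
    $2.$3:
      id: $3
      link: $2
EOF
}
```

Usage: `write_vlan_config /tmp/60-vlan55.yaml bond0 55`, then `sudo install -m 600 /tmp/60-vlan55.yaml /etc/netplan/` and `sudo netplan apply`.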
Thank you
Peter
Following up here: I am struggling to validate the above steps as a reproducer. I tried adding a VLAN, and I do see errors from Ceph, but VM deletion still succeeds.