vgchange -ay <vg> doesn't activate VG
Hello!
I ran into the following problem: after importing the pool, running vgchange -ay fails to activate the volumes with this error:
2024-08-01 23:14:03.489 vgchange -ay lpol
2024-08-01 23:14:05.111 0 logical volume(s) in volume group "lpol" now active
/dev/mapper/lpol-thin_vg_tmeta: open failed: No such file or directory
/dev/mapper/lpol-thin_vg_tmeta: open failed: No such file or directory
Hi - from the log it looks like there is some failure while activating the _tmeta device, and thin_check cannot run successfully.
However, there is no 'dmesg' log from the same moment, so it cannot be seen what the reason for the failure could be.
Possibly thin_check may also fail on its own.
Try to activate _tmeta as a 'component' activation, run thin_check, and see the output.
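For reference, a minimal sketch of that component activation (assuming the VG/LV names from the original report, lpol/thin_vg - substitute your own; the component LV is activated read-only):

```sh
# Activate only the thin pool's metadata LV as a component (read-only);
# answer 'y' to the component-activation prompt.
lvchange -ay lpol/thin_vg_tmeta

# Run the metadata checker directly against the activated component.
thin_check /dev/lpol/thin_vg_tmeta

# Deactivate the component again before retrying normal pool activation.
lvchange -an lpol/thin_vg_tmeta
```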
Hello @zkabelac! I've been working on the same issue.
Here's a log of lvchange -ay r5/thin_vg -vvvv with dmesg -w running in the background. There's also some additional information, like lvs and vgs output, which might be helpful.
lvchange_thin_pool_with_dmesg.log
Some additional info about the issue:
- It is more likely to be reproduced either on a server under I/O load or on low-powered VMs (with 1-2 cores of a laptop processor). On the VMs the issue can be reproduced consistently.
- Sometimes the system recovers from "_tmeta: open failed: No such file or directory" after a couple of seconds and the thin pool can be activated, but at least twice we found that LVM could not open _tmeta because it was still absent even after a couple of hours.

I also activated _tmeta and ran thin_check as you advised; hopefully it clarifies something:
[Tue Aug 06 17:38:37 @ ~]:> lvchange -ay /dev/r5/thin_vg_tmeta
Do you want to activate component LV in read-only mode? [y/n]: y
Allowing activation of component LV.
[15142.485320] md/raid:mdX: device dm-1 operational as raid disk 0
[15142.485327] md/raid:mdX: device dm-3 operational as raid disk 1
[15142.485329] md/raid:mdX: device dm-5 operational as raid disk 2
[15142.485331] md/raid:mdX: device dm-7 operational as raid disk 3
[15142.485332] md/raid:mdX: device dm-9 operational as raid disk 4
[15142.485333] md/raid:mdX: device dm-11 operational as raid disk 5
[15142.485335] md/raid:mdX: device dm-13 operational as raid disk 6
[15142.485336] md/raid:mdX: device dm-15 operational as raid disk 7
[15142.485338] md/raid:mdX: device dm-17 operational as raid disk 8
[15142.485339] md/raid:mdX: device dm-19 operational as raid disk 9
[15142.493468] md/raid:mdX: raid level 5 active with 10 out of 10 devices, algorithm 2
[Tue Aug 06 17:42:11 @ ~]:> thin_check /dev/r5/thin_vg_tmeta
examining superblock
TRANSACTION_ID=2
METADATA_FREE_BLOCKS=3935231
examining devices tree
examining mapping tree
checking space map counts
[Tue Aug 06 17:42:28 @ ~]:>
As I said, I can reproduce the issue and gather additional info if necessary.
Since you are active in the other issue, I'm getting the feeling that your udev system configuration or your lvm2 build is possibly invalid.
The logged error suggests that the /dev/mapper/r5-thin_vg_tmeta symlink is missing at the moment thin_check is supposed to check this device.
So let's just recheck that you are building your lvm2 with the 'configure --enable-udev_sync' option (which is mandatory for properly working udev synchronization).
Also make sure your udev rules.d directory contains the properly installed udev rules from the lvm2 project.
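A quick way to check (a sketch; the path assumes a Debian/Ubuntu-style layout and may be /usr/lib/udev/rules.d on other distributions):

```sh
# List the device-mapper/LVM rules shipped by lvm2; exact file names vary by
# version, but 10-dm.rules, 11-dm-lvm.rules and 13-dm-disk.rules should be present.
ls /lib/udev/rules.d/*dm*.rules
```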
In case you run your tools with a badly working udev, feel free to use 'verify_udev_operations=1' - you likely cannot mess up your system with this setting any further....
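For reference, a sketch of where that setting lives - it is part of the activation section of /etc/lvm/lvm.conf (the comments shipped with your lvm.conf describe it in more detail):

```
# /etc/lvm/lvm.conf (excerpt, sketch)
activation {
    # Have lvm2 verify the /dev/mapper nodes and symlinks itself and fall back
    # to creating them when udev did not, instead of trusting udev alone.
    verify_udev_operations = 1
}
```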
Our build configuration includes '--enable-udev_sync':
[Wed Aug 07 14:52:14 @ ~]:> lvs --version
LVM version: 2.03.11(2) (2021-01-08)
Library version: 1.02.175 (2021-01-08)
Driver version: 4.47.0
Configuration: ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --libdir=/lib/x86_64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --with-udev-prefix=/ --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-editline --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-udev_rules --enable-udev_sync --disable-readline --with-vdo=internal --with-writecache=internal
Anyway, it looks like verify_udev_operations=1 helps. However, judging by the lvmconfig comment, this option appears to be intended for debugging purposes. Is it OK to use this option in production, at least as a workaround? Are there any drawbacks, anything to be aware of?
Well, you should figure out why your udev is malfunctioning - your system is not working correctly and lvm2 cannot correctly synchronize with udev (in this case it cannot wait for udev to create the symlink to the device that is used for accessing the _tmeta content).
I'd say that running a production system with misbehaving udev would be seen as a major bug, but what do I know....
Keeping this workaround enabled basically means lvm2 will interfere with the running udev (if there is one) - and it will also slightly slow down command execution due to the symlink handling and validation - but that's rather a minor issue compared to the one mentioned above....
It's a debug feature because, in general, it can mask a misconfigured udev very well - and that's a bad idea overall, as tools on such a system will see a different set of devices....
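For anyone hitting this later, a few standard commands for checking the udev side (a sketch; assumes a systemd-based system):

```sh
systemctl status systemd-udevd   # is the udev daemon running at all?
udevadm settle --timeout=10      # wait for the event queue to drain; a timeout here is suspicious
udevadm monitor --udev &         # watch udev events while re-running vgchange/lvchange
journalctl -u systemd-udevd -b   # look for rule errors or daemon restarts in the current boot
```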
Assuming the issue was related to some udev problem (possibly udev was not even running?).
Closing the issue, since there appears to be no issue within lvm2.