lvmteam/lvm2

pvs with error code -210

Closed this issue · 5 comments

Hello,

CentOS Linux release 8.3.2011
LVM version: 2.03.12(2)-git (2021-01-08)

We set up clustered LVM using sanlock as the lock type, with several VGs in the cluster.
The global lock lives on one VG, named gzh_lvm_test_iscsi, with lockspace name lvm_gzh_lvm_test_iscsi.

When the path from that VG to the storage server fails, about 60s later I handle the VG with "lvmlockctl --kill/--drop gzh_lvm_test_iscsi" to avoid the host being reset.
But about a minute later, the pvs command sometimes returns -210, and other LVM commands also fail with the same error code -210.
Once we restore the path to the storage server and run vgchange --lockstart, the LVM commands succeed again and the error code is 0.
Sometimes the error code is -116 or -221 instead; in those cases the LVM commands recover on their own after a while, about 1m.

The issue is that while access to the storage is still failing, we cannot execute any LVM commands at all. How can we avoid this?
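For reference, the recovery sequence described above looks roughly like this (VG name taken from this report; it must be run as root on the affected host, so it is only an illustration, not something runnable outside the cluster):

```shell
# After the storage path for the GL-holding VG fails, deal with its
# lockspace before sanlock's watchdog timeout resets the host:
lvmlockctl --kill gzh_lvm_test_iscsi   # mark the VG's locks as failed
lvmlockctl --drop gzh_lvm_test_iscsi   # drop the lockspace once its LVs are no longer in use

# Once the path to the storage server is restored, restart the
# lockspace so LVM commands can acquire the global lock again:
vgchange --lockstart gzh_lvm_test_iscsi
pvs   # should now exit with status 0
```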

Thanks.

The log is as below:

15752 1639558010 recv lvs[2504169] cl 2210 lock gl "" mode sh flags 0
15753 1639558010 lockspace "lvm_gzh_lvm_test_iscsi" not found for sanlock gl, searching...
15754 1639558010 work search for gl
15755 1639558010 worker found no gl_is_enabled
15756 1639558010 lockspace "lvm_gzh_lvm_test_iscsi" not found after search
15757 1639558010 send lvs[2504169] cl 2210 lock gl rv -210 ENOLS
15758 1639558010 close lvs[2504169] cl 2210 fd 8
15759 1639558032 new cl 2211 pi 2 fd 8
15760 1639558032 recv lvs[2504699] cl 2211 lock gl "" mode sh flags 0
15761 1639558032 lockspace "lvm_gzh_lvm_test_iscsi" not found for sanlock gl, searching...
15762 1639558032 work search for gl
15763 1639558032 worker found no gl_is_enabled
15764 1639558032 lockspace "lvm_gzh_lvm_test_iscsi" not found after search
15765 1639558032 send lvs[2504699] cl 2211 lock gl rv -210 ENOLS
15766 1639558032 close lvs[2504699] cl 2211 fd 8
15767 1639558035 new cl 2212 pi 2 fd 8
15768 1639558035 recv pvs[2504713] cl 2212 lock gl "" mode sh flags 0
15769 1639558035 lockspace "lvm_gzh_lvm_test_iscsi" not found for sanlock gl, searching...
15770 1639558035 work search for gl
15771 1639558035 worker found no gl_is_enabled
15772 1639558035 lockspace "lvm_gzh_lvm_test_iscsi" not found after search
15773 1639558035 send pvs[2504713] cl 2212 lock gl rv -210 ENOLS
15774 1639558035 close pvs[2504713] cl 2212 fd 8
15775 1639558148 new cl 2213 pi 2 fd 8
15776 1639558148 recv client[2507813] cl 2213 dump_log . "" mode iv flags 0

pvs -vvvvvvv cmd:
19:26:41.175335 pvs[680999] lvmcmdline.c:3005 Parsing: pvs -vvvvvvv
19:26:41.175356 pvs[680999] lvmcmdline.c:1992 Recognised command pvs_general (id 122 / enum 103).
19:26:41.175372 pvs[680999] filters/filter-sysfs.c:328 Sysfs filter initialised.
19:26:41.175380 pvs[680999] filters/filter-internal.c:79 Internal filter initialised.
19:26:41.175633 pvs[680999] filters/filter-regex.c:217 Regex filter initialised.
19:26:41.175639 pvs[680999] filters/filter-type.c:58 LVM type filter initialised.
19:26:41.175645 pvs[680999] filters/filter-usable.c:202 Usable device filter initialised (scan_lvs 0).
19:26:41.175652 pvs[680999] filters/filter-mpath.c:343 mpath filter initialised.
19:26:41.175658 pvs[680999] filters/filter-partitioned.c:71 Partitioned filter initialised.
19:26:41.175664 pvs[680999] filters/filter-signature.c:86 signature filter initialised.
19:26:41.175669 pvs[680999] filters/filter-md.c:150 MD filter initialised.
19:26:41.175675 pvs[680999] filters/filter-composite.c:100 Composite filter initialised.
19:26:41.175683 pvs[680999] filters/filter-persistent.c:190 Persistent filter initialised.
19:26:41.175691 pvs[680999] device_mapper/libdm-config.c:987 devices/hints not found in config: defaulting to all
19:26:41.175698 pvs[680999] device_mapper/libdm-config.c:1086 metadata/record_lvs_history not found in config: defaulting to 0
19:26:41.175704 pvs[680999] lvmcmdline.c:3062 DEGRADED MODE. Incomplete RAID LVs will be processed.
19:26:41.175713 pvs[680999] lvmcmdline.c:3068 Processing command: pvs -vvvvvvv
19:26:41.175719 pvs[680999] lvmcmdline.c:3069 Command pid: 680999
19:26:41.175724 pvs[680999] lvmcmdline.c:3070 System ID:
19:26:41.175732 pvs[680999] lvmcmdline.c:3073 O_DIRECT will be used
19:26:41.175738 pvs[680999] device_mapper/libdm-config.c:1014 global/locking_type not found in config: defaulting to 1
19:26:41.175745 pvs[680999] locking/locking.c:143 File locking settings: readonly:0 sysinit:0 ignorelockingfailure:0 global/metadata_read_only:0 global/wait_for_locks:1.
19:26:41.175761 pvs[680999] device_mapper/libdm-config.c:987 devices/md_component_checks not found in config: defaulting to auto
19:26:41.175767 pvs[680999] lvmcmdline.c:2913 Using md_component_checks auto use_full_md_check 0
19:26:41.175776 pvs[680999] daemon-client.c:31 /run/lvm/lvmlockd.socket: Opening daemon socket to lvmlockd for protocol lvmlockd version 1.
19:26:41.175795 pvs[680999] daemon-client.c:50 Sending daemon lvmlockd: hello
19:26:41.175865 pvs[680999] locking/lvmlockd.c:92 Successfully connected to lvmlockd on fd 3.
19:26:41.175892 pvs[680999] device_mapper/libdm-config.c:987 report/output_format not found in config: defaulting to basic
19:26:41.175902 pvs[680999] device_mapper/libdm-config.c:1086 log/report_command_log not found in config: defaulting to 0
19:26:41.175915 pvs[680999] device_mapper/libdm-config.c:1086 report/aligned not found in config: defaulting to 1
19:26:41.175922 pvs[680999] device_mapper/libdm-config.c:1086 report/buffered not found in config: defaulting to 1
19:26:41.175930 pvs[680999] device_mapper/libdm-config.c:1086 report/headings not found in config: defaulting to 1
19:26:41.175935 pvs[680999] device_mapper/libdm-config.c:987 report/separator not found in config: defaulting to
19:26:41.175943 pvs[680999] device_mapper/libdm-config.c:1086 report/prefixes not found in config: defaulting to 0
19:26:41.175949 pvs[680999] device_mapper/libdm-config.c:1086 report/quoted not found in config: defaulting to 1
19:26:41.175958 pvs[680999] device_mapper/libdm-config.c:1086 report/columns_as_rows not found in config: defaulting to 0
19:26:41.175966 pvs[680999] device_mapper/libdm-config.c:987 report/pvs_sort not found in config: defaulting to pv_name
19:26:41.175974 pvs[680999] device_mapper/libdm-config.c:987 report/pvs_cols_verbose not found in config: defaulting to pv_name,vg_name,pv_fmt,pv_attr,pv_size,pv_free,dev_size,pv_uuid
19:26:41.175981 pvs[680999] device_mapper/libdm-config.c:987 report/compact_output_cols not found in config: defaulting to
19:26:41.176051 pvs[680999] toollib.c:4391 Processing each PV
19:26:41.176059 pvs[680999] misc/lvm-flock.c:230 Locking /run/lock/lvm/P_global RB
19:26:41.176071 pvs[680999] misc/lvm-flock.c:114 _do_flock /run/lock/lvm/P_global:aux WB
19:26:41.176088 pvs[680999] misc/lvm-flock.c:47 _undo_flock /run/lock/lvm/P_global:aux
19:26:41.176098 pvs[680999] misc/lvm-flock.c:114 _do_flock /run/lock/lvm/P_global RB
19:26:41.176110 pvs[680999] locking/lvmlockd.c:1594 lockd global mode sh
19:26:41.177056 pvs[680999] locking/lvmlockd.c:177 lockd_result -210 flags none lm none
19:26:41.177063 pvs[680999] locking/lvmlockd.c:303 lvmlockd lock_gl sh result -210 0
19:26:41.177072 pvs[680999] locking/lvmlockd.c:1710 Global lock failed: error -210
19:26:41.177078 pvs[680999] misc/lvm-flock.c:84 Unlocking /run/lock/lvm/P_global
19:26:41.177083 pvs[680999] misc/lvm-flock.c:47 _undo_flock /run/lock/lvm/P_global
19:26:41.177095 pvs[680999] toollib.c:4440
19:26:41.177105 pvs[680999] device_mapper/libdm-config.c:1086 report/compact_output not found in config: defaulting to 0
19:26:41.177114 pvs[680999] daemon-client.c:177 Closing daemon socket (fd 3).
19:26:41.177127 pvs[680999] cache/lvmcache.c:2056 Destroy lvmcache content
19:26:41.177140 pvs[680999] lvmcmdline.c:3174 Completed: pvs -vvvvvvv

diff --git a/daemons/lvmlockd/lvmlockd-sanlock.c b/daemons/lvmlockd/lvmlockd-sanlock.c
index e595eeffd..235059e5f 100644
--- a/daemons/lvmlockd/lvmlockd-sanlock.c
+++ b/daemons/lvmlockd/lvmlockd-sanlock.c
@@ -1578,7 +1578,7 @@ int lm_rem_lockspace_sanlock(struct lockspace *ls, int free_vg)
        rv = sanlock_rem_lockspace(&lms->ss, 0);
        if (rv < 0) {
                log_error("S %s rem_lockspace_san error %d", ls->name, rv);
-               return rv;
+               /*return rv; */
        }

        if (free_vg) {

The patch above fixes the bug for us. The reason: the global lock lives on only this one VG, and while I/O to the storage is still failing, the other VGs hold no GL. Since the lockspace thread exits when the VG is dropped, if sanlock_rem_lockspace() fails with an error we should still continue so that gl_lsname_sanlock gets cleared.

Can someone review the patch? Thanks.


Most folks are off for the holidays, this will be looked at when they're back. Thanks!

Hi, thanks for debugging that, I've pushed the fix to the main branch.

OK, thanks. Closing the issue.