Auto failover when one 3PAR is down
krakazyabra opened this issue · 6 comments
Hello. I've deployed 3par-primera-csp for 1.18.
I've run into a problem: when one of the two 3PAR arrays is down (no iSCSI link), the volume is no longer exported to the host.
My config was described here
I have a PVC in k8s; on the 3PAR side there are two volumes (one original and one replicated). The volume is exported to the node and I can see it in multipath. But when I shut down one 3PAR, I expect the volume to still be exported through the remaining available ports.
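The per-array sessions can be checked on the node to see which portals are still logged in (a generic open-iscsi command, nothing specific to my setup):
# list all iSCSI sessions and the portals they are logged in to
iscsiadm -m session -P 1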
@wdurairaj can you take a look at this or involve someone who can?
This is documented as a known limitation, and a workaround for getting the POD back into a running state is described on this page -- https://github.com/hpe-storage/csi-driver/blob/master/release-notes/v1.3.0.md
This is the workaround suggested there:
- It is recommended to edit the backend field in the primary secret (using kubectl apply -f secret) so that it points to the secondary array IP.
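A rough sketch of that secret edit, assuming the secret layout from the HPE CSI driver documentation (the name hpe-backend, the namespace, the service name and the credentials below are placeholders that may differ in your deployment; only backend changes, to the secondary array's management IP):
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: hpe-backend
  namespace: hpe-storage
stringData:
  serviceName: primera3par-csp-svc
  servicePort: "8080"
  backend: 192.0.2.20       # secondary array management IP (placeholder)
  username: 3paradm
  password: 3pardata
EOF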
Hello @wdurairaj
I was not talking about the PVC but about a lower level - the iSCSI device on the node. If one array is down, the node should switch over to the path that was previously in the active ghost running state.
This is my multipath -ll output:
root@m5c25:/home/ep192# multipath -ll -v2
mpathq (360002ac0000000000000003700019d4a) dm-0 3PARdata,VV
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 2:0:0:0 sda 8:0 active ready running
| `- 3:0:0:0 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
`- 4:0:0:0 sdc 8:32 active ghost running
And I expect that when I disconnect sda and sdb, the sdc device will become primary.
But instead I get:
root@m5c25:/home/ep192# multipath -ll -v2
mpathq (360002ac0000000000000003700019d4a) dm-0 3PARdata,VV
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| |- 2:0:0:0 sda 8:0 failed faulty running
| `- 3:0:0:0 sdb 8:16 failed faulty running
`-+- policy='service-time 0' prio=1 status=enabled
`- 4:0:0:0 sdc 8:32 failed ghost running
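The ALUA state behind those prio/status columns can be inspected directly; multipathd is standard, and sg_rtpg comes from sg3_utils (device name taken from the output above):
# ask multipathd for its current view of every path
multipathd show paths
# decode the ALUA target port group state the array reports for the standby path
sg_rtpg --decode /dev/sdc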
My multipath config is:
root@m5c25:/home/ep192# cat /etc/multipath.conf
defaults {
    user_friendly_names yes
    find_multipaths no
    uxsock_timeout 10000
}
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    device {
        vendor ".*"
        product ".*"
    }
}
blacklist_exceptions {
    property "(ID_WWN|SCSI_IDENT_.*|ID_SERIAL)"
    device {
        vendor "Nimble"
        product "Server"
    }
    device {
        product "VV"
        vendor "3PARdata"
    }
    device {
        vendor "TrueNAS"
        product "iSCSI Disk"
    }
}
devices {
    device {
        vendor "Nimble"
        rr_weight uniform
        rr_min_io 100
        hardware_handler "1 alua"
        rr_min_io_rq 1
        prio alua
        dev_loss_tmo infinity
        fast_io_fail_tmo 5
        no_path_retry 18
        failback immediate
        path_selector "round-robin 0"
        product "Server"
        path_checker tur
        path_grouping_policy group_by_prio
        features 0
    }
    device {
        uid_attribute ID_SERIAL
        vendor "TrueNAS"
        product "iSCSI Disk"
        path_grouping_policy group_by_prio
        path_selector "queue-length 0"
        hardware_handler "1 alua"
        rr_weight priorities
    }
}
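Note that the devices section above has entries for Nimble and TrueNAS but none for 3PARdata, so those paths fall back to multipath's built-in defaults. A stanza along these lines could be added to the devices section; the values are my reading of HPE's recommended settings, so please verify them against the current HPE implementation guide. It will not promote standby/ghost paths on its own (only a Remote Copy failover on the array does that), but it keeps the ALUA priorities and failback behaviour explicit for the 3PAR paths:
    # hypothetical 3PARdata stanza -- verify the values against HPE documentation
    device {
        vendor "3PARdata"
        product "VV"
        path_grouping_policy group_by_prio
        path_selector "round-robin 0"
        path_checker tur
        hardware_handler "1 alua"
        prio alua
        failback immediate
        no_path_retry 18
        fast_io_fail_tmo 10
        dev_loss_tmo infinity
    }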
I could perform the failover manually.
From 3par02 there were two connections, from 3par01 one.
multipath showed me:
mpathb (360002ac0000000000000005300019d4a) dm-1 ##,##
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
mpatha (360002ac0000000000000005c00019d4a) dm-0 3PARdata,VV
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 3:0:0:0 sdb 8:16 active ready running
| `- 2:0:0:0 sda 8:0 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
`- 4:0:0:0 sdc 8:32 active ghost running
(sda and sdb - from 3par02, sdc from 3par01)
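For reference, which sdX belongs to which array/portal can be read from the session details (standard open-iscsi command):
# print session details including the attached scsi disks per target/portal
iscsiadm -m session -P 3 | grep -E "Target:|Portal:|Attached scsi disk"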
I shut down all ports on 3par02 (remote copy + iSCSI). I expected device sdc to automatically become active ready running, but it got another status instead: failed ghost running.
Then in SSMC I went to Remote Copy Groups and clicked the Failover button. In the same second, the volume from 3par01 became active and I could access it from the pod, with all ports from 3par02 still down.
So I could solve the task manually, but in my opinion such a failover should be done by the csi-driver.
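For completeness, the same manual failover can also be driven from the 3PAR CLI instead of SSMC. This is from memory, so treat the syntax as a sketch and <group_name> as a placeholder:
# on the surviving (secondary) array, check the remote copy group state
showrcopy groups
# fail the group over so its secondary volumes become primary/read-write
setrcopygroup failover <group_name>
# once the original primary is reachable again, recover the group
setrcopygroup recover <group_name>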
- Failover is not done by the CSI driver; the driver only provides a way to replicate the data between arrays.
- Automatic failover is ideally done by another product called HPE Quorum Witness, which monitors the health of the primary/secondary arrays and initiates the failover. There is a community blog about the Peer Persistence replication we do in the CSI driver that covers this in detail, and there are some YouTube videos as well. This is the preferred mechanism for doing failover. What is described in the previous steps is a manual failover procedure using tools like SSMC/CLI.
Hello, @wdurairaj
I checked everything and found that Auto failover was not enabled for RMC. Now it is enabled and I ran some tests:
I disabled the secondary array first (physically disconnected the iSCSI, RC and mgmt cables within 10 seconds), waited 1-2 minutes and then disconnected the master array, simulating an outage.
Then I connected the master back (the second array was still disconnected), but on the node my volume became ro (read-only). I couldn't find a way to make it rw again. Probably that should be done by the hpe-csi-node daemonset.
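What I would try by hand on the node in that situation, as a sketch only (mpatha, sda and the mount point are placeholders, and whether the hpe-csi-node daemonset can do this for you is exactly the open question above):
# check whether the multipath device itself is read-only
blockdev --getro /dev/mapper/mpatha
# rescan the underlying paths so the write-protect state is re-read from the array
echo 1 > /sys/block/sda/device/rescan
# force multipath to reload its maps, then clear the ro flag and remount
multipath -r
blockdev --setrw /dev/mapper/mpatha
mount -o remount,rw <mountpoint-of-the-volume>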
My second question is: if I manually log out of the iSCSI session with iscsiadm --mode node --targetname <iqn> --portal <ip:port> --logout and delete the node record with iscsiadm -m node -T <iqn> -p <ip:port> -o delete, how can I restore the connection afterwards (the volume is still exported from the 3PAR)?
Thanks in advance.
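For reference, restoring that connection by hand should roughly look like this with standard open-iscsi commands (<iqn> and <ip:port> as in the question above):
# re-discover the targets exposed on that portal
iscsiadm -m discovery -t sendtargets -p <ip:port>
# log back in to the target
iscsiadm -m node -T <iqn> -p <ip:port> --login
# rescan the session so the LUN shows up again, then rebuild the multipath maps
iscsiadm -m session --rescan
multipath -r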