hpe-storage/co-deployments

Auto failover when one 3PAR is down

krakazyabra opened this issue · 6 comments

Hello. I've deployed 3par-primera-csp for 1.18.

I've run into a problem: when one of the two 3PAR arrays is down (no iSCSI link), the volume stops being exported to the host.

My config was described here

I have a PVC in Kubernetes; on the 3PAR side there are two volumes (the original and its replica). The volume is exported to the node and I can see it in multipath. But when I shut down one 3PAR, I expect the volume to still be exported through the remaining available ports.

@wdurairaj can you take a look at this or involve someone who can?

This is documented as a known limitation, and a workaround for bringing the Pod back into the running state is described on this page -- https://github.com/hpe-storage/csi-driver/blob/master/release-notes/v1.3.0.md

This is the workaround suggested there:

  • It is recommended to edit the backend in the primary secret (using kubectl apply -f secret) to make it point to the secondary array's IP (a minimal sketch of such a secret follows).
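
For illustration only, here is a minimal sketch of such a backend secret. It assumes the stringData field layout used by HPE CSI driver backend secrets; the name, namespace, IP and credentials below are placeholders and should be replaced with your own values.

apiVersion: v1
kind: Secret
metadata:
  name: primera3par-secret        # placeholder secret name
  namespace: hpe-storage
stringData:
  serviceName: primera3par-csp-svc
  servicePort: "8080"
  backend: 192.0.2.20             # change this to the secondary array's management IP
  username: 3paradm               # placeholder credentials
  password: changeme

Re-applying the file with kubectl apply -f secret.yaml then points the CSP at the secondary array.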

Hello @wdurairaj
I was not talking about the PVC but about a lower level: the iSCSI device on the node. If one array goes down, the node should switch over to the path that is in the active ghost running state.
This is my multipath -ll output:

root@m5c25:/home/ep192# multipath -ll -v2
mpathq (360002ac0000000000000003700019d4a) dm-0 3PARdata,VV
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 2:0:0:0 sda 8:0  active ready running
| `- 3:0:0:0 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 4:0:0:0 sdc 8:32 active ghost running

I expect that when I disconnect sda and sdb, the sdc device will become primary.
But instead I get:

root@m5c25:/home/ep192# multipath -ll -v2
mpathq (360002ac0000000000000003700019d4a) dm-0 3PARdata,VV
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| |- 2:0:0:0 sda 8:0  failed faulty running
| `- 3:0:0:0 sdb 8:16 failed faulty running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 4:0:0:0 sdc 8:32 failed ghost running

My multipath config is:

root@m5c25:/home/ep192# cat /etc/multipath.conf 
defaults {
    user_friendly_names yes
    find_multipaths     no
    uxsock_timeout      10000
}
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    device {
        vendor  ".*"
        product ".*"
    }
}
blacklist_exceptions {
    property "(ID_WWN|SCSI_IDENT_.*|ID_SERIAL)"
    device {
        vendor  "Nimble"
        product "Server"
    }
    device {
        product "VV"
        vendor  "3PARdata"
    }
    device {
        vendor  "TrueNAS"
        product "iSCSI Disk"
    }
}
devices {
    device {
        vendor               "Nimble"
        rr_weight            uniform
        rr_min_io            100
        hardware_handler     "1 alua"
        rr_min_io_rq         1
        prio                 alua
        dev_loss_tmo         infinity
        fast_io_fail_tmo     5
        no_path_retry        18
        failback             immediate
        path_selector        "round-robin 0"
        product              "Server"
        path_checker         tur
        path_grouping_policy group_by_prio
        features             0
    }
    device {
        uid_attribute        ID_SERIAL
        vendor               "TrueNAS"
        product              "iSCSI Disk"
        path_grouping_policy group_by_prio
        path_selector        "queue-length 0"
        hardware_handler     "1 alua"
        rr_weight            priorities
    }
}
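
As a side note, the devices section above has explicit stanzas only for Nimble and TrueNAS, and none for 3PARdata. Purely as a hedged sketch, a 3PARdata stanza along the lines of HPE's documented multipath recommendations might look like the following (values should be verified against the implementation guide for your distribution); it would sit inside the existing devices { } block:

    device {
        vendor               "3PARdata"
        product              "VV"
        path_grouping_policy group_by_prio
        path_selector        "round-robin 0"
        path_checker         tur
        hardware_handler     "1 alua"
        prio                 alua
        failback             immediate     # return to the preferred path group when it recovers
        no_path_retry        18            # keep retrying for a while before failing I/O on path loss
        rr_weight            uniform
        rr_min_io_rq         1
        detect_prio          yes
        fast_io_fail_tmo     10
        dev_loss_tmo         infinity
    }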

I could perform the failover manually.
From 3par02 there were two connections, from 3par01 one.
multipath showed me:

mpathb (360002ac0000000000000005300019d4a) dm-1 ##,##
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
mpatha (360002ac0000000000000005c00019d4a) dm-0 3PARdata,VV
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 3:0:0:0 sdb 8:16 active ready running
| `- 2:0:0:0 sda 8:0  active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 4:0:0:0 sdc 8:32 active ghost running

(sda and sdb are from 3par02, sdc is from 3par01)

I shut down all ports on 3par02 (remote copy + iSCSI). I expected the sdc device to automatically become active ready running, but instead it got another status: failed ghost running.
Then in SSMC I went to Remote Copy Groups and clicked the Failover button. In the same second, the volume from 3par01 became active and I could access it from the pod. All ports on 3par02 were still down.

So I could solve the task manually, but in my opinion such a failover should be done by the CSI driver.

  • Failover is not done by the CSI driver; the driver only provides a way to replicate the data between arrays.
  • Automatic failover is ideally done by another product called HPE Quorum Witness, which monitors the health of the primary/secondary array and initiates the failover. There is a community blog around the Peer Persistence replication that we do in the CSI driver which covers this in detail, and there are some YouTube videos as well. This is the preferred mechanism for doing failover. What is described in the previous steps is a manual failover procedure using tools like SSMC/CLI (a CLI sketch follows this list).
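
For reference, a hedged sketch of that manual CLI procedure, assuming the 3PAR/Primera CLI and a hypothetical Remote Copy group name my_rcg; consult the Remote Copy user guide for the exact semantics on your firmware:

# On the array you are failing over to: promote its secondary copy so it becomes writable
setrcopygroup failover my_rcg

# Once the original primary is reachable again: resynchronize in the reversed direction
setrcopygroup recover my_rcg

# Finally, return the group to its natural direction
setrcopygroup restore my_rcg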

Hello, @wdurairaj
I checked everything and found that Auto failover was not enabled for RMC. Now it is enabled and I ran some tests:
I disabled the secondary array first (physically disconnected the iSCSI, RC and mgmt cables within 10 seconds), waited 1-2 minutes and then disconnected the master array, simulating an outage.
Then I connected the master back (the second array was still disconnected), but on the node my volume became ro (read-only). I couldn't find a way to make it rw again. Probably that should be done by the hpe-csi-node daemonset.

My second question is: if I manually log out of the iSCSI session (iscsiadm --mode node --targetname <iqn> --portal <ip:port> --logout) and delete the node record (iscsiadm -m node -T <iqn> -p <ip:port> -o delete), how can I restore the connection afterwards (the volume is still being exported from the 3PAR)?
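
A hedged sketch of how that connection is typically restored by hand, assuming the IQN and portal below are placeholders for your array's values:

# Rediscover the targets behind the array's iSCSI portal
iscsiadm -m discovery -t sendtargets -p <ip:port>

# Log back in to the target
iscsiadm -m node -T <iqn> -p <ip:port> --login

# Rescan the sessions and reload the multipath maps
iscsiadm -m session --rescan
multipath -r

Keep in mind that the HPE CSI node plugin manages these sessions during volume staging, so manual changes may conflict with what the driver expects.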

Thanks in advance.