LINBIT/drbd

DRBD 9.1.12 drbdsetup completely frozen

phoenix-bjoern opened this issue

One resource got into a bad state during the night. After we tried to delete the resource (linstor r d) on a node where it was Outdated, it is no longer possible to check its status with drbdadm or drbdsetup on the primary node (satellite). drbdadm times out, while drbdsetup simply freezes.

root@de-fra-node11:/# drbdadm status pvc-976cacaa-af84-4398-814d-d4745288e81a
Command 'drbdsetup status pvc-976cacaa-af84-4398-814d-d4745288e81a' did not terminate within 5 seconds
root@de-fra-node11:/# drbdsetup show pvc-976cacaa-af84-4398-814d-d4745288e81a

All other resources on the node work fine, and drbdadm and drbdsetup behave as expected for them.
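drbdadm already runs drbdsetup status under a 5-second limit (see the "did not terminate within 5 seconds" message above), whereas calling drbdsetup directly blocks the shell. A minimal sketch of a probe that applies the same kind of limit to each resource named on the command line, so a hanging resource can be spotted without freezing the terminal, follows. The script and its functions (probe, main) are hypothetical and not part of DRBD, LINSTOR or Piraeus; it assumes drbdsetup is on PATH. A process blocked inside the kernel may not react to SIGKILL, so the sketch does not wait for it indefinitely and the stuck drbdsetup may be left behind.

```python
import subprocess
import sys


def probe(cmd, timeout=5.0):
    """Run cmd and classify the result as "ok", "error" or "hung".

    Returns (state, detail): detail is stdout for "ok", stderr for "error"
    and an empty string for "hung".
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        try:
            # Reap the child if it dies promptly. A process blocked inside
            # the kernel can ignore SIGKILL, so never wait on it forever.
            proc.wait(timeout=1)
        except subprocess.TimeoutExpired:
            pass
        return "hung", ""
    if proc.returncode != 0:
        return "error", err.strip()
    return "ok", out.strip()


def main(resources, timeout=5.0):
    """Probe `drbdsetup status <res>` for each resource; return the hung count."""
    hung = 0
    for res in resources:
        state, detail = probe(["drbdsetup", "status", res], timeout)
        if state == "hung":
            hung += 1
        line = f"{res}: {state}"
        if state == "error" and detail:
            line += f" ({detail})"
        print(line)
    return hung


if __name__ == "__main__":
    # Exit non-zero only when at least one resource hung.
    if main(sys.argv[1:]):
        sys.exit(1)
```

It would be invoked with one or more resource names as arguments (for example pvc-976cacaa-af84-4398-814d-d4745288e81a). Using Popen with communicate(timeout=...) rather than subprocess.run is deliberate: run() waits for the killed child without a limit, which would reintroduce the freeze if drbdsetup cannot be killed.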

This is what the resource looks like in Linstor:

╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node          ┊ Resource                                 ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ de-fra-node11 ┊ pvc-976cacaa-af84-4398-814d-d4745288e81a ┊ lvm-thin             ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊  1.51 GiB ┊ InUse ┊ UpToDate ┊
┊ de-fra-node55 ┊ pvc-976cacaa-af84-4398-814d-d4745288e81a ┊ DfltDisklessStorPool ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊           ┊       ┊  Unknown ┊
┊ de-fra-node56 ┊ pvc-976cacaa-af84-4398-814d-d4745288e81a ┊ lvm-thin             ┊     0 ┊    1008 ┊ None          ┊           ┊       ┊    Error ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ERROR:
Description:
    Node: 'de-fra-node56', resource: 'pvc-976cacaa-af84-4398-814d-d4745288e81a', volume: 0 - The device provider generated a StorageException. Error report number: 638F688E-C6205-000027
Cause:
    The volume could not be found on the system.

drbdmon still shows the resource:

⤷RES: pvc-976cacaa-af84-4398-814d-d4745288e81a         ⚫Primary     QUORUM LOST
    ✗    0:   1008  UpToDate            
    ⤷↯ de-fra-node55                                    Connecting            Unknown   
    ⤷↯ de-fra-node56                                    Disconnecting         Unknown   
        ✗    0  Outdated             Off  

We are using Piraeus with Linstor 1.20 and DRBD 9.1.12 on Ubuntu 20.04 LTS. Before DRBD 9.1.12 we had never seen a drbdsetup call freeze.

We ran into the same situation with another resource. The only way to recover was to reboot the node.