equinix/terraform-equinix-metal-anthos-on-baremetal

Portworx LVM script may choose the wrong disks


Here are some of the inconsistencies:

  1. Two of the three nodes did not create an LVM volume group for the PX KVDB; only one node did.
  2. The third node, which did create the pwx_vg, picked the larger 480 GB drive instead of the 240 GB one.

I am including some snippets from each node below.

This is worker node 1, where the pwx_vg was not created; the warning message is clearly visible in the output.

root@equinix-metal-gke-cluster-yk9or-worker-01:~# pxctl status
Status: PX is operational
License: Trial (expires in 31 days)
Node ID: 1534165d-4b6b-41df-b8e1-03e8c8d5c4d1
    IP: 145.40.77.105 
    Local Storage Pool: 2 pools
    POOL    IO_PRIORITY RAID_LEVEL  USABLE  USED    STATUS  ZONE    REGION
    0   HIGH        raid0       447 GiB 10 GiB  Online  default default
    1   HIGH        raid0       224 GiB 10 GiB  Online  default default
    Local Storage Devices: 2 devices
    Device  Path        Media Type      Size        Last-Scan
    0:1 /dev/sdb    STORAGE_MEDIUM_SSD  447 GiB     12 Feb 21 17:34 UTC
    1:1 /dev/sdc    STORAGE_MEDIUM_SSD  224 GiB     12 Feb 21 17:34 UTC
    * Internal kvdb on this node is sharing this storage device /dev/sdc  to store its data.
    total       -   671 GiB
    Cache Devices:
     * No cache devices
Cluster Summary
    Cluster ID: equinix-metal-gke-cluster-yk9or
    Cluster UUID: 47eb0c51-b2c1-456b-a254-e5c849a7d1db
    Scheduler: kubernetes
    Nodes: 3 node(s) with storage (3 online)
    IP      ID                  SchedulerNodeName               StorageNode Used    Capacity    Status  StorageStatus   Version     Kernel          OS
    145.40.77.101   9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc    equinix-metal-gke-cluster-yk9or-worker-03   Yes     20 GiB  671 GiB Online    Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.211   99a6f578-6c6f-4b09-b516-8dd332beef7e    equinix-metal-gke-cluster-yk9or-worker-02   Yes     20 GiB  668 GiB Online    Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.105   1534165d-4b6b-41df-b8e1-03e8c8d5c4d1    equinix-metal-gke-cluster-yk9or-worker-01   Yes     20 GiB  671 GiB Online    Up (This node)  2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    Warnings: 
         WARNING: Internal Kvdb is not using dedicated drive on nodes [145.40.77.105]. This configuration is not recommended for production clusters.
Global Storage Pool
    Total Used      :  60 GiB
    Total Capacity  :  2.0 TiB
root@equinix-metal-gke-cluster-yk9or-worker-01:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 447.1G  0 disk 
sdb      8:16   0 447.1G  0 disk 
sdc      8:32   0 223.6G  0 disk 
sdd      8:48   0 223.6G  0 disk 
├─sdd1   8:49   0     2M  0 part 
├─sdd2   8:50   0   1.9G  0 part 
└─sdd3   8:51   0 221.7G  0 part /
root@equinix-metal-gke-cluster-yk9or-worker-01:~# 
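
The LVM state itself is not captured above, but a quick check along these lines would confirm that no pwx_vg exists on this node (hypothetical commands, not from the original session):

vgs    # no pwx_vg should appear in the volume group list on this node
pvs    # no physical volumes should have been claimed for LVM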

This is worker node 2, where there is likewise no pwx_vg for the KVDB.

root@equinix-metal-gke-cluster-yk9or-worker-02:~# pxctl status
Status: PX is operational
License: Trial (expires in 31 days)
Node ID: 99a6f578-6c6f-4b09-b516-8dd332beef7e
    IP: 145.40.77.211 
    Local Storage Pool: 2 pools
    POOL    IO_PRIORITY RAID_LEVEL  USABLE  USED    STATUS  ZONE    REGION
    0   HIGH        raid0       447 GiB 10 GiB  Online  default default
    1   HIGH        raid0       221 GiB 10 GiB  Online  default default
    Local Storage Devices: 2 devices
    Device  Path        Media Type      Size        Last-Scan
    0:1 /dev/sdb    STORAGE_MEDIUM_SSD  447 GiB     12 Feb 21 17:47 UTC
    1:1 /dev/sdc2   STORAGE_MEDIUM_SSD  221 GiB     12 Feb 21 17:47 UTC
    * Internal kvdb on this node is sharing this storage device /dev/sdc2  to store its data.
    total       -   668 GiB
    Cache Devices:
     * No cache devices
    Journal Device: 
    1   /dev/sdc1   STORAGE_MEDIUM_SSD
Cluster Summary
    Cluster ID: equinix-metal-gke-cluster-yk9or
    Cluster UUID: 47eb0c51-b2c1-456b-a254-e5c849a7d1db
    Scheduler: kubernetes
    Nodes: 3 node(s) with storage (3 online)
    IP      ID                  SchedulerNodeName               StorageNode Used    Capacity    Status  StorageStatus   Version     Kernel          OS
    145.40.77.101   9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc    equinix-metal-gke-cluster-yk9or-worker-03   Yes     20 GiB  671 GiB Online    Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.211   99a6f578-6c6f-4b09-b516-8dd332beef7e    equinix-metal-gke-cluster-yk9or-worker-02   Yes     20 GiB  668 GiB Online    Up (This node)  2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.105   1534165d-4b6b-41df-b8e1-03e8c8d5c4d1    equinix-metal-gke-cluster-yk9or-worker-01   Yes     20 GiB  671 GiB Online    Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    Warnings: 
         WARNING: Internal Kvdb is not using dedicated drive on nodes [145.40.77.105 145.40.77.211]. This configuration is not recommended for production clusters.
Global Storage Pool
    Total Used      :  60 GiB
    Total Capacity  :  2.0 TiB
root@equinix-metal-gke-cluster-yk9or-worker-02:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 447.1G  0 disk 
sdb      8:16   0 447.1G  0 disk 
sdc      8:32   0 223.6G  0 disk 
├─sdc1   8:33   0     3G  0 part 
└─sdc2   8:34   0 220.6G  0 part 
sdd      8:48   0 223.6G  0 disk 
├─sdd1   8:49   0     2M  0 part 
├─sdd2   8:50   0   1.9G  0 part 
└─sdd3   8:51   0 221.7G  0 part /
root@equinix-metal-gke-cluster-yk9or-worker-02:~# 
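
Note that on this node the 223.6 GiB /dev/sdc was instead split into a ~3 GB journal partition (sdc1) and a data partition (sdc2) that the KVDB shares. A hypothetical way to inspect how those partitions were set up (not captured in the original session):

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sdc    # show the journal and data partitions
blkid /dev/sdc1 /dev/sdc2                      # any filesystem or label metadata on each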

Finally, this is worker node 3. This node did create the pwx_vg, but on the larger-capacity drive.

root@equinix-metal-gke-cluster-yk9or-worker-03:~# pxctl status
Status: PX is operational
License: Trial (expires in 31 days)
Node ID: 9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc
    IP: 145.40.77.101 
    Local Storage Pool: 2 pools
    POOL    IO_PRIORITY RAID_LEVEL  USABLE  USED    STATUS  ZONE    REGION
    0   HIGH        raid0       447 GiB 10 GiB  Online  default default
    1   HIGH        raid0       224 GiB 10 GiB  Online  default default
    Local Storage Devices: 2 devices
    Device  Path        Media Type      Size        Last-Scan
    0:1 /dev/sdb    STORAGE_MEDIUM_SSD  447 GiB     12 Feb 21 17:34 UTC
    1:1 /dev/sdc    STORAGE_MEDIUM_SSD  224 GiB     12 Feb 21 17:34 UTC
    total           -           671 GiB
    Cache Devices:
     * No cache devices
    Kvdb Device:
    Device Path     Size
    /dev/pwx_vg/pwxkvdb 447 GiB
     * Internal kvdb on this node is using this dedicated kvdb device to store its data.
Cluster Summary
    Cluster ID: equinix-metal-gke-cluster-yk9or
    Cluster UUID: 47eb0c51-b2c1-456b-a254-e5c849a7d1db
    Scheduler: kubernetes
    Nodes: 3 node(s) with storage (3 online)
    IP      ID                  SchedulerNodeName               StorageNode Used    Capacity    Status  StorageStatus   Version     Kernel          OS
    145.40.77.101   9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc    equinix-metal-gke-cluster-yk9or-worker-03   Yes     20 GiB  671 GiB Online    Up (This node)  2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.211   99a6f578-6c6f-4b09-b516-8dd332beef7e    equinix-metal-gke-cluster-yk9or-worker-02   Yes     20 GiB  668 GiB Online    Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.105   1534165d-4b6b-41df-b8e1-03e8c8d5c4d1    equinix-metal-gke-cluster-yk9or-worker-01   Yes     20 GiB  671 GiB Online    Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
Global Storage Pool
    Total Used      :  60 GiB
    Total Capacity  :  2.0 TiB
root@equinix-metal-gke-cluster-yk9or-worker-03:~# lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                8:0    0 447.1G  0 disk 
└─pwx_vg-pwxkvdb 253:0    0 447.1G  0 lvm  
sdb                8:16   0 447.1G  0 disk 
sdc                8:32   0 223.6G  0 disk 
sdd                8:48   0 223.6G  0 disk 
├─sdd1             8:49   0     2M  0 part 
├─sdd2             8:50   0   1.9G  0 part 
└─sdd3             8:51   0 221.7G  0 part /
root@equinix-metal-gke-cluster-yk9or-worker-03:~# 
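
The mapping from the VG back to its physical disk is not shown above; hypothetically, pvs/lvs would confirm that pwx_vg landed on the 447.1 GiB /dev/sda rather than one of the 223.6 GiB drives:

pvs -o pv_name,vg_name,pv_size    # expect /dev/sda backing pwx_vg at ~447 GiB
lvs pwx_vg                        # the pwxkvdb LV spanning the whole VG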

Any thoughts regarding these inconsistencies?
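
For reference, here is a minimal sketch of what I would expect the selection logic to do: pick the smallest disk that has no partitions, no holders, and no mounts, and dedicate it to the KVDB volume group. This is only an illustration of the intent, not the script's actual code, and it assumes it runs before Portworx claims any drives:

# Sketch: choose the smallest completely unused disk for the KVDB volume group.
KVDB_DISK=$(lsblk -dbn -o NAME,SIZE,TYPE |                 # all disks, sizes in bytes
  awk '$3 == "disk" {print $1, $2}' |
  while read -r name size; do
    # skip disks that already have partitions or holders (OS disk, existing LVM)
    [ -z "$(lsblk -n "/dev/$name" | tail -n +2)" ] && echo "$size /dev/$name"
  done | sort -n | head -n 1 | awk '{print $2}')

pvcreate "$KVDB_DISK"
vgcreate pwx_vg "$KVDB_DISK"
lvcreate -l 100%FREE -n pwxkvdb pwx_vg

On the nodes above, that would consistently land on one of the 223.6 GiB drives rather than a 447.1 GiB one.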

Originally posted by @bikashrc25 in #37 (comment)