Portworx LVM script may choose the wrong disks
displague commented
Here are some of the inconsistencies:
- 2 out of 3 nodes did not create an LVM volume for the PX KVDB. Only one node did so successfully.
- The 3rd node, which did create the pwx_vg, picked up the larger 480GB drive instead of the 240GB one.
I am including some snippets below.
This is worker node 1, where it could not create the pwx_vg; you can clearly see the warning message.
root@equinix-metal-gke-cluster-yk9or-worker-01:~# pxctl status
Status: PX is operational
License: Trial (expires in 31 days)
Node ID: 1534165d-4b6b-41df-b8e1-03e8c8d5c4d1
IP: 145.40.77.105
Local Storage Pool: 2 pools
POOL IO_PRIORITY RAID_LEVEL USABLE USED STATUS ZONE REGION
0 HIGH raid0 447 GiB 10 GiB Online default default
1 HIGH raid0 224 GiB 10 GiB Online default default
Local Storage Devices: 2 devices
Device Path Media Type Size Last-Scan
0:1 /dev/sdb STORAGE_MEDIUM_SSD 447 GiB 12 Feb 21 17:34 UTC
1:1 /dev/sdc STORAGE_MEDIUM_SSD 224 GiB 12 Feb 21 17:34 UTC
* Internal kvdb on this node is sharing this storage device /dev/sdc to store its data.
total - 671 GiB
Cache Devices:
* No cache devices
Cluster Summary
Cluster ID: equinix-metal-gke-cluster-yk9or
Cluster UUID: 47eb0c51-b2c1-456b-a254-e5c849a7d1db
Scheduler: kubernetes
Nodes: 3 node(s) with storage (3 online)
IP ID SchedulerNodeName StorageNode Used Capacity Status StorageStatus Version Kernel OS
145.40.77.101 9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc equinix-metal-gke-cluster-yk9or-worker-03 Yes 20 GiB 671 GiB Online Up 2.6.3.0-4419aa4 5.4.0-52-generic Ubuntu 20.04.1 LTS
145.40.77.211 99a6f578-6c6f-4b09-b516-8dd332beef7e equinix-metal-gke-cluster-yk9or-worker-02 Yes 20 GiB 668 GiB Online Up 2.6.3.0-4419aa4 5.4.0-52-generic Ubuntu 20.04.1 LTS
145.40.77.105 1534165d-4b6b-41df-b8e1-03e8c8d5c4d1 equinix-metal-gke-cluster-yk9or-worker-01 Yes 20 GiB 671 GiB Online Up (This node) 2.6.3.0-4419aa4 5.4.0-52-generic Ubuntu 20.04.1 LTS
Warnings:
WARNING: Internal Kvdb is not using dedicated drive on nodes [145.40.77.105]. This configuration is not recommended for production clusters.
Global Storage Pool
Total Used : 60 GiB
Total Capacity : 2.0 TiB
root@equinix-metal-gke-cluster-yk9or-worker-01:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 447.1G 0 disk
sdb 8:16 0 447.1G 0 disk
sdc 8:32 0 223.6G 0 disk
sdd 8:48 0 223.6G 0 disk
├─sdd1 8:49 0 2M 0 part
├─sdd2 8:50 0 1.9G 0 part
└─sdd3 8:51 0 221.7G 0 part /
root@equinix-metal-gke-cluster-yk9or-worker-01:~#
This is worker node 2, where there is no pwx_vg for the KVDB.
root@equinix-metal-gke-cluster-yk9or-worker-02:~# pxctl status
Status: PX is operational
License: Trial (expires in 31 days)
Node ID: 99a6f578-6c6f-4b09-b516-8dd332beef7e
IP: 145.40.77.211
Local Storage Pool: 2 pools
POOL IO_PRIORITY RAID_LEVEL USABLE USED STATUS ZONE REGION
0 HIGH raid0 447 GiB 10 GiB Online default default
1 HIGH raid0 221 GiB 10 GiB Online default default
Local Storage Devices: 2 devices
Device Path Media Type Size Last-Scan
0:1 /dev/sdb STORAGE_MEDIUM_SSD 447 GiB 12 Feb 21 17:47 UTC
1:1 /dev/sdc2 STORAGE_MEDIUM_SSD 221 GiB 12 Feb 21 17:47 UTC
* Internal kvdb on this node is sharing this storage device /dev/sdc2 to store its data.
total - 668 GiB
Cache Devices:
* No cache devices
Journal Device:
1 /dev/sdc1 STORAGE_MEDIUM_SSD
Cluster Summary
Cluster ID: equinix-metal-gke-cluster-yk9or
Cluster UUID: 47eb0c51-b2c1-456b-a254-e5c849a7d1db
Scheduler: kubernetes
Nodes: 3 node(s) with storage (3 online)
IP ID SchedulerNodeName StorageNode Used Capacity Status StorageStatus Version Kernel OS
145.40.77.101 9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc equinix-metal-gke-cluster-yk9or-worker-03 Yes 20 GiB 671 GiB Online Up 2.6.3.0-4419aa4 5.4.0-52-generic Ubuntu 20.04.1 LTS
145.40.77.211 99a6f578-6c6f-4b09-b516-8dd332beef7e equinix-metal-gke-cluster-yk9or-worker-02 Yes 20 GiB 668 GiB Online Up (This node) 2.6.3.0-4419aa4 5.4.0-52-generic Ubuntu 20.04.1 LTS
145.40.77.105 1534165d-4b6b-41df-b8e1-03e8c8d5c4d1 equinix-metal-gke-cluster-yk9or-worker-01 Yes 20 GiB 671 GiB Online Up 2.6.3.0-4419aa4 5.4.0-52-generic Ubuntu 20.04.1 LTS
Warnings:
WARNING: Internal Kvdb is not using dedicated drive on nodes [145.40.77.105 145.40.77.211]. This configuration is not recommended for production clusters.
Global Storage Pool
Total Used : 60 GiB
Total Capacity : 2.0 TiB
root@equinix-metal-gke-cluster-yk9or-worker-02:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 447.1G 0 disk
sdb 8:16 0 447.1G 0 disk
sdc 8:32 0 223.6G 0 disk
├─sdc1 8:33 0 3G 0 part
└─sdc2 8:34 0 220.6G 0 part
sdd 8:48 0 223.6G 0 disk
├─sdd1 8:49 0 2M 0 part
├─sdd2 8:50 0 1.9G 0 part
└─sdd3 8:51 0 221.7G 0 part /
root@equinix-metal-gke-cluster-yk9or-worker-02:~#
Finally, this is worker node 3. This node created the pwx_vg on the larger-capacity drive.
root@equinix-metal-gke-cluster-yk9or-worker-03:~# pxctl status
Status: PX is operational
License: Trial (expires in 31 days)
Node ID: 9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc
IP: 145.40.77.101
Local Storage Pool: 2 pools
POOL IO_PRIORITY RAID_LEVEL USABLE USED STATUS ZONE REGION
0 HIGH raid0 447 GiB 10 GiB Online default default
1 HIGH raid0 224 GiB 10 GiB Online default default
Local Storage Devices: 2 devices
Device Path Media Type Size Last-Scan
0:1 /dev/sdb STORAGE_MEDIUM_SSD 447 GiB 12 Feb 21 17:34 UTC
1:1 /dev/sdc STORAGE_MEDIUM_SSD 224 GiB 12 Feb 21 17:34 UTC
total - 671 GiB
Cache Devices:
* No cache devices
Kvdb Device:
Device Path Size
/dev/pwx_vg/pwxkvdb 447 GiB
* Internal kvdb on this node is using this dedicated kvdb device to store its data.
Cluster Summary
Cluster ID: equinix-metal-gke-cluster-yk9or
Cluster UUID: 47eb0c51-b2c1-456b-a254-e5c849a7d1db
Scheduler: kubernetes
Nodes: 3 node(s) with storage (3 online)
IP ID SchedulerNodeName StorageNode Used Capacity Status StorageStatus Version Kernel OS
145.40.77.101 9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc equinix-metal-gke-cluster-yk9or-worker-03 Yes 20 GiB 671 GiB Online Up (This node) 2.6.3.0-4419aa4 5.4.0-52-generic Ubuntu 20.04.1 LTS
145.40.77.211 99a6f578-6c6f-4b09-b516-8dd332beef7e equinix-metal-gke-cluster-yk9or-worker-02 Yes 20 GiB 668 GiB Online Up 2.6.3.0-4419aa4 5.4.0-52-generic Ubuntu 20.04.1 LTS
145.40.77.105 1534165d-4b6b-41df-b8e1-03e8c8d5c4d1 equinix-metal-gke-cluster-yk9or-worker-01 Yes 20 GiB 671 GiB Online Up 2.6.3.0-4419aa4 5.4.0-52-generic Ubuntu 20.04.1 LTS
Global Storage Pool
Total Used : 60 GiB
Total Capacity : 2.0 TiB
root@equinix-metal-gke-cluster-yk9or-worker-03:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 447.1G 0 disk
└─pwx_vg-pwxkvdb 253:0 0 447.1G 0 lvm
sdb 8:16 0 447.1G 0 disk
sdc 8:32 0 223.6G 0 disk
sdd 8:48 0 223.6G 0 disk
├─sdd1 8:49 0 2M 0 part
├─sdd2 8:50 0 1.9G 0 part
└─sdd3 8:51 0 221.7G 0 part /
root@equinix-metal-gke-cluster-yk9or-worker-03:~#
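For comparison, the behavior I would expect from the three `lsblk` listings above (a dedicated KVDB volume on a smaller, unused 223.6G disk rather than a 447.1G one) can be written as a simple selection rule. The following is only a rough sketch in Python, not the script this repository actually runs. The function names `unused_disks` and `pick_kvdb_disk` are hypothetical, the input is assumed to be the parsed output of `lsblk --json --bytes --output NAME,SIZE,TYPE,MOUNTPOINT`, and the byte sizes in `SAMPLE` are approximate values chosen to mirror the listings, not measurements from these nodes.

```python
from typing import List, Optional

# Illustrative data shaped like `lsblk --json --bytes` output.
# Sizes are approximations of the 447.1G / 223.6G disks listed above.
SAMPLE = {
    "blockdevices": [
        {"name": "sda", "size": 480103981056, "type": "disk", "mountpoint": None},
        {"name": "sdb", "size": 480103981056, "type": "disk", "mountpoint": None},
        {"name": "sdc", "size": 240057409536, "type": "disk", "mountpoint": None},
        {
            "name": "sdd", "size": 240057409536, "type": "disk", "mountpoint": None,
            "children": [
                {"name": "sdd3", "size": 238000000000, "type": "part", "mountpoint": "/"},
            ],
        },
    ]
}


def unused_disks(lsblk: dict) -> List[dict]:
    """Return whole disks with no partitions, no LVM children, and no mountpoint."""
    return [
        dev for dev in lsblk.get("blockdevices", [])
        if dev.get("type") == "disk"
        and not dev.get("children")
        and not dev.get("mountpoint")
    ]


def pick_kvdb_disk(lsblk: dict) -> Optional[str]:
    """Pick the smallest unused disk; ties are broken by name so every node agrees."""
    candidates = unused_disks(lsblk)
    if not candidates:
        return None
    # int() covers older lsblk versions that emit sizes as strings in JSON.
    best = min(candidates, key=lambda dev: (int(dev["size"]), dev["name"]))
    return "/dev/" + best["name"]


print(pick_kvdb_disk(SAMPLE))  # /dev/sdc
```

Sorting on `(size, name)` instead of relying on device enumeration order is the point of the sketch: `sdX` names are not guaranteed to map to the same physical drives on every node, which may be one reason the three workers diverged.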
Any thoughts regarding these inconsistencies?
Originally posted by @bikashrc25 in #37 (comment)