portworx/px-dev

Loss of access to DevOps data in a single-node configuration with px-dev


Hi all,
just after rebooting the DevOps node (srv-wh) we are stuck with this amazing pxctl status:

[root@srv-wh ~]# /opt/pwx/bin/pxctl status
Status: PX storage down
License: PX-Developer
Node ID: 5923cb4f-7878-4de1-962e-728b1f3f371e
IP: 172.16.0.141
Local Storage Pool: 1 pool
POOL IO_PRIORITY RAID_LEVEL USABLE USED STATUS ZONE REGION
0 HIGH raid0 1.5 TiB 1.4 TiB Offline default default
Local Storage Devices: 1 device
Device Path Media Type Size Last-Scan
0:1 /dev/mapper/ol_c0242-multivers STORAGE_MEDIUM_MAGNETIC 1.5 TiB 23 Apr 19 19:23 MSK
total - 1.5 TiB
Cluster Summary
Cluster ID: wh-devops-1
Cluster UUID: 4b7925d3-1996-4a87-b93a-9731defb8076
Scheduler:
Nodes: 1 node(s) with storage (0 online)
IP ID StorageNode Used Capacity Status StorageStatus Version Kernel OS
172.16.0.141 5923cb4f-7878-4de1-962e-728b1f3f371e Yes 1.4 TiB 1.5 TiB Online (StorageDown) Full or Offline (This node) 1.3.4.0-7895900 4.14.35-1844.0.7.el7uek.x86_64 Oracle Linux Server 7.6
Global Storage Pool
Total Used : 1.4 TiB
Total Capacity : 1.5 TiB
[root@srv-wh ~]#

and the usual alert:

[root@srv-wh ~]# /opt/pwx/bin/pxctl service alerts a | grep "Drive state change"
325 DRIVE 5923cb4f-7878-4de1-962e-728b1f3f371e Apr 23 10:59:08 UTC 2019 WARN Drive state change Free disk space (100.0 GiB available of 1.46 TiB) is below recommended level on this node for pool 0. This pool will transition to offline mode, however I/O on replicated volumes will continue on replicated nodes.
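
The pool went offline because its free space dropped below the threshold in this alert. Before adding capacity, a quick check (a sketch using standard pxctl commands; the volume name is a placeholder) is to see which volumes are consuming the pool and whether any can be removed:

/opt/pwx/bin/pxctl volume list
/opt/pwx/bin/pxctl volume delete <unneeded-volume>   # frees pool space if a volume is no longer required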

It would be nice to add disks to the storage pool:

[root@srv-wh ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 3,9G 0 rom
sda 8:0 0 2T 0 disk
├─sda2 8:2 0 2T 0 part
│ ├─ol_c0242-swap 252:1 0 7,7G 0 lvm [SWAP]
│ ├─ol_c0242-var 252:4 0 300G 0 lvm /var
│ ├─ol_c0242-multivers 252:2 0 1,6T 0 lvm
│ ├─ol_c0242-root 252:0 0 50G 0 lvm /
│ └─ol_c0242-home 252:3 0 30G 0 lvm /home
└─sda1 8:1 0 500M 0 part /boot
vda 250:0 0 500G 0 disk
└─vda1 250:1 0 500G 0 part

[root@srv-wh ~]# /opt/pwx/bin/pxctl service drive add --drive /dev/vda1
Operation requires PX to be in maintenance mode.
[root@srv-wh ~]#

[root@srv-wh ~]# /opt/pwx/bin/pxctl service maintenance -e
This is a disruptive operation, PX will restart in maintenance mode.
Are you sure you want to proceed ? (Y/N): y
Entering maintenance mode...
[root@srv-wh ~]#

[root@srv-wh ~]# /opt/pwx/bin/pxctl service drive add --drive /dev/vda1
Drive add done
[root@srv-wh ~]#

[root@srv-wh ~]# /opt/pwx/bin/pxctl status
Status: PX storage down
License: PX-Developer
Node ID: 5923cb4f-7878-4de1-962e-728b1f3f371e
IP: 172.16.0.141
Local Storage Pool: 2 pools
POOL IO_PRIORITY RAID_LEVEL USABLE USED STATUS ZONE REGION
0 HIGH raid0 1.5 TiB 1.4 TiB Offline default default
1 HIGH raid0 500 GiB 3.0 GiB Online default default
Local Storage Devices: 2 devices
Device Path Media Type Size Last-Scan
0:1 /dev/mapper/ol_c0242-multivers STORAGE_MEDIUM_MAGNETIC 1.5 TiB 23 Apr 19 19:58 MSK
1:1 /dev/vda1 STORAGE_MEDIUM_MAGNETIC 500 GiB 23 Apr 19 19:58 MSK
total - 2.0 TiB
Cluster Summary
Cluster ID: wh-devops-1
Cluster UUID: 4b7925d3-1996-4a87-b93a-9731defb8076
Scheduler:
Nodes: 1 node(s) with storage (0 online)
IP ID StorageNode Used Capacity Status StorageStatus Version Kernel OS
172.16.0.141 5923cb4f-7878-4de1-962e-728b1f3f371e Yes 1.4 TiB 2.0 TiB Online (StorageDown) Some pools Full or Offline (This node) 1.3.4.0-7895900 4.14.35-1844.0.7.el7uek.x86_64 Oracle Linux Server 7.6
Global Storage Pool
Total Used : 1.4 TiB
Total Capacity : 2.0 TiB
[root@srv-wh ~]#

no luck :) — the 500 GiB drive ended up in a new pool (pool 1) instead of expanding pool 0, which is still full and offline.

[root@srv-wh ~]# /opt/pwx/bin/pxctl service drive show
PX drive configuration:
Pool ID: 0
IO Priority: HIGH
Labels:
Size: 1.5 TiB
Status: Offline
Has meta data: Yes
Drives:
1: /dev/mapper/ol_c0242-multivers, 1.4 TiB allocated of 1.5 TiB, Online
Pool ID: 1
IO Priority: HIGH
Labels:
Size: 500 GiB
Status: Online
Has meta data: No
Drives:
1: /dev/vda1, 3.0 GiB allocated of 500 GiB, Online
[root@srv-wh ~]#

Then I extended the LV backing pool 0 and cycled PX through maintenance mode:

lvextend -l +100%FREE /dev/mapper/ol_c0242-multivers

/opt/pwx/bin/pxctl service maintenance -e
/opt/pwx/bin/pxctl service maintenance -x

no luck again :) — pool 0 still reports 1.5 TiB.

What do you recommend?

WBR,
Vitaly
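
For the record, this px-dev 1.3.4 build does not appear to offer a way to grow an existing pool in place. As an assumption about later releases only (the syntax below is not available here), newer Portworx versions expose a pool expand operation along these lines:

pxctl service pool expand --uid <pool-uuid> --size <new-size-in-GiB> --operation resize-disk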

[root@srv-wh ~]# /opt/pwx/bin/pxctl service drive replace --source /dev/mapper/ol_c0242-multivers --target /dev/vdb1 --operation start
Replace drive start failed. drive size 2199022206464 too big, pool size 1610612736000
[root@srv-wh ~]#
no luck...
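
The sizes in the error are bytes; converted (plain shell arithmetic, nothing Portworx-specific), the target really is bigger than the pool, and drive replace apparently refuses a target larger than the pool it replaces into:

echo $((2199022206464 / 1024 / 1024 / 1024))   # /dev/vdb1 target: 2047 GiB (~2.0 TiB)
echo $((1610612736000 / 1024 / 1024 / 1024))   # pool 0:           1500 GiB (~1.46 TiB)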

[root@srv-wh ~]# /opt/pwx/bin/pxctl service drive rebalance --poolID 0 --operation status
Done: Pool 0: Storage rebalance is running. 102 out of about 1402 chunks balanced (103 considered), 93% left
[root@srv-wh ~]#

[root@srv-wh ~]# /opt/pwx/bin/pxctl service drive show
PX drive configuration:
Pool ID: 0
IO Priority: HIGH
Labels:
Size: 1.5 TiB
Status: Offline
Has meta data: Yes
Drives:
1: /dev/mapper/ol_c0242-multivers, 1.4 TiB allocated of 1.5 TiB, Online
Pool ID: 1
IO Priority: HIGH
Labels:
Size: 500 GiB
Status: Online
Has meta data: No
Drives:
1: /dev/vda1, 3.0 GiB allocated of 500 GiB, Online
[root@srv-wh ~]#
still no — the drive configuration is unchanged and pool 0 is still offline.

As discussed in the other threads, the way to increase the size of a pool is to add new drives of similar size.
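
The LVM commands that produced the pool-sized LV on /dev/vdb are not shown in the thread; a sketch of what they would look like (the VG/LV names match the lsblk output below, the exact invocation is an assumption):

pvcreate /dev/vdb
vgcreate pwx1 /dev/vdb
# 1730670493696 bytes = the exact size of ol_c0242-multivers, so the new drive matches pool 0's backing device
lvcreate -L 1730670493696B -n multivers01 pwx1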

[root@srv-wh ~]# lsblk --bytes
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vdb 250:16 0 2199023255552 0 disk
└─pwx1-multivers01 252:5 0 1730670493696 0 lvm
sr0 11:0 1 4172283904 0 rom
sda 8:0 0 2147483648000 0 disk
├─sda2 8:2 0 2146958311424 0 part
│ ├─ol_c0242-swap 252:1 0 8262778880 0 lvm [SWAP]
│ ├─ol_c0242-var 252:4 0 322122547200 0 lvm /var
│ ├─ol_c0242-multivers 252:2 0 1730670493696 0 lvm
│ ├─ol_c0242-root 252:0 0 53687091200 0 lvm /
│ └─ol_c0242-home 252:3 0 32212254720 0 lvm /home
└─sda1 8:1 0 524288000 0 part /boot
vda 250:0 0 536870912000 0 disk
└─vda1 250:1 0 536869863424 0 part
[root@srv-wh ~]#

Then I added the new LV as a drive and exited maintenance mode:

/opt/pwx/bin/pxctl service drive add --drive /dev/mapper/pwx1-multivers01

/opt/pwx/bin/pxctl service maintenance -x

[root@srv-wh ~]# /opt/pwx/bin/pxctl status
Status: PX is initializing...
License: PX-Developer
Node ID: 5923cb4f-7878-4de1-962e-728b1f3f371e
IP: 172.16.0.141
Local Storage Pool: 3 pools
POOL IO_PRIORITY RAID_LEVEL USABLE USED STATUS ZONE REGION
0 HIGH raid0 1.5 TiB 1.4 TiB Online default default
1 HIGH raid0 500 GiB 3.1 GiB Online default default
2 MEDIUM raid0 1.6 TiB 0 B Online default default
Local Storage Devices: 3 devices
Device Path Media Type Size Last-Scan
0:1 /dev/mapper/ol_c0242-multivers STORAGE_MEDIUM_MAGNETIC 1.5 TiB 24 Apr 19 09:10 MSK
1:1 /dev/vda1 STORAGE_MEDIUM_MAGNETIC 500 GiB 24 Apr 19 09:10 MSK
2:1 /dev/mapper/pwx1-multivers01 STORAGE_MEDIUM_MAGNETIC 1.6 TiB 24 Apr 19 11:57 MSK
total - 3.5 TiB
Cluster Summary
Cluster ID: wh-devops-1
Cluster UUID: 4b7925d3-1996-4a87-b93a-9731defb8076
Scheduler:
Nodes: 1 node(s) with storage (0 online)
IP ID StorageNode Used Capacity Status StorageStatus Version Kernel OS
172.16.0.141 5923cb4f-7878-4de1-962e-728b1f3f371e Yes 0 B 0 B Initializing Down (This node) 1.3.4.0-7895900 4.14.35-1844.0.7.el7uek.x86_64 Oracle Linux Server 7.6
Global Storage Pool
Total Used : 0 B
Total Capacity : 0 B
[root@srv-wh ~]#

no luck, again — the new LV was classified MEDIUM and ended up in yet another pool (pool 2), and the node is now stuck in Initializing, presumably because pool 0 (the pool holding the metadata) is still full.

Bingo! The backing device of pool 0 still carries an intact btrfs filesystem, so the data can be pulled off it directly:
[root@srv-wh ~]# dd if=/dev/mapper/ol_c0242-multivers bs=512 skip=128 count=1 | od -a
0000000 ; fs x O nul nul nul nul nul nul nul nul nul nul nul nul
0000020 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000040 syn M r bs " P I < 9 si K dc3 etb nak ] |
0000060 nul nul soh nul nul nul nul nul soh nul nul nul nul nul nul nul
0000100 _ B H R f S _ M enq ^ cr soh nul nul nul nul
0000120 nul nul z ff ; stx nul nul nul @ @ A del etx nul nul

That is the btrfs superblock magic: #define BTRFS_MAGIC 0x4D5F53665248425FULL /* ascii _BHRfS_M, no null */
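
The offset lines up with where btrfs keeps it: the primary superblock sits at 64 KiB into the device and the magic field is at offset 0x40 inside it (64*1024 + 64 = 65600). Two quicker ways to confirm, using only standard tools:

dd if=/dev/mapper/ol_c0242-multivers bs=1 skip=65600 count=8 2>/dev/null | od -c   # prints _ B H R f S _ M
blkid /dev/mapper/ol_c0242-multivers   # should report TYPE="btrfs"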

#mount /dev/mapper/ol_c0242-multivers -o loop /mnt/1/
#mount /mnt/1/957918760219237782/pxdev -o loop /u01
#mount /mnt/1/205057439761719123/pxdev -o loop /u02
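
Each numeric directory under /mnt/1 appears to be a btrfs subvolume holding one PX volume's backing file (pxdev), which is why it can be loop-mounted directly. To enumerate whatever else is recoverable (a sketch; the two IDs above are the ones actually mounted here):

btrfs subvolume list /mnt/1
ls /mnt/1/*/pxdev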

ls /u02/oradata/rep_ext/

EXPDAT01-09_49_48.DMP EXPDAT-09_49_48.LOG EXPDAT-12:44:56_28-03-2019-01.DMP EXPDAT201.DMP EXPDAT28.03.19.LOG EXPDAT2.LOG EXPDAT3.LOG EXPDP-12:44:55_28-03-2019.LOG
EXPDAT01.DMP EXPDAT101.DMP EXPDAT1.LOG EXPDAT28.03.1901.DMP EXPDAT-28-03-2019_12:28:23-01.DMP EXPDAT301.DMP EXPDAT.LOG EXPDP-28-03-2019_12:28:23.LOG
[root@srv-wh ~]#