openebs-archive/device-localpv

Runtime Panic Within Node Plugin

iyashu opened this issue · 10 comments

We observed a go runtime panic within device local node plugin. Just putting out here to see if anyone can take look into it.

2021-06-07T16:57:10.817343241+05:30 stderr F I0607 16:57:10.817191       1 device-util.go:170] GetPart Error, {DiskName:sda Size:10737418240}
2021-06-07T16:57:10.91741737+05:30 stderr F I0607 16:57:10.917210       1 device-util.go:288] Disk: DiskName not correct
2021-06-07T16:57:10.917434131+05:30 stderr F I0607 16:57:10.917232       1 device-util.go:170] GetPart Error, {DiskName:sda Size:10737418240}
2021-06-07T16:57:13.019507828+05:30 stderr F I0607 16:57:13.019321       1 device-util.go:288] Disk: DiskName not correct
2021-06-07T16:57:13.019534179+05:30 stderr F I0607 16:57:13.019352       1 device-util.go:170] GetPart Error, {DiskName:sdd Size:3189013217280}
2021-06-07T16:57:13.019604142+05:30 stderr F I0607 16:57:13.019531       1 device-util.go:288] Disk: DiskName not correct
2021-06-07T16:57:13.019619593+05:30 stderr F I0607 16:57:13.019555       1 device-util.go:170] GetPart Error, {DiskName:sdd Size:3189013217280}
2021-06-07T16:57:13.618848367+05:30 stderr F I0607 16:57:13.618659       1 device-util.go:288] Disk: DiskName not correct
2021-06-07T16:57:13.618869735+05:30 stderr F I0607 16:57:13.618686       1 device-util.go:170] GetPart Error, {DiskName:sde Size:12000136527872}
2021-06-07T16:57:13.618873316+05:30 stderr F I0607 16:57:13.618697       1 device-util.go:107] Running WipeFS for Partition sdb 2
2021-06-07T16:57:13.919818946+05:30 stderr F I0607 16:57:13.919580       1 device-util.go:288] Disk: DiskName not correct
2021-06-07T16:57:13.919837104+05:30 stderr F I0607 16:57:13.919603       1 device-util.go:170] GetPart Error, {DiskName:sde Size:12000136527872}
2021-06-07T16:57:13.919840384+05:30 stderr F E0607 16:57:13.919664       1 runtime.go:73] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
2021-06-07T16:57:13.919843152+05:30 stderr F goroutine 103 [running]:
2021-06-07T16:57:13.919845434+05:30 stderr F k8s.io/apimachinery/pkg/util/runtime.logPanic(0x14b5ca0, 0xc000393a80)
2021-06-07T16:57:13.91985613+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:69 +0x7b
2021-06-07T16:57:13.919858808+05:30 stderr F k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
2021-06-07T16:57:13.919861079+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51 +0x89
2021-06-07T16:57:13.919863579+05:30 stderr F panic(0x14b5ca0, 0xc000393a80)
2021-06-07T16:57:13.919865765+05:30 stderr F 	/usr/local/opt/go/libexec/src/runtime/panic.go:969 +0x1b9
2021-06-07T16:57:13.919868366+05:30 stderr F github.com/openebs/device-localpv/pkg/device.wipefsAndCreatePart(0xc000cc8cd5, 0x3, 0x2, 0xc0004ac2a4, 0x24, 0x8a4000, 0xc000d085e0, 0xa, 0x0, 0x1)
2021-06-07T16:57:13.919870885+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/device/device-util.go:107 +0x72f
2021-06-07T16:57:13.919873027+05:30 stderr F github.com/openebs/device-localpv/pkg/device.CreateVolume(0xc00073e780, 0x1, 0xc00073e780)
2021-06-07T16:57:13.91987519+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/device/device-util.go:89 +0x3fd
2021-06-07T16:57:13.919877345+05:30 stderr F github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).syncVol(0xc00048cae0, 0xc00073e780, 0x0, 0x0)
2021-06-07T16:57:13.919879533+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:91 +0x4b
2021-06-07T16:57:13.919882012+05:30 stderr F github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).syncHandler(0xc00048cae0, 0xc0005a2340, 0x37, 0xc00083ddb0, 0x11e8954)
2021-06-07T16:57:13.919884183+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:57 +0x13d
2021-06-07T16:57:13.919886428+05:30 stderr F github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).processNextWorkItem.func1(0xc00048cae0, 0x1362b20, 0xc0006ea560, 0x0, 0x0)
2021-06-07T16:57:13.919888691+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:230 +0xd7
2021-06-07T16:57:13.91989105+05:30 stderr F github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).processNextWorkItem(0xc00048cae0, 0x1)
2021-06-07T16:57:13.919893251+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:240 +0x4d
2021-06-07T16:57:13.91989537+05:30 stderr F github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).runWorker(0xc00048cae0)
2021-06-07T16:57:13.919897546+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:191 +0x2b
2021-06-07T16:57:13.91989968+05:30 stderr F k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0006a24f0)
2021-06-07T16:57:13.919901849+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5f
2021-06-07T16:57:13.919903992+05:30 stderr F k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0006a24f0, 0x3b9aca00, 0x0, 0x1, 0xc00011c780)
2021-06-07T16:57:13.919906101+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0x105
2021-06-07T16:57:13.91990823+05:30 stderr F k8s.io/apimachinery/pkg/util/wait.Until(0xc0006a24f0, 0x3b9aca00, 0xc00011c780)
2021-06-07T16:57:13.9199107+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
2021-06-07T16:57:13.919912858+05:30 stderr F created by github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).Run
2021-06-07T16:57:13.919914971+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:177 +0x265
2021-06-07T16:57:13.921975666+05:30 stderr F panic: runtime error: index out of range [0] with length 0 [recovered]
2021-06-07T16:57:13.921990187+05:30 stderr F 	panic: runtime error: index out of range [0] with length 0
2021-06-07T16:57:13.922018166+05:30 stderr F 
2021-06-07T16:57:13.922022345+05:30 stderr F goroutine 103 [running]:
2021-06-07T16:57:13.922025998+05:30 stderr F k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
2021-06-07T16:57:13.922030533+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x10c
2021-06-07T16:57:13.922035537+05:30 stderr F panic(0x14b5ca0, 0xc000393a80)
2021-06-07T16:57:13.922039044+05:30 stderr F 	/usr/local/opt/go/libexec/src/runtime/panic.go:969 +0x1b9
2021-06-07T16:57:13.922046513+05:30 stderr F github.com/openebs/device-localpv/pkg/device.wipefsAndCreatePart(0xc000cc8cd5, 0x3, 0x2, 0xc0004ac2a4, 0x24, 0x8a4000, 0xc000d085e0, 0xa, 0x0, 0x1)
2021-06-07T16:57:13.922050847+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/device/device-util.go:107 +0x72f
2021-06-07T16:57:13.922054326+05:30 stderr F github.com/openebs/device-localpv/pkg/device.CreateVolume(0xc00073e780, 0x1, 0xc00073e780)
2021-06-07T16:57:13.922057886+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/device/device-util.go:89 +0x3fd
2021-06-07T16:57:13.922264779+05:30 stderr F github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).syncVol(0xc00048cae0, 0xc00073e780, 0x0, 0x0)
2021-06-07T16:57:13.922272982+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:91 +0x4b
2021-06-07T16:57:13.922278953+05:30 stderr F github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).syncHandler(0xc00048cae0, 0xc0005a2340, 0x37, 0xc00083ddb0, 0x11e8954)
2021-06-07T16:57:13.922283195+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:57 +0x13d
2021-06-07T16:57:13.922287216+05:30 stderr F github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).processNextWorkItem.func1(0xc00048cae0, 0x1362b20, 0xc0006ea560, 0x0, 0x0)
2021-06-07T16:57:13.922291221+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:230 +0xd7
2021-06-07T16:57:13.922295171+05:30 stderr F github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).processNextWorkItem(0xc00048cae0, 0x1)
2021-06-07T16:57:13.922299196+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:240 +0x4d
2021-06-07T16:57:13.922303001+05:30 stderr F github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).runWorker(0xc00048cae0)
2021-06-07T16:57:13.92230684+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:191 +0x2b
2021-06-07T16:57:13.922310849+05:30 stderr F k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0006a24f0)
2021-06-07T16:57:13.922314835+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5f
2021-06-07T16:57:13.922318811+05:30 stderr F k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0006a24f0, 0x3b9aca00, 0x0, 0x1, 0xc00011c780)
2021-06-07T16:57:13.922322745+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0x105
2021-06-07T16:57:13.922326806+05:30 stderr F k8s.io/apimachinery/pkg/util/wait.Until(0xc0006a24f0, 0x3b9aca00, 0xc00011c780)
2021-06-07T16:57:13.922330838+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
2021-06-07T16:57:13.92233473+05:30 stderr F created by github.com/openebs/device-localpv/pkg/mgmt/volume.(*VolController).Run
2021-06-07T16:57:13.922338818+05:30 stderr F 	/Users/praveen.gt/go/src/github.com/openebs/device-localpv/pkg/mgmt/volume/volume.go:177 +0x265

@praveengt @iyashu Can you also provide the output of lsblk -b and parted /dev/sdX unit b print, for device sda, sdb, sdd and sde

Also, which is the version of device-driver that you are running?

@akhilerm it happened once as of now and since the pod have been restarted we don't have the state of partition table during that error time interval. As per initial logs of device localpv, we are using plugin at 1cae175 commit id (but i think there is no major functional code changes happened after this commit id). Let me know if you still want get latest output of parted command (doesn't seems to be relevant here). Here is the output of lsblk to understand the disk & partitions created on the node bit better.

NAME                            MAJ:MIN RM           SIZE RO TYPE MOUNTPOINT
sda                               8:0    0    10737418240  0 disk
|-sda1                            8:1    0    10603183616  0 part /
|-sda14                           8:14   0        3145728  0 part
`-sda15                           8:15   0      130023424  0 part /boot
sdb                               8:16   0 12000136527872  0 disk
|-sdb1                            8:17   0        1048576  0 part
`-sdb2                            8:18   0  9500467658752  0 part /var/lib/kubelet/pods/9c5a96b7-ca07-4d92-9cc8-ad3ca9e2c7ab/volumes/kubernetes.io~csi/pvc-40e33246-3bee-461d-824b-e0ccec803911/mount
sdc                               8:32   0 12000136527872  0 disk
|-sdc1                            8:33   0        1048576  0 part
`-sdc2                            8:34   0  9500467658752  0 part /var/lib/kubelet/pods/9c5a96b7-ca07-4d92-9cc8-ad3ca9e2c7ab/volumes/kubernetes.io~csi/pvc-5d8d56cb-e291-4dfd-81ac-fb664dd5ec75/mount
sdd                               8:48   0  3189013217280  0 disk
`-sdd1                            8:49   0  3189011120128  0 part
sde                               8:64   0 12000136527872  0 disk
|-sde1                            8:65   0  1318553911296  0 part
| |-abc                  254:0    0   483183820800  0 lvm  /var/lib/abc
| |-xyz                  254:1    0   161061273600  0 lvm  /var/lib/xyz
| `-pqr                 254:2    0   483183820800  0 lvm  /var/lib/pqr
`-sde2                            8:66   0 10681580519424  0 part

This is the latest version. The issue is because of the pvc name is not
state. Please check the parameter value for the pvc name in the call stack.
It is not pointing to an address.

@praveengt the backtrace is suggesting that node daemonset is crashing in wipefsAndCreatePart. It seems like getAllPartsUsed returned an emplty list and we are trying to access the first entry which is not present. We are getting the vol name and I can see in the backtrace that the length is 36 (see this => 0xc0004ac2a4, 0x24).

@akhilerm, We should be checking if pList is not empty then only should proceed here https://github.com/openebs/device-localpv/blob/develop/pkg/device/device-util.go#L108.

@akhilerm one thing we need to see that why we got this error

2021-06-07T16:57:10.817343241+05:30 stderr F I0607 16:57:10.817191       1 device-util.go:170] GetPart Error, {DiskName:sda Size:10737418240}
2021-06-07T16:57:10.91741737+05:30 stderr F I0607 16:57:10.917210       1 device-util.go:288] Disk: DiskName not correct

This is leading to empty partition list and causing the crash.

@iyashu , I asked about the version, because there is a difference in the line numbers mentioned in the logs and commit 1cae175

I will add a check to make sure that the pList[] is not empty before proceeding.

Also, @praveengt the name is used for the first time here while logging which crashes while using pList[0]

Isn't Third parameter should be a string, I dont know how backtrace in
Golang is built,

string is passed as an address + length , so corresponding to string there will be 2 parameters in the stack trace.
Also, the last 0x0, 0x1 in the parameters specify the return value. Since the function returns an error, which is an interface, we have 2 values here.