gluster/glusterd2

Multiple GD2 pods restart and crash during PVC deletion

ksandha opened this issue

  1. Started creating 1000 PVCs; about 816 were created before hitting issue #1364.

  2. Started deleting the PVCs.

  3. After some time, kube1 (the master node) went into the NotReady state, and crashes were seen in dmesg. The trace below shows the OOM killer being invoked while etcd was handling a page fault:

[10353.595893] CPU: 7 PID: 28993 Comm: etcd Kdump: loaded Tainted: G               ------------ T 3.10.0-862.11.6.el7.x86_64 #1
[10353.600643] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[10353.608597] Call Trace:
[10353.609859]  [<ffffffffbd5135d4>] dump_stack+0x19/0x1b
[10353.611953]  [<ffffffffbd50e79f>] dump_header+0x90/0x229
[10353.613746]  [<ffffffffbd0dc63b>] ? cred_has_capability+0x6b/0x120
[10353.616039]  [<ffffffffbcf9ac64>] oom_kill_process+0x254/0x3d0
[10353.618083]  [<ffffffffbd0dc71e>] ? selinux_capable+0x2e/0x40
[10353.625154]  [<ffffffffbcf9b4a6>] out_of_memory+0x4b6/0x4f0
[10353.629104]  [<ffffffffbd50f2a3>] __alloc_pages_slowpath+0x5d6/0x724
[10353.632320]  [<ffffffffbcfa17f5>] __alloc_pages_nodemask+0x405/0x420
[10353.637419]  [<ffffffffbcfebf98>] alloc_pages_current+0x98/0x110
[10353.641149]  [<ffffffffbcf97057>] __page_cache_alloc+0x97/0xb0
[10353.644898]  [<ffffffffbcf99758>] filemap_fault+0x298/0x490
[10353.646917]  [<ffffffffc059085f>] xfs_filemap_fault+0x5f/0xe0 [xfs]
[10353.649171]  [<ffffffffbcfc352a>] __do_fault.isra.58+0x8a/0x100
[10353.651093]  [<ffffffffbcfc3adc>] do_read_fault.isra.60+0x4c/0x1b0
[10353.653050]  [<ffffffffbcfc8484>] handle_pte_fault+0x2f4/0xd10
[10353.669942]  [<ffffffffbced1d5c>] ? try_to_wake_up+0x18c/0x350
[10353.677107]  [<ffffffffbcf9709b>] ? unlock_page+0x2b/0x30
[10353.682050]  [<ffffffffbcfcae3d>] handle_mm_fault+0x39d/0x9b0
[10353.685380]  [<ffffffffbd520557>] __do_page_fault+0x197/0x4f0
[10353.687253]  [<ffffffffbd520996>] trace_do_page_fault+0x56/0x150
[10353.693564]  [<ffffffffbd51ff22>] do_async_page_fault+0x22/0xf0
[10353.695466]  [<ffffffffbd51c788>] async_page_fault+0x28/0x30
[10353.698068] Mem-Info:
[10353.699814] active_anon:6293028 inactive_anon:9713 isolated_anon:0
 active_file:33 inactive_file:2222 isolated_file:60
 unevictable:0 dirty:3 writeback:0 unstable:0
  4. Logged in to a GD2 pod (gluster-kube3-0) and checked resource consumption with top:
top - 09:37:47 up  2:30,  0 users,  load average: 69.93, 42.17, 27.13
Tasks: 398 total,   6 running, 392 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.2 us, 16.3 sy,  0.0 ni, 58.7 id,  1.0 wa,  0.0 hi,  0.5 si, 10.2 st
KiB Mem : 29987848 total,  4742472 free, 18293572 used,  6951804 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  9433120 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3153 root      20   0  209504 151188   2584 R  76.4  0.5   0:47.66 lvs
 3245 root      20   0   13.6g   1.0g   3380 S  65.5  3.6   3:20.96 glusterfs
 3147 root      20   0  203912 145704   2584 D  34.5  0.5   0:49.67 lvs
16961 root      20   0   71328  12632   2128 D  18.9  0.0   0:01.93 lvs
13131 root      20   0   71892  13196   2128 D  18.2  0.0   0:14.91 lvs
   29 root      20   0 7286888   1.5g  11592 S  16.9  5.4   2:22.93 glusterd2
 4687 root      20   0  203912 145696   2584 S  11.5  0.5   0:28.12 lvs
 5217 root      20   0   71320  12704   2192 D   8.1  0.0   0:22.66 lvremove
13809 root      20   0   71572  12884   2128 D   7.4  0.0   0:08.19 lvs
17549 root      20   0   70556  11724   2128 D   7.4  0.0   0:00.92 lvs
17562 root      20   0   70556  11736   2128 D   7.4  0.0   0:00.89 lvs
12586 root      20   0   71912  13224   2128 D   6.8  0.0   0:17.77 lvs
17533 root      20   0   70556  11784   2128 D   6.8  0.0   0:01.05 lvs
12564 root      20   0   71912  13224   2128 D   6.1  0.0   0:17.93 lvs
15707 root      20   0   71384  12688   2128 D   6.1  0.0   0:04.81 lvs
16198 root      20   0   71352  12656   2128 D   6.1  0.0   0:03.82 lvs
16984 root      20   0   71328  12632   2128 D   6.1  0.0   0:02.19 lvs
13824 root      20   0   71600  12912   2128 D   5.4  0.0   0:09.20 lvs
15002 root      20   0   71404  12712   2128 D   5.4  0.0   0:05.72 lvs
16732 root      20   0   71320  12628   2124 D   5.4  0.0   0:02.32 lvremove
16908 root      20   0   71328  12644   2128 D   5.4  0.0   0:02.25 lvs
17577 root      20   0   70412  11688   2128 D   5.4  0.0   0:00.82 lvs
 3149 root      20   0  203912 145704   2584 D   4.7  0.5   0:50.39 lvs
13310 root      20   0   71804  13116   2128 D   4.7  0.0   0:12.33 lvs
13668 root      20   0   71712  13028   2128 D   4.7  0.0   0:10.86 lvs
15685 root      20   0   71352  12656   2128 D   4.7  0.0   0:03.98 lvs
16508 root      20   0   71324  12632   2124 D   4.7  0.0   0:02.40 lvremove
 7363 root      20   0   71328  12764   2248 D   4.1  0.0   0:22.09 lvs
 9899 root      20   0   71936  13248   2128 D   4.1  0.0   0:19.52 lvs
12792 root      20   0   71908  13248   2128 D   4.1  0.0   0:16.49 lvs
13145 root      20   0   71836  13144   2128 D   4.1  0.0   0:13.95 lvs
13147 root      20   0   71836  13148   2128 D   4.1  0.0   0:14.05 lvs
14507 root      20   0   71572  12884   2128 D   4.1  0.0   0:08.13 lvs
14516 root      20   0   71532  12844   2128 D   4.1  0.0   0:07.33 lvs
14972 root      20   0   71504  12808   2128 R   4.1  0.0   0:06.32 lvs
14989 root      20   0   71504  12808   2128 D   4.1  0.0   0:06.48 lvs
16221 root      20   0   71328  12636   2128 D   4.1  0.0   0:03.09 lvs
16246 root      20   0   71328  12636   2128 R   4.1  0.0   0:03.10 lvs
16937 root      20   0   71328  12640   2128 D   4.1  0.0   0:02.19 lvs
 7298 root      20   0   71328  12760   2248 D   3.4  0.0   0:21.41 lvs
10157 root      20   0   72004  13312   2128 R   3.4  0.0   0:20.91 lvs
12820 root      20   0   71908  13220   2128 D   3.4  0.0   0:16.54 lvs
12858 root      20   0   71892  13204   2128 D   3.4  0.0   0:14.84 lvs
13653 root      20   0   71712  13024   2128 D   3.4  0.0   0:10.92 lvs
13669 root      20   0   71712  13020   2128 D   3.4  0.0   0:10.79 lvs
13675 root      20   0   71712  13020   2128 D   3.4  0.0   0:10.80 lvs
13815 root      20   0   71600  12972   2128 R   3.4  0.0   0:09.07 lvs
13848 root      20   0   71704  13016   2128 D   3.4  0.0   0:09.55 lvs
14502 root      20   0   71532  12844   2128 D   3.4  0.0   0:07.37 lvs
15610 root      20   0   71384  12696   2128 D   3.4  0.0   0:04.73 lvs
15656 root      20   0   71384  12692   2128 D   3.4  0.0   0:04.76 lvs
16248 root      20   0   71328  12632   2128 D   3.4  0.0   0:03.03 lvs
 7046 root      20   0   71328  12764   2248 D   2.7  0.0   0:22.09 lvs
 7427 root      20   0   72004  13316   2128 D   2.7  0.0   0:21.07 lvs
10228 root      20   0   71968  13280   2128 D   2.7  0.0   0:20.16 lvs
12524 root      20   0   71912  13216   2128 D   2.7  0.0   0:17.76 lvs
12569 root      20   0   71924  13236   2128 D   2.7  0.0   0:18.31 lvs
[root@gluster-kube3-0 /]#
  5. Pods before deletion:
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          128m
csi-nodeplugin-glusterfsplugin-9kzr4   2/2     Running   0          128m
csi-nodeplugin-glusterfsplugin-fbtjt   2/2     Running   0          128m
csi-nodeplugin-glusterfsplugin-skwmr   2/2     Running   0          128m
csi-provisioner-glusterfsplugin-0      3/3     Running   0          128m
etcd-cmgjtvbdzt                        1/1     Running   0          133m
etcd-d8mz7wbfwt                        1/1     Running   0          134m
etcd-operator-7cb5bd459b-dbjt7         1/1     Running   0          135m
etcd-wwggwrlq27                        1/1     Running   0          135m
gluster-kube1-0                        1/1     Running   2          133m
gluster-kube2-0                        1/1     Running   3          133m
gluster-kube3-0                        1/1     Running   3          133m
[vagrant@kube1 ~]$ 

  6. Pods after deletion:
[vagrant@kube2 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS      RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running     0          179m
csi-nodeplugin-glusterfsplugin-9kzr4   2/2     Running     0          179m
csi-nodeplugin-glusterfsplugin-fbtjt   2/2     Running     0          179m
csi-nodeplugin-glusterfsplugin-skwmr   2/2     NodeLost    0          179m
csi-provisioner-glusterfsplugin-0      3/3     Running     0          179m
etcd-cmgjtvbdzt                        0/1     Completed   0          3h4m
etcd-d8mz7wbfwt                        1/1     Unknown     0          3h5m
etcd-operator-7cb5bd459b-dbjt7         1/1     Running     1          3h6m
etcd-wwggwrlq27                        0/1     Error       0          3h5m
gluster-kube1-0                        1/1     Unknown     2          3h3m
gluster-kube2-0                        1/1     Running     3          3h3m
gluster-kube3-0                        1/1     Running     12         3h3m
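The top listing above is dominated by dozens of lvs processes (plus a few lvremove), most of them in state D (uninterruptible sleep, usually waiting on I/O). To compare that pile-up across runs, the Go sketch below tallies processes by command name and state from saved `top -b` output. The helper name `tallyTop` and its reliance on top's default column layout (COMMAND as the 12th column, S as the 8th) are assumptions for illustration, not part of GD2 or procps.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// procKey groups processes by command name and scheduler state.
type procKey struct {
	Command string
	State   string // top's S column: R running, S sleeping, D uninterruptible sleep
}

// tallyTop counts processes per (COMMAND, S) pair in a saved top listing.
// Assumes the default layout: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
func tallyTop(output string) map[procKey]int {
	counts := make(map[procKey]int)
	sc := bufio.NewScanner(strings.NewReader(output))
	inTable := false
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 0 && fields[0] == "PID" {
			inTable = true // header row; process rows follow
			continue
		}
		// Skip summary lines before the header and short lines such as a shell prompt.
		if !inTable || len(fields) < 12 {
			continue
		}
		counts[procKey{Command: fields[11], State: fields[7]}]++
	}
	return counts
}

func main() {
	sample := `  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3153 root      20   0  209504 151188   2584 R  76.4  0.5   0:47.66 lvs
 3147 root      20   0  203912 145704   2584 D  34.5  0.5   0:49.67 lvs
 5217 root      20   0   71320  12704   2192 D   8.1  0.0   0:22.66 lvremove
   29 root      20   0 7286888   1.5g  11592 S  16.9  5.4   2:22.93 glusterd2`
	counts := tallyTop(sample)
	fmt.Println(counts[procKey{"lvs", "D"}], counts[procKey{"lvremove", "D"}]) // prints: 1 1
}
```

Keying on the state as well as the command makes it easy to see whether the lvs processes are merely numerous or are actually stuck in D state.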

Observed behavior

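Multiple GD2 pods restarted (gluster-kube3-0 went from 3 to 12 restarts), and the etcd pods went into Completed, Unknown, and Error states.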

Expected/desired behavior

No pod restarts or etcd pod crashes should occur.

Details on how to reproduce (minimal and precise)

Reproduced in 1 of 1 attempts, using the steps above.

Information about the environment:

  • Glusterd2 version used (e.g. v4.1.0 or master):
    [root@gluster-kube2-0 /]# rpm -qa | grep glusterd2
    glusterd2-5.0-0.dev.80.git5f8ec37.el7.x86_64
    [root@gluster-kube2-0 /]#

  • Operating system used:
    CentOS

  • Glusterd2 compiled from sources, as a package (rpm/deb), or container: container

  • Using External ETCD: (yes/no, if yes ETCD version): yes

  • If container, which container image: 1.13

  • Using kubernetes, openshift, or direct install: kubernetes

  • If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside:

Other useful information

  • glusterd2 config files from all nodes (default /etc/glusterd2/glusterd2.toml)
  • glusterd2 log files from all nodes (default /var/log/glusterd2/glusterd2.log)
  • ETCD configuration
  • Contents of uuid.toml from all nodes (default /var/lib/glusterd2/uuid.toml)
  • Output of statedump from any one of the nodes

Useful commands

  • To get glusterd2 version

[root@gluster-kube2-0 /]# glusterd2 --version
glusterd version: v6.0-dev.80.git5f8ec37
git SHA: 5f8ec37
go version: go1.11.2
go OS/arch: linux/amd64
[root@gluster-kube2-0 /]#

  • To get ETCD version
    etcd --version
    
  • To get output of statedump
    curl http://glusterd2-IP:glusterd2-Port/statedump
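For scripted collection (for example, grabbing a statedump from each node before and after a bulk PVC deletion), the curl call above can be reproduced in Go with only the standard library. This is a minimal sketch: the endpoint path comes from the curl command, while the function name, the 10-second timeout, and passing the base URL as a command-line argument are assumptions.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// fetchStatedump issues the same GET as
// `curl http://glusterd2-IP:glusterd2-Port/statedump` and returns the body.
// baseURL is a placeholder of the form "http://<glusterd2-IP>:<glusterd2-Port>".
func fetchStatedump(baseURL string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second} // assumed timeout
	resp, err := client.Get(baseURL + "/statedump")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("statedump: unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: statedump http://<glusterd2-IP>:<glusterd2-Port>")
		os.Exit(1)
	}
	dump, err := fetchStatedump(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(dump)
}
```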
    

dmesg.log

Other logs are placed at rhsqa-virt05.lab.eng.blr.redhat.com.

I have asked @harigowtham to run the profiler during PV deletion to see whether there are any obvious leaks in the GD2 transaction code. If that doesn't give us any clues, we should inspect the etcd side further; a good test is to run an external etcd cluster and check whether we still hit this bottleneck.
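One generic way to look for leaks in a Go daemon such as GD2 is to compare heap profiles taken before and after the workload under suspicion (here, PV deletion). The sketch below uses the standard `runtime/pprof` package and assumes the snapshots are taken inside the process being measured; it does not describe how GD2 itself exposes profiling, and `runWorkload` is a hypothetical placeholder.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapSnapshot runs a GC first so the profile reflects the heap as of that
// collection, then returns the heap profile in pprof's gzipped protobuf format.
func heapSnapshot() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// runWorkload is a hypothetical placeholder for the operation under test,
// e.g. deleting a batch of PVs/volumes.
func runWorkload() {}

func main() {
	before, err := heapSnapshot()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	runWorkload()
	after, err := heapSnapshot()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for name, data := range map[string][]byte{"before.pprof": before, "after.pprof": after} {
		if err := os.WriteFile(name, data, 0o644); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```

The two files can then be compared with `go tool pprof -base before.pprof after.pprof`, which reports only what changed between the snapshots; memory that stays allocated after the deletions finish is the candidate leak.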

I checked this on a standalone GD2 setup with external etcd by creating 15 volumes and then deleting them.

While creating, memory usage grew to a certain extent, which is expected. After deletion, memory consumption returned to normal.

I don't see any unusual memory consumption during deletion, and I don't see any leftover volumes.
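A simple way to repeat this standalone check is to sample the glusterd2 process's resident memory before volume creation, after creation, and after deletion. The comment above does not say which metric was used; the Go sketch below assumes Linux and reads `VmRSS` from `/proc/<pid>/status`. The helper name and the PID argument are illustrative assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// vmRSSKiB extracts the VmRSS value (resident set size, reported in kB)
// from the contents of a Linux /proc/<pid>/status file.
func vmRSSKiB(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		fields := strings.Fields(line) // e.g. ["VmRSS:", "1572864", "kB"]
		if len(fields) < 2 {
			return 0, fmt.Errorf("malformed VmRSS line: %q", line)
		}
		return strconv.ParseInt(fields[1], 10, 64)
	}
	return 0, fmt.Errorf("VmRSS not found")
}

func main() {
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1] // e.g. the PID of glusterd2
	}
	data, err := os.ReadFile("/proc/" + pid + "/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	kib, err := vmRSSKiB(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("VmRSS: %d KiB\n", kib)
}
```

Running it at each stage gives three comparable numbers; a post-deletion value that stays well above the pre-creation baseline would point to retained memory.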

This should be fixed by #1453.