openvstorage/alba

Observation: TCP TIME_WAIT storm from maintenance and proxy

Closed this issue · 3 comments

While troubleshooting an environment at GIG, we noticed a TCP TIME_WAIT storm coming from the maintenance process (on 1 node) and the proxy (on 2 nodes).

For example, the number of connections in TIME_WAIT on one node dropped from 70k to 2k after killing the maintenance process.
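
To reproduce a count like the one above and see which remote endpoint the TIME_WAIT sockets point at, a minimal sketch along these lines could be used. It parses the text of `/proc/net/tcp` (IPv4 only) and groups entries in state `06` (TIME_WAIT) by remote address and port. The function names `decode_endpoint` and `count_time_wait` are made up for illustration, and the address decoding assumes a little-endian host such as amd64.

```python
import socket
import struct
from collections import Counter

TIME_WAIT = "06"  # state code for TIME_WAIT in /proc/net/tcp


def decode_endpoint(hex_endpoint):
    """Turn a /proc/net/tcp endpoint such as '21BA16AC:6724' into (ip, port).

    Assumption: little-endian host, so the IPv4 address bytes are reversed.
    """
    hex_ip, hex_port = hex_endpoint.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def count_time_wait(proc_net_tcp_text):
    """Count TIME_WAIT sockets per remote (ip, port)."""
    counts = Counter()
    # The first line is the column header; skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        # fields: sl, local_address, rem_address, st, ...
        if len(fields) < 4 or fields[3] != TIME_WAIT:
            continue
        counts[decode_endpoint(fields[2])] += 1
    return counts
```

On an affected node, `count_time_wait(open("/proc/net/tcp").read()).most_common(10)` would list the remote endpoints with the most TIME_WAIT sockets.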

root@stor-01:/usr/share/bcc/tools# dpkg -l | grep alba
ii alba 1.3.7 amd64 the ALternative BAckend

Possibly related: in my environment there are many short-lived connections:

# /usr/share/bcc/tools/tcplife -D 26404
PID   COMM       LADDR           LPORT RADDR           RPORT TX_KB RX_KB MS
21534 alba       172.22.186.32   40002 172.22.186.34   26404     0     0 0.68
21534 alba       172.22.186.32   58768 172.22.186.33   26404     0     0 0.46
21534 alba       172.22.186.32   58770 172.22.186.33   26404     0     0 4.17
21534 alba       172.22.186.32   40010 172.22.186.34   26404     0     0 0.42
21534 alba       172.22.186.32   58776 172.22.186.33   26404     0     0 0.44
21534 alba       172.22.186.32   58778 172.22.186.33   26404     0     0 55.08
21534 alba       172.22.186.32   40018 172.22.186.34   26404     0     0 0.61
21534 alba       172.22.186.32   58784 172.22.186.33   26404     0     0 0.56
21534 alba       172.22.186.32   58786 172.22.186.33   26404     0     0 12.85
21534 alba       172.22.186.32   40026 172.22.186.34   26404     0     0 0.58
21534 alba       172.22.186.32   58792 172.22.186.33   26404     0     0 0.99
21534 alba       172.22.186.32   58794 172.22.186.33   26404     0     0 112.14
21534 alba       172.22.186.32   40034 172.22.186.34   26404     0     0 0.55
21534 alba       172.22.186.32   58800 172.22.186.33   26404     0     0 0.85
21534 alba       172.22.186.32   58802 172.22.186.33   26404     0     0 20.48
21534 alba       172.22.186.32   40042 172.22.186.34   26404     0     0 0.41
...
# ps aux | grep [2]1534
root     21534 50.4  2.9 2671432 1948452 ?     Ssl  14:04  27:09 /usr/bin/alba maintenance --config arakoon://config/ovs/alba/backends/5685462f-c9bf-409c-9616-7d793847792d/maintenance/config?ini=%2Fopt%2Fasd-manager%2Fconfig%2Farakoon_cacc.ini --log-sink console:

Port 26404 on .33 and .34 belongs to the arakoon abm processes:

ovs       4194  0.7  0.0 751872 55388 ?        Ssl  Feb27 260:59 /usr/bin/arakoon --node rpiCljy6tjdFfykl -config arakoon://config/ovs/arakoon/be1-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini -autofix -start
root@ftcmp04:~# netstat -tanp | grep LISTEN | grep 26404
tcp        0      0 172.22.186.34:26404     0.0.0.0:*               LISTEN      3593/arakoon    
root@ftcmp04:~# ps aux | grep [3]593
ovs       3593  2.3  0.0 817316 48104 ?        Ssl  Feb27 823:56 /usr/bin/arakoon --node Id4wBqDPwYGt8Da9 -config arakoon://config/ovs/arakoon/be1-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini -autofix -start

.33 is the arakoon master.
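
To quantify the short-lived connections in the `tcplife` output above, the rows can be grouped by remote endpoint. The following is only a sketch: `summarize_tcplife` is a hypothetical helper, and it assumes the nine-column layout shown above (PID, COMM, LADDR, LPORT, RADDR, RPORT, TX_KB, RX_KB, MS) with no spaces in COMM.

```python
from collections import defaultdict
from statistics import median


def summarize_tcplife(text):
    """Group tcplife rows by remote endpoint.

    Returns {(raddr, rport): (connection_count, median_lifetime_ms)}.
    """
    lifetimes = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 9 columns and start with a numeric PID;
        # this skips the header, prompts and the '...' line.
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        raddr, rport, ms = fields[4], int(fields[5]), float(fields[8])
        lifetimes[(raddr, rport)].append(ms)
    return {key: (len(values), median(values)) for key, values in lifetimes.items()}
```

Feeding it the captured output gives, per arakoon node, how many connections were opened and how long a typical one lived.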