openvstorage/alba

Noticed slow response on nightly build environment

Closed this issue · 2 comments

Env: 10.100.190.31 -> 33 (nightly build environment)
System is slow to respond
Alba 1.3.1
Ubuntu 14.04.5 LTS

Possible scaling issue:

The current configuration of 8 proxies per voldrv leads to:

1 proxy opens 10 connections to each asd: 1 * 10 * 3 = 30
8 proxies per voldrv: 8 * 30 = 240
3-node setup: 3 * 240 = 720? (to be confirmed)

240 is very close to:

/proc/sys/net/netfilter/nf_conntrack_expect_max
256

As a next step we may use 4 voldrv per vpool, which gives 4 * 240 = 960.
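
The arithmetic above can be rechecked with a short script. This is only a sketch: it assumes the "* 3" factor is the number of asds, and the per-node multiplier is the unconfirmed one from the list above. All names are illustrative and do not come from alba.

```python
# Back-of-the-envelope connection count, using the figures from this issue.
CONNECTIONS_PER_ASD = 10  # connections one proxy opens to each asd
ASDS = 3                  # assumption: the "* 3" factor is the asd count
PROXIES_PER_VOLDRV = 8
NODES = 3                 # unconfirmed multiplier ("to be confirmed")
VOLDRVS_PER_VPOOL = 4     # possible next step


def connections(proxies, conns_per_asd=CONNECTIONS_PER_ASD, asds=ASDS):
    """Total proxy-to-asd connections for a given number of proxies."""
    return proxies * conns_per_asd * asds


per_proxy = connections(1)
per_voldrv = connections(PROXIES_PER_VOLDRV)
per_cluster = NODES * per_voldrv             # only valid if the multiplier holds
per_vpool = VOLDRVS_PER_VPOOL * per_voldrv

print(per_proxy, per_voldrv, per_cluster, per_vpool)
```

Changing the constants at the top gives a quick estimate for other proxy or voldrv counts before comparing against the kernel limits.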

It remains to be investigated how we can size this properly to avoid a DoS on our own systems.

dmesg shows:

[Fri Dec 23 09:10:25 2016] nf_conntrack: table full, dropping packet
[Fri Dec 23 09:10:25 2016] nf_conntrack: table full, dropping packet
[Fri Dec 23 09:10:25 2016] nf_conntrack: table full, dropping packet
[Fri Dec 23 09:10:25 2016] nf_conntrack: table full, dropping packet
[Fri Dec 23 09:10:25 2016] nf_conntrack: table full, dropping packet
[Fri Dec 23 09:10:25 2016] nf_conntrack: table full, dropping packet
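
To see how close a node is to this condition, the conntrack counters can be read from procfs. On Linux the "table full, dropping packet" message is tied to `nf_conntrack_max` (with `nf_conntrack_count` as the current usage), while `nf_conntrack_expect_max` bounds the separate expectation table. The sketch below assumes the `nf_conntrack` module is loaded so the files exist; the function names are illustrative.

```python
import os


def read_conntrack_limits(base="/proc/sys/net/netfilter"):
    """Return {name: int} for the conntrack counters found under base."""
    names = ("nf_conntrack_count", "nf_conntrack_max", "nf_conntrack_expect_max")
    values = {}
    for name in names:
        path = os.path.join(base, name)
        try:
            with open(path) as fh:
                values[name] = int(fh.read().strip())
        except (OSError, ValueError):
            # File missing (module not loaded) or not a number: skip it.
            continue
    return values


def usage_ratio(values):
    """Fraction of the conntrack table in use, or None if it is unknown."""
    count = values.get("nf_conntrack_count")
    limit = values.get("nf_conntrack_max")
    if count is None or not limit:
        return None
    return count / limit


if __name__ == "__main__":
    limits = read_conntrack_limits()
    print(limits, usage_ratio(limits))
```

A ratio close to 1.0 would be consistent with the dropped packets in the dmesg output above.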

Increasing the maximum number of connections only leads to the systems consuming more connections. Note that the limit is also memory dependent: it is a maximum, but it is also bounded by the available memory (credits: @dejonghb).

On this system arakoon-abm was consuming a lot of connections.
TIME_WAIT:

root@e190-node2:~# netstat -an | grep TIME_WAIT | grep 26404 | wc -l
27092

SYN_SENT:

root@e190-node2:~# netstat -an | grep SYN_SENT | grep 26404 | wc -l
102
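
The two netstat pipelines above can be combined into a single pass that counts every TCP state for one port. This is a sketch under assumptions: it expects the usual `netstat -an` column layout (proto, recv-q, send-q, local, foreign, state), and it matches the port only at the end of an address, which is stricter than the plain `grep 26404` used above. The function name is hypothetical.

```python
from collections import Counter


def count_states(netstat_lines, port):
    """Count TCP states for sockets whose local or foreign address ends in :port."""
    suffix = ":" + str(port)
    states = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Skip headers, UDP rows and unix sockets: only TCP rows carry a state here.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or foreign.endswith(suffix):
            states[state] += 1
    return states
```

One way to feed it is `count_states(subprocess.check_output(["netstat", "-an"], text=True).splitlines(), 26404)`, which returns a counter keyed by state such as `TIME_WAIT` and `SYN_SENT`.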

All arakoon processes respond intermittently with "master not found", which is expected.

The arakoon log shows rapid creation of connections within a one-second interval:

2016-12-23 09:12:17 398341 +0100 - e190-node2 - 37345/0 - arakoon - 26324773 - info - exiting session (2) connection=10.100.190.32:client_service_8774381: End_of_file; backtrace:; Raised at file "src/core/lwt_sequence.ml", line 95, characters 10-15; Called from file "src/core/lwt_condition.ml", line 54, characters 21-47
2016-12-23 09:12:17 398361 +0100 - e190-node2 - 37345/0 - arakoon - 26324774 - info - 10.100.190.32:client_service_8774381: closing
2016-12-23 09:12:17 398486 +0100 - e190-node2 - 37345/0 - arakoon - 26324775 - info - exiting session (1) connection=10.100.190.32:client_service_8774380: End_of_file; backtrace:; Raised at file "map.ml", line 122, characters 16-25; Called from file "src/core/lwt.ml", line 161, characters 4-40
2016-12-23 09:12:17 398520 +0100 - e190-node2 - 37345/0 - arakoon - 26324776 - info - 10.100.190.32:client_service_8774380: closing
2016-12-23 09:12:17 437482 +0100 - e190-node2 - 37345/0 - arakoon - 26324777 - info - 10.100.190.32:client_service:session=0 connection=10.100.190.32:client_service_8774382 socket_address=ADDR_INET 10.100.190.31,51032
2016-12-23 09:12:17 437573 +0100 - e190-node2 - 37345/0 - arakoon - 26324778 - info - 10.100.190.32:client_service:session=1 connection=10.100.190.32:client_service_8774383 socket_address=ADDR_INET 10.100.190.31,51031
2016-12-23 09:12:17 437597 +0100 - e190-node2 - 37345/0 - arakoon - 26324779 - info - 10.100.190.32:client_service:session=2 connection=10.100.190.32:client_service_8774384 socket_address=ADDR_INET 10.100.190.31,51033
2016-12-23 09:12:17 437658 +0100 - e190-node2 - 37345/0 - arakoon - 26324780 - info - 10.100.190.32:client_service:session=3 connection=10.100.190.32:client_service_8774385 socket_address=ADDR_INET 10.100.190.31,51035
2016-12-23 09:12:17 437736 +0100 - e190-node2 - 37345/0 - arakoon - 26324781 - info - 10.100.190.32:client_service:session=4 connection=10.100.190.32:client_service_8774386 socket_address=ADDR_INET 10.100.190.31,51037
2016-12-23 09:12:17 437753 +0100 - e190-node2 - 37345/0 - arakoon - 26324782 - info - 10.100.190.32:client_service:session=5 connection=10.100.190.32:client_service_8774387 socket_address=ADDR_INET 10.100.190.31,51038
2016-12-23 09:12:17 438028 +0100 - e190-node2 - 37345/0 - arakoon - 26324783 - info - exiting session (2) connection=10.100.190.32:client_service_8774384: End_of_file; backtrace:; Called from file "src/unix/lwt_unix.ml", line 549, characters 17-28
2016-12-23 09:12:17 438218 +0100 - e190-node2 - 37345/0 - arakoon - 26324784 - info - 10.100.190.32:client_service_8774384: closing
2016-12-23 09:12:17 439216 +0100 - e190-node2 - 37345/0 - arakoon - 26324785 - info - 10.100.190.32:client_service:session=5 connection=10.100.190.32:client_service_8774388 socket_address=ADDR_INET 10.100.190.31,51043
2016-12-23 09:12:17 439280 +0100 - e190-node2 - 37345/0 - arakoon - 26324786 - info - 10.100.190.32:client_service:session=6 connection=10.100.190.32:client_service_8774389 socket_address=ADDR_INET 10.100.190.31,51044
2016-12-23 09:12:17 439364 +0100 - e190-node2 - 37345/0 - arakoon - 26324787 - info - 10.100.190.32:client_service:session=7 connection=10.100.190.32:client_service_8774390 socket_address=ADDR_INET 10.100.190.31,51045
2016-12-23 09:12:17 439406 +0100 - e190-node2 - 37345/0 - arakoon - 26324788 - info - 10.100.190.32:client_service:session=8 connection=10.100.190.32:client_service_8774391 socket_address=ADDR_INET 10.100.190.31,51046
2016-12-23 09:12:17 439486 +0100 - e190-node2 - 37345/0 - arakoon - 26324789 - info - 10.100.190.32:client_service:session=9 connection=10.100.190.32:client_service_8774392 socket_address=ADDR_INET 10.100.190.31,51047
2016-12-23 09:12:17 439535 +0100 - e190-node2 - 37345/0 - arakoon - 26324790 - info - 10.100.190.32:client_service:session=10 connection=10.100.190.32:client_service_8774393 socket_address=ADDR_INET 10.100.190.31,51048
2016-12-23 09:12:17 439707 +0100 - e190-node2 - 37345/0 - arakoon - 26324791 - info - exiting session (1) connection=10.100.190.32:client_service_8774383: End_of_file; backtrace:; Raised at file "map.ml", line 122, characters 16-25; Called from file "src/core/lwt.ml", line 161, characters 4-40
2016-12-23 09:12:17 439789 +0100 - e190-node2 - 37345/0 - arakoon - 26324792 - info - exiting session (6) connection=10.100.190.32:client_service_8774389: End_of_file; backtrace:; Raised at file "bytes.ml", line 219, characters 25-34; Called from file "src/pre_sexp.ml", line 86, characters 11-45
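
The churn visible in this excerpt can be quantified by counting session starts and exits per second. The sketch below assumes, based only on the lines shown here, that a line containing `socket_address=` marks a newly accepted client connection and a line containing `exiting session` marks one ending. The function name is illustrative and not part of arakoon.

```python
import re
from collections import Counter

# Each line starts with "YYYY-MM-DD HH:MM:SS <microseconds> ...".
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")


def events_per_second(log_lines):
    """Return (opened, closed) counters keyed by the whole-second timestamp."""
    opened, closed = Counter(), Counter()
    for line in log_lines:
        match = _TIMESTAMP.match(line)
        if not match:
            continue  # not a log line in the expected format
        second = match.group(1)
        if "socket_address=" in line:
            opened[second] += 1
        elif "exiting session" in line:
            closed[second] += 1
    return opened, closed
```

Running it over a full log file (for example `events_per_second(open(path))`, with `path` pointing at the arakoon log) shows whether the open rate stays this high over time or only spikes.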

Can we reproduce this, or can it be closed since we didn't see it again?

It's not an alba issue but an infrastructure issue, and we don't use that many proxies anymore.