newsapps/beeswithmachineguns

SSH traffic volume devastates HTTPS test scaling

Closed this issue · 3 comments

BWMG operates by opening an SSH session to each instance to collect output from apachebench, which it runs in verbose mode ("-v 3") so that fine-grained timing and response-status data can be aggregated by BWMG itself.

This verbose output includes a large number of lines that are not needed by BWMG for statistics generation, especially when attacking HTTPS URLs (due to additional SSL/TLS debug output written to stderr and stdout). stderr is entirely discarded within bees.py anyway.
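For context, here is a rough sketch of the shape of that collection path, assuming a paramiko-style SSH client and an ab invocation along these lines; it is not the actual bees.py code, and the hostname, username, key file, and option values are placeholders:

```python
# Rough sketch of the collection path, not the actual bees.py code. The
# hostname, username, key file, and ab options here are placeholders.
import paramiko

def collect_from_bee(hostname, url, num_requests=10000, concurrency=100):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname, username='ec2-user', key_filename='bee.pem')

    # "-v 3" makes apachebench emit per-request detail; this is the verbose
    # stream whose volume is measured below.
    command = 'ab -r -n %d -c %d -v 3 "%s"' % (num_requests, concurrency, url)
    stdin, stdout, stderr = client.exec_command(command)

    # Every byte of this verbose stdout crosses the SSH channel, even though
    # only a handful of line types are needed for the statistics.
    output = stdout.read().decode('utf-8', 'replace')
    client.close()
    return output
```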

For an instance performing 2500 HTTP req/s, this is approximately 800 KB/s sent over SSH. For instances performing 400-600 HTTPS req/s, it is approximately 1.2 MB/s.

This adds up quickly: a test with 100 instances produces roughly 80 MB/s of raw SSH data to support 250k req/s of HTTP load testing, and reaching the same 250k req/s over HTTPS (at roughly 500 req/s per instance) would take around 500 instances and about 600 MB/s (5 Gbit/s) of SSH payload.
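Back-of-the-envelope arithmetic behind those totals, using the per-instance figures measured above:

```python
# Per-instance SSH output rates measured above (KB/s).
http_kBps_per_instance = 800      # at ~2500 HTTP req/s per instance
https_kBps_per_instance = 1200    # at ~400-600 HTTPS req/s per instance

http_instances = 100              # 100 * 2500 req/s  = 250k req/s
https_instances = 500             # ~500 * ~500 req/s = 250k req/s

print(http_instances * http_kBps_per_instance / 1000, "MB/s of SSH data for HTTP")    # ~80 MB/s
print(https_instances * https_kBps_per_instance / 1000, "MB/s of SSH data for HTTPS") # ~600 MB/s (~5 Gbit/s)
```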

Unfortunately, this has a few negative side-effects:

  • Instances with 1 CPU face increased CPU contention and context switches to send this data
  • Data outbound from AWS is billable, and this bandwidth usage (in certain use cases) is actually greater than the data used for the actual benchmarking
  • For testing with a high number of instances, this bandwidth requirement (on the order of hundreds of megabytes/s of SSH payload data) becomes too large for the Python client to collect sanely.

Some kind of instance-side filtering of stdout and stderr down to only the necessary data would alleviate these scaling limitations, reduce cost, and likely increase the requests/s a given instance can perform once it is no longer doing this housekeeping (especially on single-vCPU instances).

I will be opening a PR shortly with a proposed solution for this issue.

Results (SSH output volume, before -> after):

  • HTTP (3650 req/s): 800 KB/s -> 145 KB/s
  • HTTPS (600 req/s): 1200 KB/s -> 25 KB/s

Notice that the requests per second are appreciably higher than before, due to lower CPU contention and context switching on the single-vCPU instance used (m3.medium). HTTP performance against the same target (3 ms away from us-east-1) increased from 2500 req/s to 3650 req/s. HTTPS performance increased from 400-500 req/s to 630 req/s.

This issue was much worse for HTTPS due to apachebench's SSL verbosity; for HTTPS, this change represents a roughly 98% reduction in output carried over SSH.

PR #195 opened:

  • stderr is discarded on the instance side, inside the benchmark_command shell command. BWMG never consumes stderr, so this data can be dropped at the source.
  • fgrep is used because simple substring matching is sufficient.
  • A list of the substrings required for statistics assembly is joined with newlines and passed as fixed-string patterns (-F). This avoids having to drop a pattern file onto the AMI, at the "cost" of a longer benchmark_command. If this leads to undesirable output on the client (benchmark_command is printed to the user), the printed command could be split on "|" and truncated, or abbreviated with an ellipsis ("...") to at least indicate to the user that part of the command is hidden; see the sketch after this list.
  • Filtering via fgrep is less ideal than apachebench providing output tailored to BWMG's usage, but it offers one benefit: multi-vCPU instances will likely schedule fgrep (and ssh) on another core, allowing ab to run on one CPU with fewer interruptions. This is hypothetical/unproven.
  • This only minimally helps with context switches on the instance side when running on single-vCPU instances, as fgrep will contend for the same CPU. However, the work fgrep performs is far cheaper than what SSH would otherwise be doing. As mentioned in the previous comment, there is a definite benefit to apachebench performance with this patch.
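A minimal sketch of that approach, assuming the command is assembled inside bees.py; the substring list, function names, and truncation length below are illustrative placeholders rather than the exact contents of PR #195:

```python
# Illustrative sketch only: the substring list, names, and lengths here are
# placeholders, not the exact contents of PR #195.
KEEP_SUBSTRINGS = [
    'Time per request',       # hypothetical examples of the line types
    'Requests per second',    # needed for statistics assembly
    'Failed requests',
    'Time taken for tests',
]

def build_benchmark_command(num_requests, concurrency, url):
    # Newline-joined fixed strings passed to grep -F (fgrep), so no pattern
    # file has to be dropped onto the AMI. "2>/dev/null" discards stderr on
    # the instance, since bees.py never reads it.
    pattern = '\n'.join(KEEP_SUBSTRINGS)
    return 'ab -r -n %d -c %d -v 3 "%s" 2>/dev/null | grep -F "%s"' % (
        num_requests, concurrency, url, pattern)

def printable_command(command, max_len=80):
    # For display only: if the command is long, show everything up to the
    # pipe and abbreviate the filter portion with an ellipsis.
    if len(command) <= max_len:
        return command
    return command.split('|', 1)[0].rstrip() + ' | ...'

benchmark_command = build_benchmark_command(10000, 100, 'https://example.com/')
print(printable_command(benchmark_command))
```

Keeping the whole filter inside benchmark_command means nothing extra has to be installed or staged on the AMI.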

Closing - resolved by PR #195 merge.