newsapps/beeswithmachineguns

SSH traffic volume devastates HTTPS test scaling

Closed this issue · 3 comments

BWMG operates by opening an SSH session to each instance to collect output from apachebench, which it runs in verbose mode ("-v 3") so that fine-grained timing and response-status data can be aggregated by BWMG itself.

This verbose output includes a large number of lines that are not needed by BWMG for statistics generation, especially when attacking HTTPS URLs (due to additional SSL/TLS debug output written to stderr and stdout). stderr is entirely discarded within bees.py anyway.
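For context, here is a rough sketch of the shape of that collection path, assuming a paramiko-style SSH client and an ab invocation along these lines; it is not the actual bees.py code, and the hostname, username, key file, and option values are placeholders:

```python
# Rough sketch of the collection path, not the actual bees.py code. The
# hostname, username, key file, and ab options here are placeholders.
import paramiko

def collect_from_bee(hostname, url, num_requests=10000, concurrency=100):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname, username='ec2-user', key_filename='bee.pem')

    # "-v 3" makes apachebench emit per-request detail; this is the verbose
    # stream whose volume is measured below.
    command = 'ab -r -n %d -c %d -v 3 "%s"' % (num_requests, concurrency, url)
    stdin, stdout, stderr = client.exec_command(command)

    # Every byte of this verbose stdout crosses the SSH channel, even though
    # only a handful of line types are needed for the statistics.
    output = stdout.read().decode('utf-8', 'replace')
    client.close()
    return output
```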

For an instance performing 2500 HTTP req/s, this is approximately 800 KB/s sent over SSH. For instances performing 400-600 HTTPS req/s, it is approximately 1.2 MB/s.

This adds up quickly: a test with 100 instances produces roughly 80 MB/s of raw SSH data to support 250k req/s of HTTP load testing, and reaching the same 250k req/s over HTTPS (at roughly 500 req/s per instance) would take around 500 instances and about 600 MB/s (5 Gbit/s) of SSH payload.
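Back-of-the-envelope arithmetic behind those totals, using the per-instance figures measured above:

```python
# Per-instance SSH output rates measured above (KB/s).
http_kBps_per_instance = 800      # at ~2500 HTTP req/s per instance
https_kBps_per_instance = 1200    # at ~400-600 HTTPS req/s per instance

http_instances = 100              # 100 * 2500 req/s  = 250k req/s
https_instances = 500             # ~500 * ~500 req/s = 250k req/s

print(http_instances * http_kBps_per_instance / 1000, "MB/s of SSH data for HTTP")    # ~80 MB/s
print(https_instances * https_kBps_per_instance / 1000, "MB/s of SSH data for HTTPS") # ~600 MB/s (~5 Gbit/s)
```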

Unfortunately, this has a few negative side-effects:

  • Instances with 1 CPU face increased CPU contention and context switches to send this data
  • Data outbound from AWS is billable, and this bandwidth usage (in certain use cases) is actually greater than the data used for the actual benchmarking
  • For testing with a high number of instances, this bandwidth requirement (on the order of hundreds of megabytes/s of SSH payload data) becomes too large for the Python client to collect sanely.

Some kind of instance-side filtering of stdout and stderr down to only the necessary data would alleviate these scaling limitations, reduce cost, and likely increase the requests/s a given instance can perform once it is no longer doing this housekeeping (especially on single-vCPU instances).

I will be opening a PR shortly with a proposed solution for this issue.

Results (SSH output volume, before -> after):

  • HTTP (3650 req/s): 800 KB/s -> 145 KB/s
  • HTTPS (600 req/s): 1200 KB/s -> 25 KB/s

Notice that the requests per second are appreciably higher than before, due to lower CPU contention and context switching on the single-vCPU instance used (m3.medium). HTTP performance against the same target (3 ms away from us-east-1) increased from 2500 req/s to 3650 req/s. HTTPS performance increased from 400-500 req/s to 630 req/s.

This issue was much worse for HTTPS due to apachebench's SSL verbosity; for HTTPS, this change represents a roughly 98% reduction in output carried over SSH.

PR #195 opened:

  • stderr is discarded on the instance side, inside the benchmark_command shell command. BWMG never consumes stderr, so this data can be dropped at the source.
  • fgrep is used because simple substring matching is sufficient.
  • A list of the substrings required for statistics assembly is joined with newlines and passed as fixed-string patterns (-F). This avoids having to drop a pattern file onto the AMI, at the "cost" of a longer benchmark_command. If this leads to undesirable output on the client (benchmark_command is printed to the user), the printed command could be split on "|" and truncated, or abbreviated with an ellipsis ("...") to at least indicate to the user that part of the command is hidden; see the sketch after this list.
  • Filtering via fgrep is less ideal than apachebench providing output tailored to BWMG's usage, but it offers one benefit: multi-vCPU instances will likely schedule fgrep (and ssh) on another core, allowing ab to run on one CPU with fewer interruptions. This is hypothetical/unproven.
  • This only minimally helps with context switches on the instance side when running on single-vCPU instances, as fgrep will contend for the same CPU. However, the work fgrep performs is far cheaper than what SSH would otherwise be doing. As mentioned in the previous comment, there is a definite benefit to apachebench performance with this patch.
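A minimal sketch of that approach, assuming the command is assembled inside bees.py; the substring list, function names, and truncation length below are illustrative placeholders rather than the exact contents of PR #195:

```python
# Illustrative sketch only: the substring list, names, and lengths here are
# placeholders, not the exact contents of PR #195.
KEEP_SUBSTRINGS = [
    'Time per request',       # hypothetical examples of the line types
    'Requests per second',    # needed for statistics assembly
    'Failed requests',
    'Time taken for tests',
]

def build_benchmark_command(num_requests, concurrency, url):
    # Newline-joined fixed strings passed to grep -F (fgrep), so no pattern
    # file has to be dropped onto the AMI. "2>/dev/null" discards stderr on
    # the instance, since bees.py never reads it.
    pattern = '\n'.join(KEEP_SUBSTRINGS)
    return 'ab -r -n %d -c %d -v 3 "%s" 2>/dev/null | grep -F "%s"' % (
        num_requests, concurrency, url, pattern)

def printable_command(command, max_len=80):
    # For display only: if the command is long, show everything up to the
    # pipe and abbreviate the filter portion with an ellipsis.
    if len(command) <= max_len:
        return command
    return command.split('|', 1)[0].rstrip() + ' | ...'

benchmark_command = build_benchmark_command(10000, 100, 'https://example.com/')
print(printable_command(benchmark_command))
```

Keeping the whole filter inside benchmark_command means nothing extra has to be installed or staged on the AMI.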

Closing - resolved by PR #195 merge.