/gossipsub-hardening

testground plans for evaluating gossipsub under attack scenarios

Primary LanguagePython

gossipsub-hardening

This repo contains testground test plans designed to evaluate the performance of gossipsub under various attack scenarios.

Installation & Setup

We're using python to generate testground composition files, and we shell out to a few external commands, so there's some environment setup to do.

If you want to run tests on the Protocol Labs Jupyterhub instance, see README-shared-environment.md, then read Running Tests below.

Requirements

Hardware

While no special hardware is needed to run the tests, running with a lot of test instances requires considerable CPU and RAM, and will likely exceed the capabilities of a single machine.

A large workstation with many CPU cores can reasonably run a few hundred instances using the testground local:docker runner, although the exact limit will require some trial and error. In early testing we were able to run 500 containers on a 56 core Xeon W-3175X with 124 GiB of RAM, although it's possible we could run more now that we've optimized things a bit.

It's useful to run with the local:docker or local:exec runners during test development, so long as you use fairly small instance counts (~25 or so works fine on a 2018 13" MacBook Pro with 16 GB RAM).

When using the local:docker runner, it's a good idea to periodically garbage collect the docker images created by testground using docker system prune to reclaim disk space.

To run larger tests like the ones in the saved configurations, you'll need a kubernetes cluster.

If you have access to our shared Jupyterhub instance, your environment should already be set up to create and manage a cluster in our AWS project.

To create your own cluster, follow the directions in the testground/infra repo to provision a cluster on AWS, and configure your tests to use the cluster:k8s test runner.

The testground daemon process can be running on your local machine, as long as it has access to the k8s cluster. The machine running the daemon must have Docker installed, and the user account must have permission to use docker and have the correct AWS credentials to connect to the cluster. When running tests on k8s, the machine running the testground daemon doesn't need a ton of resources, but ideally it should have a fast internet connection to push the Docker images to the cluster.

Running the analysis notebooks benefits from multiple cores and consumes quite a bit of RAM, especially on the first run when it's converting data to pandas format. It's best to have at least 8 GB of RAM free when running the analysis notebook for the first time.

Also note that closing the browser tab containing a running Jupyter notebook does not stop the python kernel and reclaim the memory used. It's best to select Close and Halt from the Jupyter File menu when you're done with the analysis notebook instead of just closing the tab.

Testground

You'll need to have the testground binary built and accessible on your $PATH. As of the time of writing, we're running from master, however releases after v0.5.0 should be compatible.

After running testground list for the first time, you should have a ~/testground directory. You can change this to another location by setting the TESTGROUND_HOME environment variable.

Cloning this repo

The testground client will look for test plans in $TESTGROUND_HOME/plans, so this repo should be cloned or symlinked into there:

cd ~/testground/plans # if this dir doesn't exist, run 'testground list' once first to create it
git clone git@github.com:protocol/gossipsub-hardening.git

Python

We need python 3.7 or later, ideally in a virtual environment. If you have python3 installed, you can create a virtual environment named venv in this repo and it will be ignored by git:

python3 -m venv venv

After creating the virtual environment, you need to "activate" it for each shell session:

# bash / zsh:
source ./venv/bin/activate

# fish:
source ./venv/bin/activate.fish

You'll also need to install the python packages used by the scripts:

pip install -r scripts/requirements.txt

External binaries

The run scripts rely on a few commands being present on the PATH:

  • the testground binary
  • go

Running Tests

Running using the Runner Jupyter notebook

With the python virtualenv active, run

jupyter notebook

This will start a Jupyter notebook server and open a browser to the Jupyter file navigator. In the Jupyter UI, navigate to the scripts dir and open Runner.ipynb.

This will open the runner notebook, which lets you configure the test parameters using a configuration UI.

You'll need to run all the cells to prepare the notebook UI using Cell menu > Run All. You can reset the notebook state using the Kernel Menu > Restart and Run All command.

The cell at the bottom of the notebook has a "Run Test" button that will convert the configured parameters to a composition file and start running the test. It will shell out to the testground client binary, so if you get an error about a missing executable, make sure testground is on your PATH and restart the Jupyter server.

At the end of a successful test, there will be a new output/pubsub-test-$timestamp directory (relative to the scripts dir) containing the composition file, the full test-output.tgz file collected from testground, and an analysis directory.

The analysis directory has relevant files that were extracted from the test-output.tgz archive, along with a new Jupyter notebook, Analysis.ipynb. See below for more details about the analysis notebook.

If the test fails (testground returns a non-zero exit code), the runner script will move the pubsub-test-$timestamp dir to ./output/failed.

The "Test Execution" section of the config UI will let you override the output path, for example if you want to give your test a meaningful name.

Targeting a specific version of go-libp2p-pubsub

The default configuration is to test against the current master branch of go-libp2p-pubsub, but you can change that in the Pubsub panel of the configuration UI. You can enter the name of a branch or tag, or the full SHA-1 hash of a specific commit.

Important: if you target a version before the Gossipsub v1.1 PR was merged, you must uncheck the "target hardening branch API" checkbox to avoid build failures due to missing methods.

Saved test configurations

You can save configuration snapshots to JSON files and load them again using the buttons at the bottom of the configuration panel. The snapshots contain the state of all the configuration widgets, so can only be used with the Runner notebook, not the command line run.py script.

There are several saved configs in scripts/configs that we've been using to evaluate different scenarios. The "baseline" config is scripts/configs/1k.json, which sets up a test with 1000 honest nodes and no attackers.

The saved configs are all setup to use the cluster:k8s runner, and they expect to find the testground daemon on a non-standard port (8080 instead of 8042). If you're using our shared jupyterhub server, this should all Just Work, but if you're running elsewhere you may need to change those parameters to suit your environment.

Running using the cli scripts

Inside the scripts directory, the run.py script will generate a composition and run it by shelling out to testground. If you just want it to generate the composition, you can skip the test run by passing the --dry-run flag.

You can get the full usage by running ./run.py --help.

To run a test with baseline parameters (as defined in scripts/templates/baseline/params/_base.toml), run:

./run.py

By default, this will create a directory called ./output/pubsub-test-$timestamp, which will have a composition.toml file inside, as well as a template-params.toml that contains the params used to generate the composition.

You can control the output location with the -o and --name flags, for example:

./run.py -o /tmp --name 'foo'
# creates directory at /tmp/pubsub-test-$timestamp-foo

Note that the params defined in scripts/templates/baseline/params/_base.toml have very low instance counts and are likely useless for real-world evaluation of gossipsub.

You can override individual template parameters using the -D flag, for example, ./run.py -D T_RUN=5m. There's no exhaustive list of template parameters, so check the template at scripts/templates/baseline/template.toml.j2 to see what's defined.

Alternatively, you can create a new toml file containing the parameters you want to set, and it will override any parameters defined in scripts/templates/baseline/params/_base.toml

By default, the run.py script will extract the test data from the collected test output archive and copy the analysis notebook to the analysis subdirectory of the test output dir. If you want to skip this step, you can pass the --skip-analysis flag.

Analyzing Test Outputs

After running a test, there should be a directory full of test outputs, with an analysis dir containing an Analysis.ipynb Jupyter notebook. If you're not already running the Jupyter server, start it with jupyter notebook, and use the Jupyter UI to navigate to the analysis notebook and open it.

Running all the cells in the analysis notebook will convert the extracted test data to pandas DataFrames. This conversion takes a minute or two depending on the size of the test and your hardware, but the results are cached to disk, so future runs should be pretty fast.

Once everything is loaded, you'll see some charts and tables, and there will be a new figures directory inside the analysis dir containing the charts in a few image formats. There's also a figures.zip with the same contents for easier downloading / storage.

Running the analysis notebook from the command line

If you just want to generate the charts and don't care about interacting with the notebook, you can execute the analysis notebook using a cli script.

Change to the scripts directory, then run

./analyze.py run_notebook ./output/pubsub-test-$timestamp

This will copy the latest analysis notebook template into the analysis directory and execute the notebook, which will generate the chart images.

This command is useful if you've made changes to the analysis notebook template and want to re-run it against a bunch of existing test outputs. In that case, you can pass multiple paths to the run_notebook subcommand:

./analyze.py run_notebook ./output/pubsub-test-*
# will run the latest notebook against everything in `./output

Storing & Fetching Test Outputs in S3

The scripts/sync_outputs.py script is a wrapper around the rclone command that helps backup test outputs to an s3 bucket, or fetch a previously stored output directory to the local filesystem.

The AWS credentials are pulled from the environment - see the AWS cli docs if you haven't already configured the aws cli to use your credentials. The configured user must have permission to access the bucket used to sync.

rclone must be installed and on the $PATH to use the sync_outputs.py script.

By default, it uses the S3 bucket gossipsub-test-outputs in eu-central-1, but you can control this with the --bucket and --region flags.

To backup all the test outputs in ./output:

./sync_outputs.py store-all ./output

It will ignore the failed subdirectory automatically, but if you want to ignore more, you can pass in a flag:

./sync_outputs.py store-all ./output --ignore some-dir-you-dont-want-to-store

Alternatively, you can selectively store one or more test outputs with the store subcommand:

./sync_outputs.py store ./output/pubsub-test-20200409-152658 ./output/pubsub-test-20200409-152983 # etc...

You can also fetch test outputs from S3 to the local filesystem. To fetch everything from the bucket into ./output:

./sync_outputs.py fetch-all ./output

Or, to fetch one or more tests from the bucket instead of everything:

./sync_outputs.py fetch --dest=./output pubsub-test-20200409-152658

You can list all the top-level directories in the S3 bucket (so you know what to fetch) using the list command:

./sync_outputs.py list

Code Overview

The test code all lives in the test directory.

main.go is the actual entry point, but it just calls into the "real" main function, RunSimulation, which is defined in run.go.

params.go contains the parameter parsing code. The parseParams function will return a testParams struct will all test parameters. Note that some params only apply to specific types of node, for example the SybilParams are only used by attacker nodes, while OverlayParams is only used by honest nodes. To simplify the parsing, they're all included in the testParams struct, but each node type will only use what they need.

The set of params provided to each test instance depends on which composition group they're in. The composition template we're using defines three groups: publishers, lurkers, and attackers. The attackers have the sybil param set to true, while the honest groups have it set to the default of false. The lurkers and publishers have identical params with the exception of the boolean publisher param, which controls whether they will publish messages or just consume them.

After parsing the params, RunSimulation will prepare the libp2p Hosts, do some network setup and then call either runHonest or runSybil for each Host, depending on the node type specified in the params. Note that there may be multiple Hosts for one test instance, depending on the configuration. We generally run multiple attacker nodes in a container but run each honest node in its own container. This is because honest nodes require more resources, and because the artificial network latency does not apply between peers in the same container. Since the sybils never connect to each other, we don't care about simulating latency between them.

discovery.go contains a SyncDiscovery component that uses the testground sync service to broadcast information about the test peers (e.g. addreses, whether they're honest, etc) with every other peer. It uses this information to connect nodes to each other in various topologies.

The honest node implementation is in honest.go, and there are also honest_hardened.go and honest_vanilla.go files that allow us to target the new gossipsub v1.1 API or the old "vanilla" API by setting a build tag. If the vanilla API is used, the test will not produce any peer score information, since that was added in v1.1, however you will see the effect of the attacks in the latency distribution.

The sybil nodes are implemented in badboy.go, which provides a minimal gossipsub implementation with some nasty behaviors.

The tracer.go file implements the pubsub.EventTracer interface to capture pubsub events and produce test metrics. Because the full tracer output is quite large (several GB for a few minutes of test execution with lots of nodes), we aggregate the trace events at runtime and spit out a json file with aggregated metrics at the end of the test. We also capture a filtered subset of the original traces, containing only Publish, Deliver, Graft, and Prune events. At the end of the test, we run tracestat on the filtered traces to calculate the latency distribution and get a summary of publish and deliver counts.