libp2p/test-plans

Why we're moving away from Testground

MarcoPolo opened this issue · 3 comments

The purpose of this issue is to document why we moved away from Testground for the libp2p interoperability tests and to highlight what worked for us. If you're deciding whether or not to use Testground for your own project, this issue may help you make that decision. If you are a happy user of Testground, feel free to ignore this.

The libp2p team wants to test that every supported version of each implementation can communicate with every supported version of every other implementation, over every supported combination of communication components (libp2p is built on top of smaller components that together make up the communication). The simplest way to run this test is to have two nodes, where one node dials the other and they exchange a ping message. This is the simplest distributed systems test you can create.
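To make the size of that test matrix concrete, here is a minimal sketch of how the dialer/listener/transport combinations could be enumerated. The version strings and per-node transport sets are made-up placeholders, not our real support table, and it models only a single "transport" dimension rather than every component combination.

```python
from itertools import product

# Hypothetical capability table: (implementation, version) -> supported transports.
# The versions and transport sets are placeholders for illustration only.
NODES = {
    ("go-libp2p", "v0.1"): {"tcp", "quic"},
    ("rust-libp2p", "v0.1"): {"tcp", "quic"},
    ("js-libp2p", "v0.1"): {"tcp"},
}


def ping_test_cases(nodes):
    """Return (dialer, listener, transport) for every ordered pair of nodes
    that has at least one transport in common."""
    cases = []
    for dialer, listener in product(sorted(nodes), repeat=2):
        # Only transports supported by both sides can be tested.
        for transport in sorted(nodes[dialer] & nodes[listener]):
            cases.append((dialer, listener, transport))
    return cases


if __name__ == "__main__":
    for dialer, listener, transport in ping_test_cases(NODES):
        print(f"{dialer[0]}@{dialer[1]} -> {listener[0]}@{listener[1]} over {transport}")
```

Even this toy table produces more than a dozen cases, which is why the per-test running time matters so much once real versions and more components are added.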

Testground gives us a way to define this test and define the nodes that make up this test. Testground has built-in support for certain builders to build the code for the node implementation. Testground also gives us the environment to run these nodes in. It gives us some synchronization primitives. It gives us a metrics and grafana endpoint in the test. And it gives us some tools to shape the network (equivalent to tc and netem).

The Testground framework seems great on paper, but in practice the benefits were a little less clear. Some problems I ran into were:

  • Defining the build process was a bit tricky since there was a custom configuration language that included Go templates.
    • There wasn't first-class support for languages other than Go. Instead, you would use a Dockerfile to define the build of a container, so you could end up with multiple build patterns depending on the language.
  • The synchronization primitives came from testground-sdk, which had to be implemented for every language that wanted to use Testground.
  • We couldn't run the nodes directly without Testground, so we lost access to all our debugging tools. This made debugging anything inside Testground a tricky endeavor.
  • An opaque abstraction on top of docker/k8s/local-binaries.
  • One more tool to learn and set up properly.
  • Slow for our test case. Each ping test would take 5.5s (in CI) and we need to run hundreds of these.

I initially tried to work around some of these issues by introducing a simple DSL wrapper around Testground. This simplified our interface to Testground and standardized the build process. But it did nothing to make Testground run faster or provide a better experience to developers writing tests (how to debug the process?). It also hid the sharp edges of the Testground abstraction behind yet another abstraction, which did less than nothing to make it easier for developers to understand what was going on.

This is a very simple test. Had I not heard of Testground but wanted to create a reproducible testing environment, I would have reached for something like Compose (aka docker compose). After spending a good amount of time learning and understanding Testground and implementing these tests in it, I decided to experiment with a Compose setup. In half a day I had ported over everything I had spent weeks on within Testground. These Compose tests ran in 1-2s in CI, and can run in parallel. On my machine (macOS) with a stock docker VM, each test took 0.5s.
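For readers who have not used Compose, here is a rough sketch of what a project for one ping test could look like, generated from Python. The image names and environment variable names (`TRANSPORT`, `IS_DIALER`, `REDIS_ADDR`) are hypothetical placeholders rather than the ones our test binaries actually read. The output is JSON only because that is in the standard library; JSON is a subset of YAML, so it should be usable as a Compose file.

```python
import json


def ping_compose(dialer_image, listener_image, transport):
    """Build a Compose project with one Redis, one listener, and one dialer."""

    def node(image, is_dialer):
        return {
            "image": image,
            "depends_on": ["redis"],
            "environment": {
                # Hypothetical variable names; a real test binary defines its own.
                "TRANSPORT": transport,
                "IS_DIALER": "true" if is_dialer else "false",
                # Services on the same Compose network resolve by service name.
                "REDIS_ADDR": "redis:6379",
            },
        }

    return {
        "services": {
            "redis": {"image": "redis:7"},
            "listener": node(listener_image, is_dialer=False),
            "dialer": node(dialer_image, is_dialer=True),
        }
    }


if __name__ == "__main__":
    # Placeholder image names for illustration.
    project = ping_compose("go-ping:example", "rust-ping:example", "tcp")
    print(json.dumps(project, indent=2))
```

Saved to a file such as `ping.json`, this could be run with something like `docker compose -f ping.json up --exit-code-from dialer`, so that the dialer's exit status decides whether the test passed.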

The Compose setup was more transparent. You could run the binary locally with access to all your local debugging tools just as easily as running the binary within a container. For synchronization, I used a stock Redis container image and a normal Redis library. Unlike the Testground approach, I didn't have to first create a language-specific testground-sdk. I could pick up the most popular Redis client library and get going. Compose is a familiar tool to many developers, which gets rid of the "One more tool to learn" problem.
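As an illustration of how little is needed for synchronization, here is a minimal sketch of the listener publishing its address and the dialer waiting for it. It only relies on a client offering `rpush` and `blpop`, which is how a typical Redis client such as redis-py exposes those commands. The key name `listenerAddr`, the example multiaddr, and the `InMemoryRedis` stand-in (there only so the sketch runs without a server) are assumptions for the example.

```python
import queue
import threading
from collections import defaultdict

ADDR_KEY = "listenerAddr"  # hypothetical key name agreed on by both nodes


class InMemoryRedis:
    """Stand-in offering only rpush/blpop, so the sketch runs without a server.
    In a real test this would be a Redis client connected to the redis container."""

    def __init__(self):
        self._lists = defaultdict(queue.Queue)
        self._lock = threading.Lock()

    def _list(self, key):
        with self._lock:
            return self._lists[key]

    def rpush(self, key, *values):
        q = self._list(key)
        for value in values:
            q.put(value)
        return q.qsize()

    def blpop(self, key, timeout=0):
        # As with Redis BLPOP, a timeout of 0 means "block until a value arrives".
        try:
            value = self._list(key).get(timeout=timeout or None)
        except queue.Empty:
            return None
        return (key, value)


def listener(client, addr):
    """Listener side: publish the address the dialer should connect to."""
    client.rpush(ADDR_KEY, addr)


def dialer(client, timeout=5):
    """Dialer side: block until the listener has published its address."""
    item = client.blpop(ADDR_KEY, timeout=timeout)
    if item is None:
        raise TimeoutError("listener never published its address")
    _key, addr = item
    return addr


if __name__ == "__main__":
    client = InMemoryRedis()
    t = threading.Thread(target=listener, args=(client, "/ip4/10.0.0.2/tcp/4001"))
    t.start()
    print("dialing", dialer(client))
    t.join()
```

Note that a real Redis client may return bytes rather than strings unless configured to decode responses.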

The switch to Compose is not without its drawbacks. However, these can be mitigated.

  • We lost access to the testground-sdk's ability to shape the network within the process.
    • We don't think this is important to change dynamically. We can shape the network on container start using well-documented tc commands.
  • We don't have out-of-the-box metrics/grafana endpoints.
    • We can add these to the compose setup ourselves when we need them.
  • We can't easily scale this up to thousands of Nodes, like we may be able to in Testground.
    • We can use https://kompose.io/, an official Kubernetes project that converts Compose files to k8s configs, to deploy to k8s.
    • We can use docker swarm to deploy the compose file.
    • ECS has support for Compose.
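For the first mitigation above (shaping the network on container start), here is a small sketch of building the `tc` netem command a container entrypoint could run before starting the node. The interface name `eth0` and the delay values are assumptions for the example, and the container would need the `NET_ADMIN` capability for `tc` to work.

```python
import shlex


def netem_delay_cmd(interface, delay_ms, jitter_ms=0):
    """Return the argv for adding a fixed netem delay (plus optional jitter)
    as the root qdisc on the given interface."""
    if delay_ms < 0 or jitter_ms < 0:
        raise ValueError("delay and jitter must be non-negative")
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem",
           "delay", f"{delay_ms}ms"]
    if jitter_ms:
        # netem takes the jitter as an optional second value after the delay.
        cmd.append(f"{jitter_ms}ms")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; an entrypoint script would
    # execute it, e.g. with subprocess.run(cmd, check=True).
    print(shlex.join(netem_delay_cmd("eth0", 100, jitter_ms=10)))
```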

The Compose setup also lets us do things that we either couldn't do before or were hard to do:

  • We can define a network topology that mimics US/EU region clusters where nodes in US have low latency to each other but high latency to EU and vice-versa.
  • We can easily get tcpdump/qlog data (Marten got this working within the hour).
  • We can implement new tests faster (the js-libp2p and rust-libp2p ping tests were implemented in less than half a day; counting round trips for the initial connection was hacked up in a couple of hours).
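As a sketch of the US/EU topology idea from the first bullet, the per-pair latency could be derived from a simple node-to-region mapping like the one below. The node names and the millisecond figures are invented for illustration; nothing here was measured.

```python
# Hypothetical one-way delays in milliseconds; these are illustrative only.
INTRA_REGION_MS = 5
CROSS_REGION_MS = 80


def delay_ms(region_of, a, b):
    """Delay to apply between nodes a and b, based only on their regions."""
    if a == b:
        return 0
    if region_of[a] == region_of[b]:
        return INTRA_REGION_MS
    return CROSS_REGION_MS


def delay_matrix(region_of):
    """Map every ordered pair of distinct nodes to its delay."""
    return {
        (a, b): delay_ms(region_of, a, b)
        for a in region_of
        for b in region_of
        if a != b
    }


if __name__ == "__main__":
    regions = {"us-1": "US", "us-2": "US", "eu-1": "EU"}
    for (a, b), ms in sorted(delay_matrix(regions).items()):
        print(f"{a} -> {b}: {ms}ms")
```

Each node's row of this matrix could then be turned into traffic-shaping rules when its container starts.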

TL;DR


Our test is the equivalent of driving down the street to get groceries. The Compose setup is not flashy, and anyone can use it. In that sense it's like driving a Toyota Corolla. Testground, in comparison, is a Boeing 747. It's much easier to get groceries in a Corolla than a 747.

My recommendations for other projects

First, try to answer whether you need all or most of the features that Testground provides. Would it be easier/simpler/faster to pick these features up à la carte? Are many folks going to be working on these tests? Do you want outside folks to contribute?

At the end of the day there's a goal you're trying to achieve and that goal is probably not "use Testground". So pick the best tools for your goal and build it.

Thank you for this great summary @MarcoPolo! What do you think of publishing a blog article, taking this as a starting point?

Should the title be "Don't go grocery shopping in a Boeing 747"?

This is linked to in the Readme: https://github.com/libp2p/test-plans#history
Closing the issue, since closing doesn't prevent others from discovering it or commenting on it.