
A suite of representative serverless cloud-agnostic (i.e., dockerized) benchmarks


vSwarm - Serverless Benchmarking Suite

Welcome! vSwarm is a collection of ready-to-run serverless benchmarks, each typically consisting of a number of interconnected serverless functions, and with a general focus on realistic data-intensive workloads.

This suite is part of the vHive Ecosystem. It is a turnkey, fully tested solution meant to be used in conjunction with vHive, and is compatible with all technologies that vHive supports, namely containers, Firecracker and gVisor microVMs. The majority of benchmarks support distributed tracing with Zipkin, which traces both the infrastructure components and the user functions.

In addition to the multi-function benchmarks, the vSwarm suite contains a set of standalone functions, which support both the x86 and arm64 architectures. Most of the standalone functions are compatible with vSwarm-u, which allows running them in gem5, the state-of-the-art cycle-accurate full-system CPU simulator for system and microarchitecture research, to study the microarchitectural implications of serverless computing.
The standalone functions can therefore be used as microbenchmarks: first to pinpoint microarchitectural bottlenecks in the execution of serverless workloads using Top-Down analysis on real hardware, and then to further explore and optimize those bottlenecks using the gem5 cycle-accurate simulator.

Directory Structure

  • benchmarks contains all of the available benchmark source code and manifests.
  • utils contains utilities for use within serverless functions, e.g. the tracing module.
  • tools is for command-line tools and services useful outside of serverless functions, such as deployment or invocation.
  • runner is for setting up self-hosted GitHub Actions runners.
  • docs contains additional documentation on a number of relevant topics.

Summary of Benchmarks

  • 2 microbenchmarks for measuring chained-function performance and data-transfer performance in various patterns (pipeline, scatter, gather) and over different communication media (AWS S3 and inline transfers)
  • 8 real-world benchmarks
  • 25 standalone functions
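To illustrate the transfer patterns the chained-function microbenchmarks exercise, here is a purely illustrative Python sketch of pipeline, scatter, and gather composition — the function names are made up for this example and are not vSwarm code:

```python
# Illustrative sketch of the function-composition patterns exercised by the
# chained-function microbenchmarks (not actual vSwarm code).

def pipeline(payload, stages):
    # Pipeline: each function's output feeds the next, i.e. f3(f2(f1(x))).
    for stage in stages:
        payload = stage(payload)
    return payload

def scatter(payload, workers):
    # Scatter: one producer fans the same payload out to many consumers.
    return [worker(payload) for worker in workers]

def gather(payloads, reducer):
    # Gather: many producers' outputs are collected by a single consumer.
    return reducer(payloads)

if __name__ == "__main__":
    double = lambda x: x * 2
    increment = lambda x: x + 1
    print(pipeline(3, [double, increment]))  # -> 7
    print(scatter(3, [double, increment]))   # -> [6, 4]
    print(gather([6, 4], sum))               # -> 10
```

In the real benchmarks, each "stage" or "worker" is a separate serverless function, and the payload moves between them either inline (in the RPC message) or through an intermediary store such as AWS S3.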

Refer to this document for more detail on the differences and supported features of each benchmark.

Running Benchmarks

Details on each specific benchmark can be found in its respective subfolder. Every benchmark can be run on a Knative cluster, and most can also be run locally with docker-compose. Please see the running benchmarks document for detailed instructions on how to run a benchmark locally or on a cluster.
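As a sketch of what running a benchmark locally looks like, a benchmark's docker-compose manifest typically wires the function containers together on a shared network; the service and image names below are placeholders for illustration, not copied from the repo:

```yaml
# Illustrative docker-compose sketch for running a function locally;
# the service name, image, and port are placeholders.
version: "3.3"
services:
  function:
    image: docker.io/example/my-function:latest  # placeholder image
    ports:
      - "50051:50051"  # gRPC endpoint exposed by the function
```

With such a file in place, `docker-compose up` starts the services locally, and the function can then be called with a gRPC client (e.g., the invoker tool from the tools directory).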

We have a detailed outline on the benchmarking methodology used, which you can find here.

Contributing a Benchmark

We openly welcome any contributions, so please get in touch if you're interested!

Bringing up a benchmark typically consists of dockerizing the benchmark functions so that they can be deployed and tested with docker-compose, then integrating the functions with Knative, and finally including the benchmark in the CI/CD pipeline. Please refer to our documentation on bringing up new benchmarks for more guidance.
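For the Knative integration step, a dockerized function is typically described by a Knative Service manifest along the following lines; the metadata and image here are illustrative placeholders, not an actual vSwarm manifest:

```yaml
# Illustrative Knative Service manifest for a dockerized benchmark function;
# the name, namespace, and image are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-function
  namespace: default
spec:
  template:
    spec:
      containers:
        - image: docker.io/example/my-function:latest
          ports:
            - name: h2c          # serve gRPC over cleartext HTTP/2
              containerPort: 50051
```

Applying such a manifest with `kubectl apply -f service.yaml` deploys the function to the cluster, after which Knative handles routing and autoscaling.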

We also have some basic requirements for contributions to the repository, which are described in detail in our Contributing to vHive document.

License and copyright

vSwarm is free software: we publish the code under the terms of the MIT License, which allows distribution, modification, and commercial use. This software, however, comes without any warranty or liability.

The software is maintained as part of the vHive Ecosystem by the EASE lab at the University of Edinburgh and the Stanford Systems and Networking Research group.

Maintainers

  • Invoker, timeseriesdb, runners - Dmitrii Ustiugov: GitHub, Twitter, web page
  • ML benchmarks and utils (tracing and storage modules) - Michal Baczun GitHub
  • ML benchmarks - Rustem Feyzkhanov GitHub
  • Video Analytics and Map-Reduce benchmarks - Shyam Jesalpura GitHub
  • GG benchmarks - Francisco Romero GitHub and Clemente Farias GitHub