
Handbook on how to use the Wu-Wei benchmarking toolkit

Apache License 2.0

Wu-Wei handbook

The Wu-Wei benchmarking toolkit is (1) a set of conventions for organizing benchmarking artifacts on the file system and describing them with JSON files, and (2) a set of command-line utilities for performing common benchmarking tasks on those artifacts. The end goal is to make benchmarking results drastically easier to obtain, replicate, compare, analyze, publish, etc., and eventually to build a benchmarking commons similar to other package ecosystems (ex: Linux distributions or the npm ecosystem).

Overview

The conventions and tools are built around a configuration data structure and a benchmarking cycle decomposed into distinct phases. We cover each in turn.

Configuration

A configuration combines the descriptions of one artifact from each of the following categories: a benchmark, a specific implementation of that benchmark in a given programming language, a compiler, an environment, a platform, and experiment parameters. The description of each artifact is written in JSON in the root directory of that artifact, in a file named after the artifact's category. For example, a compiler has a 'compiler.json' file that describes the properties of that compiler and how to use it. The configuration is itself a JSON object with each artifact description listed under its category name. The artifact categories that make up a configuration are defined and illustrated in the following figure:

[Figure: artifact categories that make up a configuration]
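
Put together, a configuration is therefore a JSON object with one entry per category. The sketch below only shows the category keys; the 'short-name' values are hypothetical, and each real description carries many more properties (elided with '...'):

{
  "benchmark": { "short-name": "my-benchmark", ... },
  "implementation": { "short-name": "c", ... },
  "compiler": { "short-name": "gcc", ... },
  "environment": { "short-name": "native", ... },
  "platform": { "short-name": "my-laptop", ... },
  "experiment": { ... }
}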

An artifact description may expect parameters that come from an artifact of a different category. For example, a compiler expects to find the source files it should compile under specific properties of the implementation description. The concrete values for those properties are resolved before executing a given phase of the benchmarking cycle. In the example below, the compiler expects the entry point of a benchmark implementation to be listed under the "runner-source-file" property of the implementation description, so that it can be passed as one of the compiler's options. When the configuration for that example is initially created, it contains the following:

{
  ...
  "implementation": {
    ...
    "runner-source-file": { "file": "./runner.c" }
    ...
  },
  "compiler": {
    ...
    "commands": [
      {
        "executable-name": "gcc",
        "options": [
          ...
          { "config": "/implementation/runner-source-file" }
          ...
        ]
      }
    ]
    ...
  }
  ...
}

Before the execution of a given phase, the references to other descriptions are resolved to obtain concrete values. This resolution is called configuration expansion because it shares similarities with the macro expansion of some functional languages such as Scheme. After expansion, every reference (and every other custom datatype, such as the "file" object) is resolved to a concrete string or number value. Additionally, for simplicity and uniformity, every file path becomes an absolute path. After expansion, the previous example therefore becomes:

{
  ...
  "implementation": {
    ...
    "runner-source-file": "/path/to/runner.c"
    ...
  },
  "compiler": {
    ...
    "commands": [
      {
        "executable-name": "gcc",
        "options": [
          ...
          "/path/to/runner.c",
          ...
        ]
      }
    ]
    ...
  }
  ...
}

After successful expansion, a configuration is ready to be used to perform a phase of the benchmarking cycle. The different phases are illustrated in the following figure, and explained hereafter:

Benchmarking Cycle

[Figure: the benchmarking cycle and its phases]

Initially, artifacts are gathered into a repository from various sources (git repositories, file archives, directories on the file system, etc.). The benchmark implementations, possibly written in different programming languages, are then processed by one or more compilers, which may translate them into another (or the same) language to obtain a build: an executable version of the benchmark together with the configuration of artifacts that was used to create it. The build is then executed on an execution environment (natively on the operating system, in a virtual machine for a programming language, etc.) to obtain a run: the result of the execution and its associated metrics (time to completion, memory or energy used, etc.). In addition to the metrics, the configuration of artifacts is saved with the results for later reference and traceability. Finally, multiple runs may be aggregated into a single human-readable report (ASCII, HTML, etc.) that summarizes the execution results, the metrics gathered, and comparisons between the different configurations used. Those reports may be used for academic publications or internal reports, or self-published online for collaboration with other people.
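
A single pass through the cycle maps onto the command-line tools roughly as follows; the artifact source given to 'wu install' is a placeholder (see the command-line reference at the end of this document):

wu init                        # create an empty benchmarking repository
wu install <artifact-source>   # add and initialize an artifact in the repository
wu build                       # create build(s) from the installed artifacts
wu run                         # execute build(s) on compatible environment(s)
wu report                      # aggregate run(s) and produce a report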

The process is linear for a simple replication study, but in practice there are cycles between the different phases, driven by the feedback obtained from the reports. That may involve gathering more artifacts for comparison, or modifying existing implementations, compilers, or environments while keeping track of the previous versions. The cycle is repeated until the whole development/analysis process converges to an interesting result. The modified or newer versions of artifacts may then be shared online for others to use directly, to replicate experiments, or to extend and improve upon.

Conceptually, there are three major times at which a benchmark implementation may be modified:

  • Design Time: when a modification may be made by a human or by a process external to the Wu-Wei cycle. It corresponds to the installation part of the cycle or happens right before the build phase.
  • Static Time: when a modification may be done automatically using only information from the source code. It corresponds to the build phase of the cycle.
  • Run Time: when a modification may be done automatically using the execution information. It corresponds to the run phase of the cycle.

During the build phase, an executable version of a benchmark, or build, is created from a combination of a benchmark, a specific implementation of that benchmark in a given programming language, a compiler, and experiment parameters. The combination of these artifacts and parameters is called a configuration. Alongside the executable version of the benchmark, a 'build.json' file is created that describes the configuration that was used to generate the build. A directory unique to the configuration is automatically created under the 'builds' directory of the repository, named with the hash of the configuration's string representation.
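
On disk, the output of the build phase therefore looks roughly like the following; the hash is shortened and illustrative:

builds/
  3fa2c9.../        # one directory per configuration, named with its hash
    build.json      # configuration that was used to generate the build
    executable      # executable version of the implementation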

During the run phase, the execution output of a build running on a given execution environment is produced, along with metrics on the execution. The metrics and the configuration used to produce them are stored in a 'run.json' file in a directory named with the time at which the run happened, under the 'runs' directory of the repository. Inside that directory, files that may have been created as side-effects are stored under a directory 'run-hash/iteration-number', where run-hash is the hash of the run configuration and iteration-number is the index of the iteration that was run. An execution environment might run the same or multiple versions of an implementation before producing a result, and might therefore interleave multiple executions and compilation steps before converging on a final result. Intermediate files may be saved in the corresponding run directory. In the simplest and most common case, however, it executes the implementation only once to gather metrics, such as execution time, and discards any output.
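
A run is stored in a similar way, under a directory named with the time at which it happened; all names below are illustrative:

runs/
  2016-08-30T14h22/   # one directory per run, named with the run time
    run.json          # metrics and the configuration that produced them
    7b41de.../        # hash of the run configuration
      1/              # iteration index, holding side-effect and intermediate files
  latest              # most recent run outputs (see the quick reference below)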

Conventions and the command-line interface of the tools are introduced in guides organized around high-level tasks related to benchmarking. The rest of this document provides quick references to the Wu-Wei concepts and conventions.

Installing the tools

(Recommended) Install nvm and activate a recent (>=6.3.1) version of Node.js

curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.31.4/install.sh | bash
nvm install 6.3.1
nvm use 6.3.1
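
To confirm that a suitable version is active before continuing:

node --version   # should print v6.3.1 or newer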

Install Wu-Wei

git clone https://github.com/Sable/wu-wei-benchmarking-toolkit.git
cd wu-wei-benchmarking-toolkit
npm install
npm link
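
As a quick sanity check of the installation, Wu-Wei can initialize an empty repository in a fresh directory (the directory name below is arbitrary):

mkdir my-benchmarks
cd my-benchmarks
wu init   # initialize an empty benchmarking repository
wu list   # find, verify, and list available artifacts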

Specific guides

We suggest starting with the following introduction to obtain an overview of the tools and conventions used in Wu-Wei:

After the introduction, the following guides can be read in any order:

Using Wu-Wei

The development of the Wu-Wei tools and the writing of this handbook are a labor of love that grew out of frustration with the difficulty of reusing existing benchmarks and tools. The work is available free of charge in order to ensure timely dissemination and the widest impact. If any of this work has been useful to you for any of the following tasks, we kindly ask you to contribute in the following ways:

Used for an academic publication

TODO

Used for an online publication (blog, social media posts, etc.)

TODO

Used for performance evaluations in a for-profit company

TODO

Used for projects with public visibility

TODO

Quick reference for common tasks

  • Clean previous builds: wu build --clean
  • Clean previous runs: wu run --clean
  • Initialize a repository in an existing directory: wu init
  • List installed artifacts (initializes platform information if needed): wu list
  • Show platform information: wu platform
  • Save platform information for future runs non-interactively: wu platform --save --short-name name
  • Show where a build is saved: wu build -v
  • Show the execution output of runs: wu run -v
  • Show the latest run outputs: ls repository-root/runs/latest

Common mistakes and fixes

  • Symptom: SyntaxError: Block-scoped declarations (let, const, function, class) not yet supported outside strict mode
    Cause: the version of Node.js in use is too old to support newer JavaScript constructs
    Fix: use nvm to activate a newer version (>= 6.3.1) with 'nvm use 6.3.1'
  • Symptom: TODO
    Cause: invalid JSON format in a description file
    Fix: use an online JSON format checker
  • Symptom: TODO
    Cause: incorrect description file
    Fix: read the Wu-Wei config-schema.json file to find the formatting mistake; search for the schema whose type attribute corresponds to the artifact type
  • Symptom: TODO
    Cause: invalid output format for the implementation runner
    Fix: use 'wu run -v' to show the execution output and fix it by following the guidelines for new implementations

Terminology

  • artifact: Elementary component with a file representation and associated JSON meta-information
  • repository: Collection of artifacts on the file system
  • action: Operation that can be performed on artifact(s)

Artifacts:

  • benchmark: Abstract algorithm that performs a useful numerical task
  • build: Runnable implementation that is ready to be executed on an environment, together with the configuration that was used to create it
  • compiler: Program that processes a benchmark implementation to produce a new implementation (in the same or a different language)
  • configuration: Combination of artifacts and their associated parameters necessary for performing an action
  • environment: Additional virtualization layer, executing on top of a platform (or another environment), on which a benchmark might execute
  • experiment: Combination of a configuration and experimental parameters that determine what, where, and how an implementation is to be executed
  • implementation: Realization of a benchmark in a specific language (ex: C, assembly) and packaged in a particular format (ex: text file, binary, webpage), which might or might not be directly runnable
  • platform: Hardware and (native) OS combination of the machine on which benchmarks run
  • report: Collection of figure(s) and the configuration that was used to produce them
  • run: Execution report (ex: timing results, memory usage, etc.) and the experiment that produced it

Actions:

  • building: Creation, from a configuration, of an implementation that is executable on an environment
  • installing: Retrieval and initialization of an artifact in the repository
  • reporting: Aggregation of multiple compatible runs, selection of significant results, and production of human-readable figure(s) from those results
  • running: Execution of an implementation on a stack of environment(s), and production of a run from the monitoring of its execution

Repository directory structure

  • .wu: Wu-Wei configuration and temporary files used during operations
  • benchmarks: Benchmark artifacts under short-name/benchmark.json and their various implementations
  • builds: Each generated build has at least configuration-hash/build.json, with the configuration that generated the build, and configuration-hash/executable, with the executable version of an implementation
  • compilers: Compiler artifacts under short-name/compiler.json and associated files
  • environments: Execution environment artifacts under short-name/environment.json, their associated short-name/run script, and other associated files
  • experiments: Experiments under short-name/experiment.json
  • platforms: Known platform configurations under short-name/platform.json
  • runs: Each generated run has at least datetime/run.json; outputs from runs are stored under datetime/configuration-hash/iteration-number

Artifact Description JSON Formats

Basic data types

Canonical names for languages

Benchmark

Compiler

Implementation

Environment

Experiment

Platform

Commandline interface reference

wu build

usage:

Purpose: create a build from a configuration

wu init

usage:

Purpose: create an empty benchmarking repository

wu install

usage:

Purpose: add and initialize an artifact in the repository

wu platform

usage:

Purpose: list the current platform specification

wu list

usage:

Purpose: find, verify, and list available artifact(s)

wu run

usage:

Purpose: execute build(s) on compatible environment(s) and produce a run

wu report

usage:

Purpose: aggregate run(s), select significant result(s), and produce a report