DDShow

Visualization for Timely Dataflow and Differential Dataflow programs

Primary language: Rust. License: Apache License 2.0 (Apache-2.0).

Getting started with ddshow

First, install ddshow with cargo. ddshow is not yet published to crates.io (it will be at a future date), so until then the recommended way to install it is with the --git option of cargo install:

cargo install --git https://github.com/Kixiron/ddshow

Next, set the TIMELY_WORKER_LOG_ADDR environment variable for your target program. It should be set to the same address ddshow is pointed at (127.0.0.1:51317 by default) so that the two programs can communicate over TCP.

# Bash
export TIMELY_WORKER_LOG_ADDR=127.0.0.1:51317
# Powershell
$env:TIMELY_WORKER_LOG_ADDR = "127.0.0.1:51317"
:: CMD
set TIMELY_WORKER_LOG_ADDR=127.0.0.1:51317
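If you launch the target program from another Rust program (a test harness, for example), the variable can be set on just that child process instead of on your whole shell. The sketch below uses std::process::Command; the helper name target_command and the binary path ./target/release/my-dataflow are hypothetical placeholders.

```rust
use std::process::Command;

/// Address ddshow uses by default.
const DDSHOW_ADDR: &str = "127.0.0.1:51317";

/// Hypothetical helper: build (but don't spawn) a command for the target
/// dataflow program, with TIMELY_WORKER_LOG_ADDR set only for that child.
fn target_command(program: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("TIMELY_WORKER_LOG_ADDR", DDSHOW_ADDR);
    cmd
}

fn main() {
    // Hypothetical path to your compiled Timely program.
    let cmd = target_command("./target/release/my-dataflow");

    // Show the environment overrides that will be applied to the child.
    for (key, value) in cmd.get_envs() {
        println!("{:?} = {:?}", key, value);
    }
    // After starting ddshow, you would call `spawn()` on the command.
}
```

Setting the variable per child keeps other programs in the same shell from trying to send their logs to ddshow.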

After setting the environment variable, you can run ddshow. Set --connections to the number of timely workers the target computation spins up (it defaults to 1 if omitted), and set --address to the address ddshow should listen on. Note that --address must match whatever you set TIMELY_WORKER_LOG_ADDR to; otherwise the two programs won't be able to connect.

ddshow --connections 1 --address 127.0.0.1:51317
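Each timely worker opens its own TCP connection to TIMELY_WORKER_LOG_ADDR, which is why --connections has to match the worker count: ddshow waits for one stream per worker. The sketch below models only that accept loop with std::net. accept_workers is a hypothetical helper, not ddshow's actual code, and it does not decode any log events.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Hypothetical helper: block until `connections` workers have connected,
/// returning one stream per worker.
fn accept_workers(listener: &TcpListener, connections: usize) -> io::Result<Vec<TcpStream>> {
    let mut streams = Vec::with_capacity(connections);
    for _ in 0..connections {
        // `accept` blocks until the next worker connects.
        let (stream, _peer_addr) = listener.accept()?;
        streams.push(stream);
    }
    Ok(streams)
}

fn main() -> io::Result<()> {
    // Port 0 asks the OS for a free port, so this demo won't clash with a
    // real ddshow on 127.0.0.1:51317.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Simulate a target program with two workers, each connecting once.
    let workers = 2;
    let clients: Vec<_> = (0..workers)
        .map(|_| thread::spawn(move || TcpStream::connect(addr)))
        .collect();

    let streams = accept_workers(&listener, workers)?;
    for client in clients {
        client.join().expect("client thread panicked")?;
    }
    println!("{} worker stream(s) connected", streams.len());
    Ok(())
}
```

If --connections is larger than the real worker count, a loop like this never finishes; if it is smaller, some workers' events are never read.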

This creates a dataflow-graph/ directory containing everything ddshow's UI needs to operate offline. Open dataflow-graph/graph.html in a browser to view the graphed dataflow.

The full list of arguments ddshow supports, along with their options, can be printed by running:

ddshow --help

For basic usage


Debugging tips:

If the output is empty when it shouldn't be, make sure you aren't overwriting the default loggers that Timely and Differential Dataflow register. Registering a logger through Worker::log_register() under an existing name replaces the previous one, so if your logger doesn't forward logging events, ddshow never receives them.

Another common problem is a mismatch of timely versions. Events are sent with abomonation, which serializes values by copying their in-memory representation, so the layout of events isn't guaranteed to be the same across timely versions (or even across different rustc invocations). A mismatch can cause errors, incompatibilities, and silent failures. The only known solutions are to make sure ddshow and the target program use the same versions of Timely and Differential Dataflow, or to use the ddshow-sink crate, which provides stable, FFI-safe versions of the timely logging types.
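To see how a layout mismatch can produce either an error or silently wrong data, consider a simplified, hypothetical encoding that, like abomonation, writes fields as raw native-endian bytes and trusts the reader to know the layout. EventV1 and EventV2 are made-up stand-ins for one logging event type as defined by two different versions; this is not abomonation's actual wire format.

```rust
/// Hypothetical event as defined by an older version.
#[derive(Debug, PartialEq)]
struct EventV1 {
    id: u32,
    worker: u32,
}

/// The "same" event after a newer version widened and reordered its fields.
#[derive(Debug, PartialEq)]
struct EventV2 {
    worker: u64,
    id: u32,
}

/// Read a native-endian u32 at `at`, or None if the buffer is too short.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes.get(at..at + 4)?);
    Some(u32::from_ne_bytes(buf))
}

/// Read a native-endian u64 at `at`, or None if the buffer is too short.
fn read_u64(bytes: &[u8], at: usize) -> Option<u64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes.get(at..at + 8)?);
    Some(u64::from_ne_bytes(buf))
}

// Each encoder writes fields in its own struct's order, with no schema or tags.
fn encode_v1(e: &EventV1) -> Vec<u8> {
    [e.id.to_ne_bytes(), e.worker.to_ne_bytes()].concat()
}

fn encode_v2(e: &EventV2) -> Vec<u8> {
    let mut bytes = e.worker.to_ne_bytes().to_vec();
    bytes.extend_from_slice(&e.id.to_ne_bytes());
    bytes
}

// Each decoder assumes the bytes follow its own struct's layout.
fn decode_v1(bytes: &[u8]) -> Option<EventV1> {
    Some(EventV1 { id: read_u32(bytes, 0)?, worker: read_u32(bytes, 4)? })
}

fn decode_v2(bytes: &[u8]) -> Option<EventV2> {
    Some(EventV2 { worker: read_u64(bytes, 0)?, id: read_u32(bytes, 8)? })
}

fn main() {
    // New writer, old reader: decoding "succeeds", but the fields are wrong.
    let sent = EventV2 { worker: 3, id: 7 };
    println!("{:?}", decode_v1(&encode_v2(&sent)));

    // Old writer, new reader: the buffer is too short, so decoding fails.
    let old = EventV1 { id: 7, worker: 3 };
    println!("{:?}", decode_v2(&encode_v1(&old)));
}
```

The first case is the dangerous one: nothing fails, yet the values are garbage, which matches the silent failures described above.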

To get Differential Dataflow insights, make sure your code contains this snippet (or an equivalent) so that Differential Dataflow logs are forwarded:

// `worker` should be an `&mut Worker<A>`, generally acquired from the inner
// closure of `timely::execute()`

if let Ok(addr) = std::env::var("DIFFERENTIAL_LOG_ADDR") {
    // An empty address leaves Differential Dataflow logging disabled
    if !addr.is_empty() {
        match std::net::TcpStream::connect(&addr) {
            // Forward this worker's Differential Dataflow log events over TCP
            Ok(stream) => {
                differential_dataflow::logging::enable(worker, stream);
            }
            Err(err) => panic!(
                "Could not connect to differential log address {:?}: {}",
                addr, err
            ),
        }
    }
}
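If you'd rather not panic when the log address is unreachable, the connection step can be factored into a helper that returns an error instead. connect_log_stream is a hypothetical name; it covers only the std::net part, and you would still pass the returned stream to differential_dataflow::logging::enable inside the worker closure.

```rust
use std::io;
use std::net::TcpStream;

/// Hypothetical helper: connect to a log address taken from an environment
/// variable's value. Returns Ok(None) when the address is unset or empty
/// (logging stays disabled) and an error, rather than a panic, when the
/// address is set but the connection fails.
fn connect_log_stream(addr: Option<&str>) -> io::Result<Option<TcpStream>> {
    match addr {
        Some(addr) if !addr.is_empty() => TcpStream::connect(addr).map(Some),
        _ => Ok(None),
    }
}

fn main() -> io::Result<()> {
    let addr = std::env::var("DIFFERENTIAL_LOG_ADDR").ok();
    match connect_log_stream(addr.as_deref())? {
        // In a real program: differential_dataflow::logging::enable(worker, stream);
        Some(stream) => println!("connected to {}", stream.peer_addr()?),
        None => println!("DIFFERENTIAL_LOG_ADDR not set; differential logging disabled"),
    }
    Ok(())
}
```

Returning the error lets the caller decide whether a missing log sink should abort the computation or just be reported.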