This is a fork of SnailTrail, a tool to run online critical path analysis on various stream processors (see also the SnailTrail NSDI'18 Paper).
The fork builds upon the original repository and implements further algorithms for analyzing stream processors. It currently focuses on the 0.9 version of Timely Dataflow and Differential Dataflow and won't refrain from breaking existing upstream abstractions (even though they should be relatively easy to add back in at a later point in time).
To try out SnailTrail, decide between online (via TCP) and offline (from file) mode.
-
First, start a computation you would like to log: As example,
cargo run --example triangles <input file> <batch size> <load balance factor> <#computation workers>
from withintimely-adapter
.E.g.,
cargo run --example triangles livejournal.graph 100 3 -w2
will load thelivejournal.graph
to use in the triangles computation, which is started with a batch size of 100. It is distributed over two workers, which will each write out events to three files.If you don't have the triangles computation ready, you can try a very basic log that is easily tweakable with
cargo run --example custom_operator 3 -w2
. -
Run SnailTrail to create a PAG: From
timely-snailtrail
, runcargo run --example inspect <# SnailTrail workers> <# of (simulated) source computation workers> <from-file?>
.E.g.,
cargo run --example inspect 2 6 f
will run SnailTrail with two workers, reading from the 6 files (2 workers * 3 load balance factor
) we generated in step 1. -
This creates a PAG as a differential
Collection
, to log it and use it, tweaktimely-snailtrail/examples/inspect.rs
.
-
First start SnailTrail, similarly to Offline step 2, but without the
from-file?
flag set; e.g.:cargo run --example inspect 2 6
. -
Now, start the computation you would like to log with env variable
SNAILTRAIL_ADDR=localhost:8000
.E.g., just like in offline mode:
env SNAILTRAIL_ADDR=localhost:8000 cargo run --example triangles livejournal.graph 100 3 -w2
-
This creates a PAG, to log it and use it, tweak
timely-snailtrail/examples/inspect.rs
.
Further debug logging of the examples and SnailTrail is provided by Rust's log
and env_log
crates. Passing RUST_LOG=info
(or trace
) as env variable when running examples should write out further logging to your std::out
.
Check out the Structure
section of this README
for a high-level overview.
The "magic" mostly happens at
timely-adapter/src/connect.rs
for (1) logging a computation and (2) connecting to it from SnailTrail,timely-adapter/src/lib.rs
for theLogRecord
creation,timely-snailtrail/src/pag.rs
for thePAG
creation, and theinspect.rs
,triangles.rs
andcustom_operator.rs
examples tying it all together.
Disclaimer: The PAG creation currently only creates local edges (I still need to update the remote edge creation for all the new stuff I added, they're commented out in pag.rs
).
(Roughly in order of appearance)
Type | Crate | Description |
---|---|---|
adapter | timely-adapter |
timely / differential 0.9 adapter |
infrastructure | logformat |
Shared definitions of core data types and serialization of traces. |
infrastructure, algorithms | timely-snailtrail |
PAG generation & algorithms for timely 0.9 with epochal semantics. |
Type | Crate | Description |
---|---|---|
adapter | spark-parser |
Spark adapter |
adapter | tensorflow |
TensorFlow adapter |
adapter | Flink | not publicly available |
adapter | Timely < 0.9 | not publicly available |
adapter | Heron | not publicly available |
infrastructure | logformat |
Shared definitions of core data types and serialization of traces (in Rust, Java). |
infrastructure | pag-construction |
Constructs the Program Activity Graph (PAG) from a flat stream of events which denote the start/end of computation and communication. Also has scripts to generate various plots. |
algorithms | snailtrail |
Calculates a ranking for PAG edges by computing how many times an edge appears in the set of all-pairs shortest paths ("critical participation", cf. the paper). |
Adapters read log traces from a stream processor (or a serialized representation) and convert the logged messages to logformat
's LogRecord
representation. This representation can then be used for PAG construction.
Depending on the stream processor, window semantics also come into play here. For example, the timely-adapter
currently uses an epoch-based window, which should make many algorithms on the PAG easier than working on a fixed window PAG.
Glue code, type definitions, (de)serialization, and intermediate representations that connect adapters to algorithms.
Implementation of various algorithms that run on top of the PAG to provide insights into the analyzed distributed dataflow's health and performance.
See the docs
subfolder for additional documentation. Of course, also check out the examples
and code documentation built with cargo doc
.
- Hoffmann et al.: SnailTrail Paper (NSDI '18)
- Malte Sandstede: A Short Introduction to SnailTrail (ETH '19)
- Vasia Kalavri: Towards self-managed, re-configurable streaming dataflow systems (UGENT '19)
- Moritz Hoffmann: SnailTrail: Generalizing Critical Paths for Online Analysis of Distributed Dataflows (NSDI '18)
- Vasia Kalavri: Online performance analysis of distributed dataflow systems (O'Reilly Velocity London '18)
SnailTrail is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE, and LICENSE-MIT for details.