This is not an official Google product. This project was created by Michael Isaev and Nic McDonald at Google.
ParaGraph is a _Para_llel Graph representation of parallel computing applications that can be executed in a system-level simulator. It is designed as an interface between the source code of a parallel program and a system-level simulator that "executes" the program on a model of a distributed system. You can think of ParaGraph as an IR (Intermediate Representation) that can be paired with various simulators as backends, similar to how LLVM IR or MLIR can be paired with backends that target different hardware. This approach lets us bring accurate application models to system-level simulation frameworks and model the execution of parallel computing applications on future distributed systems.
ParaGraph extracts high-level computation and communication nodes from a compiled program or an execution trace, performs topology-based communication lowering, and rewrites the graph in a format suitable for graph execution in a system simulator. Currently, we target TensorFlow and PyTorch programs through the XLA compiler. Support for MPI programs is planned.
Originally, ParaGraph was a summer 2020 internship project at Google that aimed to extract communication traffic from machine learning applications written in TensorFlow and simulate it in the SuperSim event-driven network simulator.
To install all the tools needed to work with ParaGraph, you can use the following bash script:
mkdir paragraph-sim; cd paragraph-sim
for prj in paragraph-core hlo-bridge; do
git clone git@github.com:paragraph-sim/${prj} ${prj}
cd ${prj}
bazel test -c opt ...
cd ..
done
for prj in paragraph-creator hlo-examples; do
git clone git@github.com:paragraph-sim/${prj} ${prj}
done
The C++ projects use Bazel for building binaries. To install Bazel, follow the official Bazel installation directions. You need bazel-3.7.2. Use the following command to build and test the project:
bazel test -c opt ...
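Because the build expects a specific Bazel release, it can help to check the installed version before running the tests. The snippet below is a minimal sketch and not part of the ParaGraph repositories: the helper names `bazel_version_from` and `bazel_version_ok` are hypothetical, and it assumes `bazel --version` prints a line of the form `bazel 3.7.2`.

```shell
# Required version, as stated in this README.
REQUIRED_BAZEL_VERSION="3.7.2"

# Extract the version number from a line such as "bazel 3.7.2".
# (Hypothetical helper, assumes the version is the second field.)
bazel_version_from() {
  printf '%s\n' "$1" | awk '{print $2}'
}

# Succeed only when the given `bazel --version` output matches the
# required version exactly.
bazel_version_ok() {
  [ "$(bazel_version_from "$1")" = "$REQUIRED_BAZEL_VERSION" ]
}

# Report the result without aborting the calling shell.
if command -v bazel >/dev/null 2>&1; then
  if bazel_version_ok "$(bazel --version)"; then
    echo "bazel ${REQUIRED_BAZEL_VERSION} found"
  else
    echo "expected bazel ${REQUIRED_BAZEL_VERSION}, found: $(bazel --version)"
  fi
else
  echo "bazel is not installed"
fi
```

If the check reports a different version, install the required release before running `bazel test -c opt ...`.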
To install ParaGraph tools using spack, follow the steps from the build_with_spack_cmake.sh script:
- Install spack
- Install nicspack repository
- Run paragraph building script
./build_with_spack_cmake.sh
We primarily use Bazel to build ParaGraph and its tools. However, it is also possible to build ParaGraph for your project using CMake. Please consult CMakeLists.txt, libparagraph.pc.in, and build_with_spack_cmake.sh.
In addition to the core library, which can be used to add ParaGraph graph support to external simulators, the paragraph-core repo provides several tools that can be used to build flexible modeling workflows.
Graph converter converts ParaGraph graph files between the supported binary (.pb) and text (.textproto) formats. It can also add extra dependencies to the graph nodes so that every instruction executes sequentially in post-order traversal, using the --enforce_postorder flag.
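To illustrate what sequential post-order execution means, the following sketch walks a toy dependency graph so that every node is emitted only after all of its operands. The node names (`root`, `add`, `a`, `b`) and the helper functions are made up for illustration; this is not ParaGraph's graph format or the converter's implementation.

```shell
# Toy graph (hypothetical): root depends on add and b, add depends on a and b.
operands_of() {
  case "$1" in
    root) echo "add b" ;;
    add)  echo "a b" ;;
    *)    echo "" ;;
  esac
}

visited=" "   # space-delimited set of nodes already seen
order=""      # resulting sequential execution order

# Depth-first walk: visit all operands first, then record the node itself.
postorder() {
  case "$visited" in
    *" $1 "*) return ;;   # node already scheduled
  esac
  visited="$visited$1 "
  for op in $(operands_of "$1"); do
    postorder "$op"
  done
  order="${order:+$order }$1"
}

postorder root
echo "$order"
```

In the resulting order each instruction appears after everything it depends on, which is the property that a fully sequential schedule needs.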
To build graph converter, type
bazel build -c opt paragraph/graph:graph_converter
To see the help message with information about flags and usage, type
bazel-bin/paragraph/graph/graph_converter --help
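The build target and the binary path above are related by Bazel's usual output layout: a binary target `package:name` is placed at `bazel-bin/package/name`. The helper below is a small sketch of that mapping; the function name `bazel_bin_path` is hypothetical and only handles simple labels of the form used in this README.

```shell
# Map a Bazel target such as "paragraph/graph:graph_converter" to its
# output path under bazel-bin, e.g. "bazel-bin/paragraph/graph/graph_converter".
# (Hypothetical helper, assumes a label of the form [//]package:name.)
bazel_bin_path() {
  target=${1#//}            # drop an optional leading "//"
  package=${target%%:*}     # text before the colon
  name=${target##*:}        # text after the colon
  printf 'bazel-bin/%s/%s\n' "$package" "$name"
}

# Example use, after the target has been built:
#   t=paragraph/graph:graph_converter
#   bazel build -c opt "$t" && "$(bazel_bin_path "$t")" --help
bazel_bin_path paragraph/graph:graph_converter
```

The same pattern applies to the other tools described in this section.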
Graph data parallel extender changes the system size for a given input graph. It only works with graphs that correspond to neural network workloads that use only data parallelism. To build graph data parallel extender, type
bazel build -c opt paragraph/graph:graph_data_parallel_extender
To see the help message with information about flags and usage, type
bazel-bin/paragraph/graph/graph_data_parallel_extender --help
Graph translator performs a graph translation according to the translation_config.json file. The translation process rewrites a graph for a particular simulator by replacing instructions that the simulator doesn't understand with new instruction sequences that contain only instructions the simulator supports.
To build graph translator, type
bazel build -c opt paragraph/translation:graph_translator
To see the help message with information about flags and usage, type
bazel-bin/paragraph/translation/graph_translator --help
Simulator implements a simple analytical model that can "execute" the provided graph. It is useful for graph debugging and testing, as well as for fast prediction of graph execution time using a simple roofline analytical model for both computation and communication. To build simulator, type
bazel build -c opt paragraph/simulator/simulator
To see the help message with information about flags and usage, type
bazel-bin/paragraph/simulator/simulator --help
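As a rough illustration of the roofline idea mentioned above, the sketch below estimates the time of a single operation as the larger of its compute time and its data-movement time. This is the textbook roofline formula, not necessarily the exact model the ParaGraph simulator uses; the function name `roofline_seconds` and its four parameters are assumptions made for this example.

```shell
# roofline_seconds FLOPS BYTES PEAK_FLOPS_PER_S PEAK_BYTES_PER_S
# Estimated time = max(FLOPS / PEAK_FLOPS_PER_S, BYTES / PEAK_BYTES_PER_S).
# (Hypothetical helper illustrating a generic roofline estimate.)
roofline_seconds() {
  awk -v f="$1" -v b="$2" -v pf="$3" -v pb="$4" 'BEGIN {
    compute = f / pf    # time if limited by arithmetic throughput
    memory  = b / pb    # time if limited by data movement
    printf "%g\n", (compute > memory ? compute : memory)
  }'
}

# Example: an operation with 2e12 FLOPs and 4e9 bytes moved, on a device
# with 1e14 FLOP/s peak compute and 1e12 B/s peak bandwidth.
roofline_seconds 2e12 4e9 1e14 1e12
```

An operation is compute-bound when the first term dominates and bandwidth-bound otherwise; the same shape of estimate can be applied to communication by using message size and link bandwidth.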
ParaGraph can be used to build flexible modeling workflows. It is up to each user to pick the tools and steps that fit their needs. Here is an example of one possible modeling workflow built with the open-source tools currently supported by ParaGraph:
- Get a graph by using paragraph-creator or translating an XLA HLO graph using hlo-bridge.
- Translate the graph using graph translator for your simulator, e.g. to use a particular all-reduce algorithm.
- Run graph application modeling in a simulator, for example SuperSim.
You can also take a look at the paragraph-scripts repo, which contains useful scripts and translation config examples.
The ParaGraph paper is under review; we will update this section once it is accepted.