Piper: Multidimensional Planner for DNN Parallelization - code

This code package contains algorithms (proof-of-concept implementation) and input files (profiled DNN models / workloads) from the paper "Piper: Multidimensional Planner for DNN Parallelization" published at NeurIPS 2021. It allows one to reproduce the results in the paper, as well as run the partitioning algorithms on other workloads.

Input format

All our algorithms take as input a JSON file with the following format (all fields are mandatory unless indicated otherwise). This format closely follows our model (see Section 3 "Problem Setup" in the paper):

maxMemoryPerDevice (floating-point): a memory size limit of a single accelerator, in bytes,
maxDevices (integer): number of accelerators (k from the paper),
maxBatchSize (integer): maximum number of microbatches in a batch (N from the paper),
bandwidth (floating-point): bandwidth from each device to the outside,
nodes (array): for each node (layer):
- id (integer): unique ID of node,
- TMPCs (dictionary): mapping from tensor-parallelism degree (t) to an array of TMPCs, each having:
  - id (string): name,
  - timePerSample (floating-point): compute latency per sample (forward + backward; quantity p from the paper),
  - parameterSize (floating-point): size of weights (to be used in computing data-parallel resync costs, quantity w from the paper),
  - memoryUsageA, memoryUsageB (floating-point): memory usage coefficients a and b (see paper),
  - syncTimeFw (dictionary): mapping from heads of outgoing edges to their parameters c^fw (see paper),
  - syncTimeBw (dictionary): mapping from tails of incoming edges to their parameters c^bw (see paper),
edges (array): for each edge:
- sourceId (integer): the ID of the tail of the edge (edge from sourceId to destId),
- destId (integer): the ID of the head of the edge,
- communicationCost (floating-point): cost of transfer over this edge (in bytes).
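The fields above imply a few consistency rules. Every edge endpoint must be a known node ID. The keys of syncTimeFw must be heads of the node's outgoing edges, and the keys of syncTimeBw must be tails of its incoming edges. The minimal sketch below mirrors the input format as C++ structs and checks these rules. It assumes the JSON has already been parsed and that string dictionary keys have been converted to integers. The names Input, Node, TMPC, Edge and validate are hypothetical and are not taken from algo.cpp.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory mirror of the JSON input, assumed already parsed.
// JSON object keys are strings; here they are assumed converted to ints.
struct TMPC {
    std::string id;
    double timePerSample = 0;          // p: forward + backward compute latency
    double parameterSize = 0;          // w: weight size (data-parallel resync)
    double memoryUsageA = 0;           // coefficient a
    double memoryUsageB = 0;           // coefficient b
    std::map<int, double> syncTimeFw;  // head of outgoing edge -> c^fw
    std::map<int, double> syncTimeBw;  // tail of incoming edge -> c^bw
};

struct Node {
    int id = 0;
    std::map<int, std::vector<TMPC>> TMPCs;  // tensor-parallelism degree t -> TMPCs
};

struct Edge {
    int sourceId = 0;              // tail of the edge
    int destId = 0;                // head of the edge
    double communicationCost = 0;  // bytes
};

struct Input {
    double maxMemoryPerDevice = 0;  // bytes
    int maxDevices = 0;             // k
    int maxBatchSize = 0;           // N
    double bandwidth = 0;
    std::vector<Node> nodes;
    std::vector<Edge> edges;
};

// Returns human-readable problems; an empty result means all checks passed.
std::vector<std::string> validate(const Input& in) {
    std::vector<std::string> errors;
    std::set<int> ids;
    for (const Node& n : in.nodes)
        if (!ids.insert(n.id).second)
            errors.push_back("duplicate node id " + std::to_string(n.id));

    // For each node: heads of its outgoing edges and tails of its incoming edges.
    std::map<int, std::set<int>> heads, tails;
    for (const Edge& e : in.edges) {
        if (!ids.count(e.sourceId) || !ids.count(e.destId)) {
            errors.push_back("edge " + std::to_string(e.sourceId) + "->" +
                             std::to_string(e.destId) + " uses an unknown node id");
            continue;
        }
        heads[e.sourceId].insert(e.destId);
        tails[e.destId].insert(e.sourceId);
    }

    for (const Node& n : in.nodes)
        for (const auto& entry : n.TMPCs)  // entry.first is the degree t
            for (const TMPC& c : entry.second) {
                for (const auto& kv : c.syncTimeFw)
                    if (!heads[n.id].count(kv.first))
                        errors.push_back("TMPC " + c.id + ": syncTimeFw key " +
                                         std::to_string(kv.first) + " is not an outgoing edge head");
                for (const auto& kv : c.syncTimeBw)
                    if (!tails[n.id].count(kv.first))
                        errors.push_back("TMPC " + c.id + ": syncTimeBw key " +
                                         std::to_string(kv.first) + " is not an incoming edge tail");
            }
    return errors;
}
```

Consider a two-node graph where a TMPC of node 0 lists node 1 in syncTimeFw. In that case, validate returns no errors when the edge 0->1 is present and exactly one error when the edge is missing.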

Other debug information, such as node names, may be present in the input files.
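The scalar fields bandwidth and maxDevices interact with the per-node and per-edge data. The sketch below shows one way this can work, under two loudly stated assumptions. First, communicationCost (in bytes) divided by bandwidth is taken to give a transfer time, which means bandwidth is assumed to be in bytes per unit time. Second, a pipeline stage with data-parallel degree d and tensor-parallelism degree t is assumed to occupy d·t of the k = maxDevices accelerators, with no device shared between stages. The names edgeTransferTime, StageConfig and fitsDeviceBudget are hypothetical and do not come from algo.cpp.

```cpp
#include <cassert>
#include <vector>

// Assumption: bandwidth is in bytes per unit time, so bytes / bandwidth is a time.
double edgeTransferTime(double communicationCost, double bandwidth) {
    assert(bandwidth > 0);
    return communicationCost / bandwidth;
}

// Hypothetical per-stage configuration.
struct StageConfig {
    int dataParallel;    // d: number of data-parallel replicas
    int tensorParallel;  // t: tensor-parallelism degree of each replica
};

// Assumption: each stage uses d * t devices and stages do not share devices,
// so a plan is feasible only if the total does not exceed maxDevices (k).
bool fitsDeviceBudget(const std::vector<StageConfig>& stages, int maxDevices) {
    long long used = 0;
    for (const StageConfig& s : stages)
        used += static_cast<long long>(s.dataParallel) * s.tensorParallel;
    return used <= maxDevices;
}
```

For example, two stages with (d, t) = (2, 2) and (1, 4) use 4 + 4 = 8 devices. They therefore fit when maxDevices is 8, whereas (2, 2) and (2, 4) would need 12 devices and would not fit.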

Piper algorithm

The solution is implemented in algo.cpp. It is a single C++ file (using one header-only library for JSON parsing) and can be compiled with a recent version of gcc, for example by running g++ -O3 algo.cpp -o algo.exe.

The compiled program runs the experiments from the paper; see main() at the end of algo.cpp. To run only a subset of the evaluations, comment out the corresponding lines in main(). The simplest usage is shown in single(). The main example input file is inputs/bert32a100.json.

Legal notices

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

msr-fiddle/piper
