This repository contains the artifact for the paper "MagPy: Effective Operator Graph Instantiation for Deep Learning by Execution State Monitoring" to be appeared in USENIX ATC'24. MagPy is a JIT compiler for PyTorch programs. It can extract the operator graph from PyTorch programs and optimize the graph with a wide range of deep learning graph compilers.
- Artifacts Available:
- All MagPy related code are available under https://github.com/heheda12345/MagPy, and all artifact scripts are available under https://github.com/heheda12345/MagPy-AE
- Artifacts Functional:
- Documentation: the following of documents include detailed guidelines on how to build, install, test MagPy and the experiments to compare with other baselines.
- Completeness: the source code of MagPy is located at
MagPy/frontend
, withcache.py
as the cache,object_table.py
as the RefGraph,variables
folder as different type of ShadowNodes and ShadowVersion. - Exercisability: in this repository, we prepare all the script and data to reproduce the experiements in individual folders named by the figure or table id in paper.
- Results Reproduced:
- To reproduce the main results presented in our paper, we provide detailed guideline to install the softwares with the same configurations as we used in paper evauation, and scripts to help reproduce the results step by step.
-
Download code
git clone https://github.com/heheda12345/MagPy.git git clone https://github.com/heheda12345/MagPy-AE.git export FRONTEND_DIR=$PWD/MagPy export AE_DIR=$PWD/MagPy-AE
-
Prerequisite
- Python 3.9 (Other Python version is not supported yet)
- CUDA 11.8 (Fully tested, other versions may work)
-
Install dependencies The README.md in frontend repo only provides the guide to run MagPy. More dependencies are needed to reproduce the results in this paper. Please use the following steps to step up the environment.
cd $AE_DIR && pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html && cd .. cd $FRONTEND_DIR && pip install -e . && cd .. sed -i 's/py_all(a\.shape\[i\] for i in dims)/py_all(a.shape[i] > 0 for i in dims)/g' `python3 -c 'import torch; print(torch.__path__[0])'`/_refs/__init__.py sed -i "226,230s/^/# /" `python3 -c 'import torch_xla; print(torch_xla.__path__[0])'`/core/dynamo_bridge.py
The last two commands are for fixing the following issues:
- change the
py_all(a.shape[i] > 0 for i in dims)
topy_all(a.shape[i] > 0 for i in dims)
intorch/_refs/__init__.py
to avoid the guard on the reduction length - comment out the following lines of torch_xla to avoid crash
# fallback_ops = get_fallback_ops() # if len(fallback_ops) > 0: # raise RuntimeError( # f"Fail to extact the compiled graph because of fallback: {','.join(fallback_ops)}" # )
- change the
- Go to the
get_started_tutorial/
folder and follow README_GET_STARTED.md.
TABLE 3 and Figure 18 needs additional environment setup. Please follow the instructions in Figure18/README.md and Table3/README.md to setup the environment. Then, you can cd
to each folder and use ``./run.sh'' to reproduce the results.
Experiments | Figure # in Paper | Script Location |
---|---|---|
#1. Power of graph instantiation | Figure 1 | N/A (use the results in Figure 15) |
#2. End-to-end inference on NVIDIA A100 | Figure 15 | run.sh |
#3. Time breakdown of execution (BS=1) | Figure 16 | run.sh |
#4. Inference of models with dynamic shape | Figure 17 | run.sh |
#5. Inference of models with dynamic control flow | Figure 18 | run.sh |
#6. Number of exported operator graphs | Table 2 | run.sh |
#7. Result on ParityBench | Table 3 | README.md |
You can use the scripts in plot
folder to reproduce the figures. The figures will be saved to plot/figures/
folder.
cd $AE_DIR/plot
./plot.sh