Shangyint/awesome-tensor-compilers

Awesome Tensor Compilers

A list of awesome compiler projects and papers for tensor computation and deep learning.

Contents

Open Source Projects
Papers
Tutorials
Contribute

Open Source Projects

Papers

Survey

The Deep Learning Compiler: A Comprehensive Survey by Mingzhen Li et al., TPDS 2020
An In-depth Comparison of Compilers for Deep Neural Networks on Hardware by Yu Xing et al., ICESS 2019

Compiler and IR Design

TensorIR: An Abstraction for Automatic Tensorized Program Optimization by Siyuan Feng, Bohan Hou et al., arXiv 2022
Exocompilation for Productive Programming of Hardware Accelerators by Yuka Ikarashi, Gilbert Louis Bernstein et al., PLDI 2022
DaCeML: A Data-Centric Compiler for Machine Learning by Oliver Rausch et al., ICS 2022
FreeTensor: A Free-Form DSL with Holistic Optimizations for Irregular Tensor Programs by Shizhi Tang et al., PLDI 2022
Roller: Fast and Efficient Tensor Compilation for Deep Learning by Hongyu Zhu et al., OSDI 2022
AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures by Zhen Zheng et al., ASPLOS 2022
Composable and Modular Code Generation in MLIR: A Structured and Retargetable Approach to Tensor Compiler Construction by Nicolas Vasilache et al., arXiv 2022
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections by Haojie Wang et al., OSDI 2021
MLIR: Scaling Compiler Infrastructure for Domain Specific Computation by Chris Lattner et al., CGO 2021
A Tensor Compiler for Unified Machine Learning Prediction Serving by Supun Nakandala et al., OSDI 2020
Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks by Lingxiao Ma et al., OSDI 2020
Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures by Tal Ben-Nun et al., SC 2019
TASO: The Tensor Algebra SuperOptimizer for Deep Learning by Zhihao Jia et al., SOSP 2019
Tiramisu: A polyhedral compiler for expressing fast and portable code by Riyadh Baghdadi et al., CGO 2019
Triton: an intermediate language and compiler for tiled neural network computations by Philippe Tillet et al., MAPL 2019
Relay: A High-Level Compiler for Deep Learning by Jared Roesch et al., arXiv 2019
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning by Tianqi Chen et al., OSDI 2018
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions by Nicolas Vasilache et al., arXiv 2018
Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning by Scott Cyphers et al., arXiv 2018
Glow: Graph Lowering Compiler Techniques for Neural Networks by Nadav Rotem et al., arXiv 2018
DLVM: A modern compiler infrastructure for deep learning systems by Richard Wei et al., arXiv 2018
Diesel: DSL for linear algebra and neural net computations on GPUs by Venmugil Elango et al., MAPL 2018
The Tensor Algebra Compiler by Fredrik Kjolstad et al., OOPSLA 2017
Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines by Jonathan Ragan-Kelley et al., PLDI 2013

Auto-tuning and Auto-scheduling

One-shot tuner for deep learning compilers by Jaehun Ryu et al., CC 2022
Autoscheduling for sparse tensor algebra with an asymptotic cost model by Peter Ahrens et al., PLDI 2022
Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance by Jiarong Xing et al., MLSys 2022
A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators by Dan Zhang et al., ASPLOS 2022
Lorien: Efficient Deep Learning Workloads Delivery by Cody Hao Yu et al., SoCC 2021
Value Learning for Throughput Optimization of Deep Neural Networks by Benoit Steiner et al., MLSys 2021
A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers by Phitchaya Mangpo Phothilimthana et al., PACT 2021
Ansor: Generating High-Performance Tensor Programs for Deep Learning by Lianmin Zheng et al., OSDI 2020
Schedule Synthesis for Halide Pipelines on GPUs by Savvas Sioutas et al., TACO 2020
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System by Size Zheng et al., ASPLOS 2020
ProTuner: Tuning Programs with Monte Carlo Tree Search by Ameer Haj-Ali et al., arXiv 2020
AdaTune: Adaptive tensor program compilation made efficient by Menghao Li et al., NeurIPS 2020
Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data by Jie Zhao et al., MICRO 2020
Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation by Byung Hoon Ahn et al., ICLR 2020
A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra by Ryan Senanayake et al., OOPSLA 2020
Learning to Optimize Halide with Tree Search and Random Programs by Andrew Adams et al., SIGGRAPH 2019
Learning to Optimize Tensor Programs by Tianqi Chen et al., NeurIPS 2018
Automatically Scheduling Halide Image Processing Pipelines by Ravi Teja Mullapudi et al., SIGGRAPH 2016

Cost Model

An Asymptotic Cost Model for Autoscheduling Sparse Tensor Programs by Peter Ahrens et al., PLDI 2022
TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers by Lianmin Zheng et al., NeurIPS 2021
A Deep Learning Based Cost Model for Automatic Code Optimization by Riyadh Baghdadi et al., MLSys 2021
A Learned Performance Model for the Tensor Processing Unit by Samuel J. Kaufman et al., MLSys 2021
DYNATUNE: Dynamic Tensor Program Optimization in Deep Neural Network Compilation by Minjia Zhang et al., ICLR 2021
MetaTune: Meta-Learning Based Cost Model for Fast and Efficient Auto-tuning Frameworks by Jaehun Ryu et al., arXiv 2021

CPU and GPU Optimization

DeepCuts: A deep learning optimization framework for versatile GPU workloads by Wookeun Jung et al., PLDI 2021
Analytical characterization and design space exploration for optimization of CNNs by Rui Li et al., ASPLOS 2021
UNIT: Unifying Tensorized Instruction Compilation by Jian Weng et al., CGO 2021
PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives by Sanket Tavarageri et al., arXiv 2020
Fireiron: A Data-Movement-Aware Scheduling Language for GPUs by Bastian Hagedorn et al., PACT 2020
Automatic Kernel Generation for Volta Tensor Cores by Somashekaracharya G. Bhaskaracharya et al., arXiv 2020
Swizzle Inventor: Data Movement Synthesis for GPU Kernels by Phitchaya Mangpo Phothilimthana et al., ASPLOS 2019
Optimizing CNN Model Inference on CPUs by Yizhi Liu et al., ATC 2019
Analytical cache modeling and tilesize optimization for tensor contractions by Rui Li et al., SC 2019

NPU Optimization

AMOS: Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction by Size Zheng et al., ISCA 2022
Towards the Co-design of Neural Networks and Accelerators by Yanqi Zhou et al., MLSys 2022
AKG: Automatic Kernel Generation for Neural Processing Units using Polyhedral Transformations by Jie Zhao et al., PLDI 2021

Graph-level Optimization

Apollo: Automatic Partition-based Operator Fusion through Layer by Layer Optimization by Jie Zhao et al., MLSys 2022
Equality Saturation for Tensor Graph Superoptimization by Yichen Yang et al., MLSys 2021
IOS: An Inter-Operator Scheduler for CNN Acceleration by Yaoyao Ding et al., MLSys 2021
Optimizing DNN Computation Graph using Graph Substitutions by Jingzhi Fang et al., VLDB 2020
Transferable Graph Optimizers for ML Compilers by Yanqi Zhou et al., NeurIPS 2020
FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads by Zhen Zheng et al., arXiv 2020
Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning by Woosuk Kwon et al., NeurIPS 2020

Dynamic Model

DietCode: Automatic Optimization for Dynamic Tensor Programs by Bojian Zheng et al., MLSys 2022
The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding by Pratik Fegade et al., MLSys 2022
Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference by Haichen Shen et al., MLSys 2021
DISC: A Dynamic Shape Compiler for Machine Learning Workloads by Kai Zhu et al., EuroMLSys 2021
Cortex: A Compiler for Recursive Deep Learning Models by Pratik Fegade et al., MLSys 2021

Graph Neural Networks

Graphiler: Optimizing Graph Neural Networks with Message Passing Data Flow Graph by Zhiqiang Xie et al., MLSys 2022
Seastar: vertex-centric programming for graph neural networks by Yidi Wu et al., EuroSys 2021
FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems by Yuwei Hu et al., SC 2020

Distributed Computing

SpDISTAL: Compiling Distributed Sparse Tensor Computations by Rohan Yadav et al., SC 2022
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning by Lianmin Zheng, Zhuohan Li, Hao Zhang et al., OSDI 2022
Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization by Colin Unger, Zhihao Jia, et al., OSDI 2022
Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning by Ningning Xie, Tamara Norman, Dominik Grewe, Dimitrios Vytiniotis et al., MLSys 2022
DISTAL: The Distributed Tensor Algebra Compiler by Rohan Yadav et al., PLDI 2022
GSPMD: General and Scalable Parallelization for ML Computation Graphs by Yuanzhong Xu et al., arXiv 2021
Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads by Abhinav Jangda et al., ASPLOS 2022
OneFlow: Redesign the Distributed Deep Learning Framework from Scratch by Jinhui Yuan et al., arXiv 2021
Beyond Data and Model Parallelism for Deep Neural Networks by Zhihao Jia et al., MLSys 2019
Supporting Very Large Models using Automatic Dataflow Graph Partitioning by Minjie Wang et al., EuroSys 2019
Distributed Halide by Tyler Denniston et al., PPoPP 2016

Quantization and Sparsification

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning by Zihao Ye et al., arXiv 2022
SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring by Adhitha Dias et al., ICS 2022
SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute by Ningxin Zheng et al., OSDI 2022
Compiler Support for Sparse Tensor Computations in MLIR by Aart J.C. Bik et al., arXiv 2022
Automatic Generation of High-Performance Quantized Machine Learning Kernels by Meghan Cowan et al., CGO 2020

Program Rewriting

Verified tensor-program optimization via high-level scheduling rewrites by Amanda Liu et al., POPL 2022
Pure Tensor Program Rewriting via Access Patterns (Representation Pearl) by Gus Smith et al., MAPL 2021
Equality Saturation for Tensor Graph Superoptimization by Yichen Yang et al., MLSys 2021

Verification and Testing

Coverage-guided tensor compiler fuzzing with joint IR-pass mutation by Jiawei Liu et al., OOPSLA 2022
End-to-End Translation Validation for the Halide Language by Basile Clément et al., OOPSLA 2022
A comprehensive study of deep learning compiler bugs by Qingchao Shen et al., ESEC/FSE 2021
Verifying and Improving Halide’s Term Rewriting System with Program Synthesis by Julie L. Newcomb et al., OOPSLA 2020

Tutorials

Contribute

We encourage all contributions to this repository. Open an issue or send a pull request.

Notes on the Link Format

We prefer links that point to an informative landing page rather than directly to a PDF. For example, for arXiv papers, we prefer https://arxiv.org/abs/1802.04799 over https://arxiv.org/pdf/1802.04799.pdf. For OSDI papers, we prefer https://www.usenix.org/conference/osdi18/presentation/chen over https://www.usenix.org/system/files/osdi18-chen.pdf.
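For contributors who want to check links before opening a pull request, the arXiv rule can be applied mechanically. The following is a minimal sketch, not a tool shipped with this repository: the helper name `arxiv_abs_link` is hypothetical, and it assumes new-style arXiv identifiers (such as `1802.04799`, optionally with a version suffix like `v2`). Old-style identifiers and USENIX links are returned unchanged, because this sketch does not attempt to map them.

```python
import re

# Hypothetical helper (not part of this repository): matches new-style arXiv
# PDF links such as https://arxiv.org/pdf/1802.04799.pdf, optionally with a
# version suffix (e.g. v2) and with or without the ".pdf" extension.
_ARXIV_PDF = re.compile(
    r"^https?://arxiv\.org/pdf/(?P<id>\d{4}\.\d{4,5}(?:v\d+)?)(?:\.pdf)?$"
)


def arxiv_abs_link(url: str) -> str:
    """Return the abstract-page form of a new-style arXiv PDF link.

    Any URL that does not match the pattern (old-style arXiv IDs, USENIX
    links, links that already point to /abs/) is returned unchanged.
    """
    match = _ARXIV_PDF.match(url.strip())
    if match is None:
        return url
    return f"https://arxiv.org/abs/{match.group('id')}"


if __name__ == "__main__":
    # prints https://arxiv.org/abs/1802.04799
    print(arxiv_abs_link("https://arxiv.org/pdf/1802.04799.pdf"))
```

The version suffix is kept rather than stripped, since a versioned abstract page such as `https://arxiv.org/abs/1802.04799v2` is still a landing page and dropping the version would silently change which revision the list points to.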