A curated list of awesome compilers and of optimization techniques applicable to compilers, covering different architectures and emerging domains.
Note: Some projects are not about compiler design or implementation per se; we include them when their techniques lend themselves to automation and compiler construction.
- List of Conferences and Journals Considered
- Compiler Toolchain
- Compilers for AI chips
- Compilers for PIM
- Compilers for Brain-inspired Hardware
- Compilers for SIMT GPU
- Compilers for CPU
- Compilers for Mobile and Edge
- Compilers for RISC-V
- Compilers for Configurable Hardware
- Design Space Construction and Exploration
- Dynamic Shape and Control Flow
- Sparse Applications, Compilers, and Architectures
- Tree and Graph Applications, Compilers, and Architectures
- NAS Compilers and Architectures
- Security and Privacy
- Cost Model
- Survey and Books
- Talks, Tutorials, and Videos
- Conferences
  - ASPLOS, ISCA, MICRO, HPCA
  - OSDI, SOSP, PLDI, PPoPP, SC
  - DAC, ICLR, NeurIPS, ATC, OOPSLA
  - CGO, MLSys, SIGGRAPH, PACT, POPL, ICS
  - Euro-Par, MAPL
  - ICRC
- Journals
  - TPDS, TCAD, TC
  - TACO, TECS
- Preprint
  - arXiv
Open-source
- A Data-Centric Optimization Framework for Machine Learning ICS 2022. Github Page. Document Page. Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler. ETH Zurich.
- MLIR: A Compiler Infrastructure for the End of Moore's Law arXiv 2020, Github Page. Document Page. Chris Lattner, Jacques A. Pienaar, Mehdi Amini, Uday Bondhugula, River Riddle, Albert Cohen, Tatiana Shpeisman, Andy Davis, Nicolas Vasilache, Oleksandr Zinenko. Google.
- JAX: Compiling machine learning programs via high-level tracing MLSys 2018. Github Page. Document Page. Roy Frostig, Matthew James Johnson, and Chris Leary. Google.
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning OSDI 2018. Github Page. Document Page. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Q. Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy. University of Washington.
- XLA: Optimizing Compiler for Machine Learning. Google.
- Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning arXiv 2018. Github Page. Document Page. Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, William Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar Vijay, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, Tristan J. Webb. Intel.
- Glow: Graph Lowering Compiler Techniques for Neural Networks arXiv 2018. Github Page. Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Summer Deng, Roman Dzhabarov, James Hegeman, Roman Levenstein, Bert Maher, Nadathur Satish, Jakob Olesen, Jongsoo Park, Artem Rakhov, Misha Smelyanskiy. Facebook.
- DLVM: A modern compiler infrastructure for deep learning systems ICLR 2018. Github Page. Richard Wei, Lane Schwartz, Vikram S. Adve. University of Illinois at Urbana-Champaign.
- The Tensor Algebra Compiler OOPSLA 2017. Github Page. Document Page. Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, Saman P. Amarasinghe. Massachusetts Institute of Technology.
- Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines PLDI 2013. Github Page. Document Page. Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, Saman P. Amarasinghe. MIT CSAIL.
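For a taste of how these open-source toolchains are driven, here is a minimal sketch of JAX's trace-then-compile model using only the public `jax.jit` API (the function and shapes are illustrative, not taken from any paper above):

```python
import jax
import jax.numpy as jnp

@jax.jit  # trace the Python function once, then compile the trace with XLA
def predict(w, b, x):
    return jnp.tanh(x @ w + b)

x = jnp.ones((8, 4))
w = jnp.ones((4, 2))
b = jnp.zeros((2,))
print(predict(w, b, x).shape)  # (8, 2); subsequent calls reuse the compiled code
```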
Closed-source (binary available)
Auto-tensorization and Auto-vectorization
- TensorIR: An Abstraction for Automatic Tensorized Program Optimization. arXiv 2022. Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen. Shanghai Jiao Tong University.
- AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction ISCA 2022. code. Size Zheng, Renze Chen, Anjiang Wei, Yicheng Jin, Qin Han, Liqiang Lu, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang. Peking University.
- UNIT: Unifying Tensorized Instruction Compilation CGO 2021. code. Jian Weng, Animesh Jain, Jie Wang, Leyuan Wang, Yida Wang, Tony Nowatzki. University of California, Los Angeles.
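The common thread in these papers is rewriting an inner loop nest into calls to a fixed-shape hardware intrinsic. A minimal sketch of that rewrite in plain Python, where the hypothetical `mma_4x4x4` stands in for a real tensor instruction:

```python
import numpy as np

def mma_4x4x4(C, A, B):
    # hypothetical stand-in for a 4x4x4 matrix-multiply-accumulate intrinsic;
    # on real hardware this is a single instruction, not a loop
    C += A @ B

def matmul_tensorized(A, B, T=4):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    # the outer loops iterate over tiles; the innermost loop nest has been
    # "tensorized", i.e. replaced by calls to the intrinsic
    for i in range(0, M, T):
        for j in range(0, N, T):
            for k in range(0, K, T):
                mma_4x4x4(C[i:i+T, j:j+T], A[i:i+T, k:k+T], B[k:k+T, j:j+T])
    return C

A, B = np.ones((8, 8)), np.ones((8, 8))
assert np.allclose(matmul_tensorized(A, B), A @ B)
```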
Polyhedral Optimization
- AKG: automatic kernel generation for neural processing units using polyhedral transformations PLDI 2021. code. Jie Zhao, Bojie Li, Wang Nie, Zhen Geng, Renwei Zhang, Xiong Gao, Bin Cheng, Chen Wu, Yun Cheng, Zheng Li, Peng Di, Kun Zhang, Xuefeng Jin. State Key Laboratory of Mathematical Engineering and Advanced Computing, China.
- Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data MICRO 2020. code. Jie Zhao, Peng Di. State Key Laboratory of Mathematical Engineering and Advanced Computing, China.
- Hardware Abstractions for targeting EDDO Architectures with the Polyhedral Model PACT 2021. Angshuman Parashar, Prasanth Chatarasi, Po-An Tsai. NVIDIA.
Software & Hardware Co-Design
- Hardware-Software Co-Design for an Analog-Digital Accelerator for Machine Learning ICRC 2018. Joao Ambrosi, Aayush Ankit, Rodrigo Antunes, Sai Rahul Chalamalasetti, Soumitra Chatterjee, Izzat El Hajj, Guilherme Fachini, Paolo Faraboschi, Martin Foltin, Sitao Huang. Hewlett Packard Enterprise.
- SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks TECS 2021. code. Gokul Krishnan, Sumit K. Mandal, Manvitha Pannala, Chaitali Chakrabarti, Jae-Sun Seo, Umit Y. Ogras, Yu Cao. Arizona State University.
- FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture ASPLOS 2019. Yu Ji, Youyang Zhang, Xinfeng Xie, Shuangchen Li, Peiqi Wang, Xing Hu, Youhui Zhang, Yuan Xie. Tsinghua University.
End-to-End Compiler
- PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference ASPLOS 2019. code. A. Ankit, I. El Hajj, S. Chalamalasetti, G. Ndu, M. Foltin, R. S. Williams, P. Faraboschi, W.-M. Hwu, J. P. Strachan, K. Roy, D. Milojicic. Purdue University.
- OCC: An Automated End-to-End Machine Learning Optimizing Compiler for Computing-In-Memory TCAD 2021. code. Adam Siemieniuk, Lorenzo Chelini, Asif Ali Khan, Jeronimo Castrillon, Andi Drebes, Henk Corporaal, Tobias Grosser, Martin Kong.
- Polyhedral-Based Compilation Framework for In-Memory Neural Network Accelerators JETC 2021. code. Jianhui Han, Xiang Fei, Zhaolin Li, Youhui Zhang. Tsinghua University.
Code Offloading, Mapping and Scheduling
- Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities PACT 2016. Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Chita R. Das. Pennsylvania State University.
- Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems ISCA 2016. Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O'Connor, Nandita Vijaykumar, Onur Mutlu, Stephen W. Keckler. Carnegie Mellon University.
- To PIM or not for emerging general purpose processing in DDR memory systems ISCA 2022. Alexandar Devic, Siddhartha Balakrishna Rai, Anand Sivasubramaniam, Ameen Akel, Sean Eilert, Justin Eno. The Pennsylvania State University.
Synthesis
- SIMPLE MAGIC: Synthesis and In-memory Mapping of Logic Execution for Memristor-aided Logic ICCAD 2017. Rotem Ben Hur, Nimrod Wald, Nishil Talati, Shahar Kvatinsky. Technion - Israel Institute of Technology.
- SIMPLER MAGIC: Synthesis and Mapping of In-Memory Logic Executed in a Single Row to Improve Throughput TCAD 2020. Rotem Ben Hur, Ronny Ronen, Ameer Haj Ali, Debjyoti Bhattacharjee, Adi Eliahu, Natan Peled, Shahar Kvatinsky. Technion - Israel Institute of Technology.
- SSR: A Skeleton-based Synthesis Flow for Hybrid Processing-in-RRAM Modes ICCAD 2021. Feng Wang, Guangyu Sun, Guojie Luo. Peking University.
- SIMDRAM: a framework for bit-serial SIMD processing using DRAM ASPLOS 2021. Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, João Dinis Ferreira, Nika Mansouri-Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu. ETH Zürich.
- Network Transformation and Training
- Bridge the Gap between Neural Networks and Neuromorphic Hardware with a Neural Network Compiler ASPLOS 2018. Yu Ji, Youhui Zhang, Wenguang Chen, Yuan Xie. Tsinghua University.
- NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints MICRO 2016. Yu Ji, Youhui Zhang, Shuangchen Li, Ping Chi, Cihang Jiang, Peng Qu, Yuan Xie, Wenguang Chen. Tsinghua University.
- Network Mapping
- A Design Flow for Mapping Spiking Neural Networks to Many-Core Neuromorphic Hardware ICCAD 2021. Shihao Song, M. Lakshmi Varshika, Anup Das, Nagarajan Kandasamy. Drexel University.
- Mapping spiking neural networks onto a manycore neuromorphic architecture PLDI 2018. Chit-Kwan Lin, Andreas Wild, Gautham N. Chinya, Tsung-Han Lin, Mike Davies, Hong Wang. Intel.
Efficient Compute-intensive Kernel Generation
- Roller: Fast and Efficient Tensor Compilation for Deep Learning OSDI 2022. Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, Gennady Pekhimenko. University of Toronto and Microsoft Research.
- Automatic Kernel Generation for Volta Tensor Cores arXiv 2020. Somashekaracharya G. Bhaskaracharya, Julien Demouth, Vinod Grover. NVIDIA.
- Triton: an intermediate language and compiler for tiled neural network computations MAPL 2019. code. Philippe Tillet, Hsiang-Tsung Kung, David D. Cox. Harvard University.
- Diesel: DSL for linear algebra and neural net computations on GPUs MAPL 2018. Venmugil Elango, Norm Rubin, Mahesh Ravishankar, Hariharan Sandanagobalane, Vinod Grover. NVIDIA.
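To give a feel for the tile-level programming model these generators target, below is a small Triton kernel written against Triton's public `triton.jit`/`tl.*` API; a sketch only, assuming a recent Triton release and a CUDA-capable GPU:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)                  # one program instance per tile
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements                  # guard the ragged last tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_relu_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
```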
Efficient Compute-intensive Kernel Fusion
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS HPCA 2022. code. Han Zhao, Weihao Cui, Quan Chen, Youtao Zhang, Yanchao Lu, Chao Li, Jingwen Leng, Minyi Guo. Shanghai Jiao Tong University.
- Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance MLSys 2022. code. Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu. Rice University.
- Accelerating Deep Learning Inference with Cross-Layer Data Reuse on GPUs Euro-Par 2020. Xueying Wang, Guangli Li, Xiao Dong, Jiansong Li, Lei Liu, Xiaobing Feng. Institute of Computing Technology, Chinese Academy of Sciences.
Efficient Memory-intensive Kernel Fusion
- Automatic Horizontal Fusion for GPU Kernels CGO 2022. Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long. Carnegie Mellon University.
- AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures ASPLOS 2022. Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, Shuaiwen Leon Song, Wei Lin. Alibaba Group.
- FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads arXiv 2020. Zhen Zheng, Pengzhan Zhao, Guoping Long, Feiwen Zhu, Kai Zhu, Wenyi Zhao, Lansong Diao, Jun Yang, Wei Lin. Alibaba Group.
- From Loop Fusion to Kernel Fusion: A Domain-Specific Approach to Locality Optimization CGO 2019. Bo Qiao, Oliver Reiche, Frank Hannig, Jürgen Teich. Friedrich-Alexander University Erlangen-Nürnberg.
- VersaPipe: a versatile programming framework for pipelined computing on GPU MICRO 2017. code. Zhen Zheng, Chanyoung Oh, Jidong Zhai, Xipeng Shen, Youngmin Yi, Wenguang Chen. Tsinghua University.
- Scalable Kernel Fusion for Memory-Bound GPU Applications SC 2014. Mohamed Wahib, Naoya Maruyama. RIKEN Advanced Institute for Computational Science JST, CREST.
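A deliberately naive scalar model of what these fusion papers optimize: the unfused pipeline materializes every intermediate in memory, while the fused loop touches each element once. (Illustrative only; a real compiler emits the fused loop as a single device kernel rather than interpreted Python.)

```python
import numpy as np

def unfused(x):
    t1 = x * 2.0                 # full-size intermediate written to memory
    t2 = t1 + 1.0                # another full-size intermediate
    return np.maximum(t2, 0.0)   # three passes over the data in total

def fused(x):
    out = np.empty_like(x)
    for i in range(x.size):      # one pass; no intermediates hit memory
        out.flat[i] = max(x.flat[i] * 2.0 + 1.0, 0.0)
    return out

x = np.linspace(-1.0, 1.0, 1024)
assert np.allclose(unfused(x), fused(x))
```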
Polyhedral Optimization
- Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code CGO 2019. code. Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, Saman P. Amarasinghe. Massachusetts Institute of Technology.
- Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions arXiv 2018. code. Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen. Facebook AI Research.
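At its core, polyhedral compilation applies affine transformations such as tiling to loop nests while preserving semantics. A hand-written instance of the tiling these frameworks derive automatically (a transpose, chosen for brevity):

```python
import numpy as np

N, T = 64, 8
A = np.random.rand(N, N)
B = np.zeros_like(A)
C = np.zeros_like(A)

# original nest: poor locality, A is read column by column
for i in range(N):
    for j in range(N):
        B[i, j] = A[j, i]

# tiled nest: the same iteration domain traversed tile by tile
for it in range(0, N, T):
    for jt in range(0, N, T):
        for i in range(it, it + T):
            for j in range(jt, jt + T):
                C[i, j] = A[j, i]

assert np.array_equal(B, C)
```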
Program Synthesis
- Equality Saturation for Tensor Graph Superoptimization MLSys 2021. Yichen Yang, Phitchaya Mangpo Phothilimthana, Yisu Remy Wang, Max Willsey, Sudip Roy, Jacques Pienaar. MIT EECS & CSAIL.
- Swizzle Inventor: Data Movement Synthesis for GPU Kernels ASPLOS 2019. Phitchaya Mangpo Phothilimthana, Archibald Samuel Elliott, An Wang, Abhinav Jangda, Bastian Hagedorn, Henrik Barthels, Samuel J. Kaufman, Vinod Grover, Emina Torlak, Rastislav Bodík. University of California, Berkeley.
Compilers for Irregular Workloads
- FreeTensor: A Free-Form DSL with Holistic Optimizations for Irregular Tensor Programs PLDI 2022. code. Shizhi Tang, Jidong Zhai, Haojie Wang, Lin Jiang, Liyan Zheng, Zhenhao Yuan, Chen Zhang. Tsinghua University.
Compilers for HPC Workloads on GPU
- Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation TACO 2021. code. Tobias Gysi, Christoph Müller, Oleksandr Zinenko, Stephan Herhut, Eddie Davis, Tobias Wicky, Oliver Fuhrer, Torsten Hoefler, Tobias Grosser. ETH Zurich.
Distributed Optimization
- DISTAL: The Distributed Tensor Algebra Compiler PLDI 2022. Rohan Yadav, Alex Aiken, and Fredrik Kjolstad. Stanford University.
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning OSDI 2022. Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica. UC Berkeley.
- VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware MLSys 2022. Andrew Or, Haoyu Zhang, Michael None Freedman. Princeton University.
- OneFlow: Redesign the Distributed Deep Learning Framework from Scratch arXiv 2021. Jinhui Yuan, Xinqi Li, Cheng Cheng, Juncheng Liu, Ran Guo, Shenghang Cai, Chi Yao, Fei Yang, Xiaodong Yi, Chuan Wu, Haoran Zhang, Jie Zhao. OneFlow Research.
- Vectorization Optimization
- All you need is superword-level parallelism: systematic control-flow vectorization with SLP PLDI 2022. Yishen Chen, Charith Mendis, and Saman Amarasinghe. Massachusetts Institute of Technology, USA.
- VeGen: a vectorizer generator for SIMD and beyond ASPLOS 2021. Yishen Chen, Charith Mendis, Michael Carbin, and Saman Amarasinghe. Massachusetts Institute of Technology, USA.
- NeuroVectorizer: end-to-end vectorization with deep reinforcement learning CGO 2020. Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Yakun Sophia Shao, Krste Asanovic, and Ion Stoica. University of California at Berkeley, USA.
- Compiler Auto-Vectorization with Imitation Learning NeurIPS 2019. Charith Mendis, Cambridge Yang, Yewen Pu, Saman Amarasinghe, Michael Carbin. MIT CSAIL.
- Translating Traditional SIMD Instructions to Vector Length Agnostic Architectures CGO 2019. Sheng-Yu Fu, Wei-Chung Hsu. National Taiwan University.
- Extending LLVM for Lightweight SPMD Vectorization: Using SIMD and Vector Instructions Easily from Any Language CGO 2019. Robin Kruppe, Julian Oppermann, Lukas Sommer, Andreas Koch. Embedded Systems and Applications Group, TU Darmstadt, Germany.
- Super-Node SLP: Optimized Vectorization for Code Sequences Containing Operators and Their Inverse Elements CGO 2019. V. Porpodas, R. C. O. Rocha, E. Brevnov, L. F. W. Góes, T. Mattson. Intel Corporation, USA.
- Partial control-flow linearization PLDI 2018. Simon Moll, Sebastian Hack. Saarland University, Germany.
- Look-ahead SLP: auto-vectorization in the presence of commutative operations CGO 2018. Vasileios Porpodas, Rodrigo C. O. Rocha, and Luís F. W. Góes. Intel, USA.
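The SLP-style transformation behind several of these papers packs isomorphic scalar statements into a single vector operation; a toy before/after using NumPy as the "vector ISA":

```python
import numpy as np

a, b = np.arange(4.0), np.arange(4.0)

# before: four isomorphic scalar additions
c = np.empty(4)
for i in range(4):
    c[i] = a[i] + b[i]

# after SLP packing: one 4-wide vector addition
c_vec = a + b

assert np.array_equal(c, c_vec)
```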
- Parallelism and Locality Optimization
- Analytical characterization and design space exploration for optimization of CNNs ASPLOS 2021. Rui Li, Yufan Xu, Aravind Sukumaran-Rajam, Atanas Rountev, P. Sadayappan. University of Utah, USA.
- AutoTM: Automatic Tensor Movement in Heterogeneous Memory Systems using Integer Linear Programming ASPLOS 2020. Mark Hildebrand, Jawad Khan, Sanjeev Trika, Jason Lowe-Power, and Venkatesh Akella. University of California, Davis, USA.
- T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware ISCA 2020. Victor A. Ying, Mark C. Jeffrey, Daniel Sanchez. MIT CSAIL.
- Optimizing data-intensive computations in existing libraries with split annotations SOSP 2019. Shoumik Palkar, Matei Zaharia. Stanford University.
- Model-driven transformations for multi- and many-core CPUs PLDI 2019. Martin Kong, Louis-Noël Pouchet. Brookhaven National Laboratory, USA.
- Compilers for Sparse Workloads
- Efficient Execution of Graph Algorithms on CPU with SIMD Extensions CGO 2021. Ruohuang Zheng, Sreepathi Pai. Department of Computer Science, University of Rochester, USA.
- Generating piecewise-regular code from irregular structures PLDI 2019. Travis Augustine, Janarthanan Sarma, Louis-Noël Pouchet, and Gabriel Rodríguez. Colorado State University, USA.
- CVR: efficient vectorization of SpMV on x86 processors CGO 2018. Biwei Xie, Jianfeng Zhan, Xu Liu, Wanling Gao, Zhen Jia, Xiwen He, and Lixin Zhang. Institute of Computing Technology at Chinese Academy of Sciences, China.
- Compilers for Dense Workloads
- Optimizing N-dimensional, winograd-based convolution for manycore CPUs PPoPP 2018. Zhen Jia, Aleksandar Zlateski, Frédo Durand, and Kai Li. Princeton University.
- SIMD code generation for stencils on brick decompositions PPoPP 2018. Tuowen Zhao, Mary Hall, Protonu Basu, Samuel Williams, and Hans Johansen. University of Utah.
- Program generation for small-scale linear algebra applications CGO 2018. Daniele G. Spampinato, Diego Fabregat-Traver, Paolo Bientinesi, and Markus Püschel. ETH Zurich, Switzerland.
- Compilers for End-to-End Networks
- SPNC: An Open-Source MLIR-Based Compiler for Fast Sum-Product Network Inference on CPUs and GPUs CGO 2022. Lukas Sommer, Cristian Axenie, Andreas Koch. Embedded Systems and Applications Group, TU Darmstadt, Germany.
- Multi-target Compiler for the Deployment of Machine Learning Models CGO 2019. Oscar Castro-Lopez, Ines F. Vega-Lopez. Facultad de Informatica, Universidad Autonoma de Sinaloa, Culiacan, Mexico.
- Compilers for Intermittent Devices
- WARio: efficient code generation for intermittent computing PLDI 2022. Vito Kortbeek, Souradip Ghosh, Josiah Hester, Simone Campanoni, and Przemysław Pawełczak. Delft University of Technology, Netherlands.
- Time-sensitive Intermittent Computing Meets Legacy Software ASPLOS 2020. Vito Kortbeek, Kasim Sinan Yildirim, Abu Bakar, Jacob Sorber, Josiah Hester, and Przemysław Pawełczak. Delft University of Technology, Delft, Netherlands.
- Adaptive low-overhead scheduling for periodic and reactive intermittent execution PLDI 2020. Kiwan Maeng and Brandon Lucia. Carnegie Mellon University, USA.
- Intelligence Beyond the Edge: Inference on Intermittent Embedded Systems ASPLOS 2019. Graham Gobieski, Brandon Lucia, and Nathan Beckmann. Carnegie Mellon University, USA.
- Supporting peripherals in intermittent systems with just-in-time checkpoints PLDI 2019. Kiwan Maeng and Brandon Lucia. Carnegie Mellon University, USA.
- Compilers for Digital Signal Processors
- Vector instruction selection for digital signal processors using program synthesis ASPLOS 2022. Maaz Bin Safeer Ahmad, Alexander J. Root, Andrew Adams, Shoaib Kamil, and Alvin Cheung. Adobe, USA.
- Vectorization for digital signal processors via equality saturation ASPLOS 2021. Alexa VanHattum, Rachit Nigam, Vincent T. Lee, James Bornholt, and Adrian Sampson. Cornell University, USA.
- Optimization for On-device Learning
- POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging ICML 2022. Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, Joseph Gonzalez. University of California Berkeley.
- ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity ICLR 2022. Xinchi Qiu, Javier Fernandez-Marques, Pedro PB Gusmao, Yan Gao, Titouan Parcollet, Nicholas Donald Lane. Department of Computer Science and Technology, University of Cambridge.
- Distributed Distillation for On-Device Learning NeurIPS 2020. Ilai Bistritz, Ariana Mann, Nicholas Bambos. Stanford University.
- E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings NeurIPS 2019. Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang. Department of Electrical and Computer Engineering, Rice University.
- Model Compression for Mobile Devices
- CoCoPIE: enabling real-time AI on off-the-shelf mobile devices via compression-compilation co-design CACM 2021. Hui Guan, Shaoshan Liu, Xiaolong Ma, Wei Niu, Bin Ren, Xipeng Shen, Yanzhi Wang, Pu Zhao. University of Massachusetts at Amherst.
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning ASPLOS 2020. Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, and Bin Ren. College of William and Mary, Williamsburg, VA, USA.
- Compiling KB-sized machine learning models to tiny IoT devices PLDI 2019. Sridhar Gopinath, Nikhil Ghanathe, Vivek Seshadri, and Rahul Sharma. Microsoft Research, India.
- Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking ICLR 2019. Haichuan Yang, Yuhao Zhu, Ji Liu. Department of Computer Science, University of Rochester, Rochester, USA.
- Optimization for Mobile Device Inference
- Towards a Domain-Extensible Compiler: Optimizing an Image Processing Pipeline on Mobile CPUs CGO 2021. Thomas Koehler, Michel Steuwer. Philips Research, Hamburg, Germany.
- AutoScale: Energy Efficiency Optimization for Stochastic Edge Inference Using Reinforcement Learning MICRO 2020. Young Geun Kim, Carole-Jean Wu. Korea University, Seoul, South Korea.
- Neural Architecture Search for Mobile Devices
- MCUNet: Tiny Deep Learning on IoT Devices NeurIPS 2020. Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han. MIT.
- Constrained deep neural network architecture search for IoT devices accounting for hardware calibration NeurIPS 2019. Florian Scheidegger, Luca Benini, Costas Bekas, A. Cristiano I. Malossi. ETH Zürich, Switzerland.
- HLL Compilers
- DSL Compilers
- Others
- HW/SW approaches for RISC-V code size reduction CARRV 2020. Matteo Perotti et al.
- Automatic Code Generation for Rocket Chip RoCC Accelerators CARRV 2020. Pengcheng Xu, Yun Liang.
- Experiments and optimizations for TVM on RISC-V Architectures with P Extension VLSI-DAT 2020. Yi-Ru Chen.
- Enabling TVM on RISC-V Architectures with SIMD Instructions RISC-V Workshop 2019.
- Towards Deep Learning using TensorFlow Lite on RISC-V CARRV 2019. Marcia Sahaya Louis et al.
- Domain-Specific Language
- HeteroFlow: An Accelerator Programming Model with Decoupled Data Placement for Software-Defined FPGAs FPGA 2022. Shaojie Xiang, Yi-Hsiang Lai, Yuan Zhou, Hongzheng Chen, Niansong Zhang, Debjit Pal, Zhiru Zhang.
- HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing FPGA 2019. Yi-Hsiang Lai, Yuze Chi, Yuwei Hu, Jie Wang, Cody Hao Yu, Yuan Zhou, Jason Cong, Zhiru Zhang.
- T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations FCCM 2019. Nitish Kumar Srivastava, Hongbo Rong, Prithayan Barua, Guanyu Feng, Huanqi Cao, Zhiru Zhang, David H. Albonesi, Vivek Sarkar, Wenguang Chen, Paul Petersen, Geoff Lowney, Adam Herr, Christopher J. Hughes, Timothy G. Mattson, Pradeep Dubey.
- SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs ICCAD 2020. Yi-Hsiang Lai, Hongbo Rong, Size Zheng, Weihao Zhang, Xiuping Cui, Yunshan Jia, Jie Wang, Brendan Sullivan, Zhiru Zhang, Yun Liang, Youhui Zhang, Jason Cong, Nithin George, Jose Alvarez, Christopher J. Hughes, Pradeep Dubey.
- Darkroom: compiling high-level image processing code into hardware pipelines TOG 2014. James Hegarty, John Brunhaver, Zachary DeVito, Jonathan Ragan-Kelley, Noy Cohen, Steven Bell, Artem Vasilyev, Mark Horowitz, Pat Hanrahan.
- Spatial: a language and compiler for application accelerators PLDI 2018. David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis, Kunle Olukotun.
- A Unified Backend for Targeting FPGAs from DSLs ASAP 2018. Emanuele Del Sozzo, Riyadh Baghdadi, Saman Amarasinghe, Marco D. Santambrogio.
Graph Optimizations
- APOLLO: Automatic Partition-based Operator Fusion through Layer by Layer Optimization MLSys 2022. Jie Zhao, Xiong Gao, Ruijie Xia, Zhaochuang Zhang, Deshi Chen, Lei Chen, Renwei Zhang, Zhen Geng, Bin Cheng, Xuefeng Jin. State Key Laboratory of Mathematical Engineering and Advanced Computing.
- NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training TPDS 2022. Size Zheng, Renze Chen, Yicheng Jin, Anjiang Wei, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang. Peking University.
- DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion PLDI 2021. Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, Bin Ren. College of William & Mary.
- DeepCuts: A Deep Learning Optimization Framework for Versatile GPU Workloads PLDI 2021. Wookeun Jung, Thanh Tuan Dao, Jaejin Lee. Seoul National University.
- Pet: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections OSDI 2021. code. Haojie Wang, Jidong Zhai, Mingyu Gao, Zixuan Ma, Shizhi Tang, Liyan Zheng, Yuanzhi Li, Kaiyuan Rong, Yuanyong Chen, Zhihao Jia. Tsinghua University.
- Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks OSDI 2020. code. Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, Lidong Zhou. Peking University and Microsoft Research.
- TASO: optimizing deep learning computation with automatic generation of graph substitutions SOSP 2019. code. Zhihao Jia, Oded Padon, James J. Thomas, Todd Warszawski, Matei Zaharia, Alex Aiken. Stanford University.
- Relay: a new IR for machine learning frameworks MAPL 2018. code. Jared Roesch, Steven Lyubomirsky, Logan Weber, Josh Pollock, Marisa Kirisame, Tianqi Chen, Zachary Tatlock. University of Washington.
Auto-tuning and Auto-scheduling
- Glimpse: mathematical embedding of hardware specification for neural compilation DAC 2022. Byung Hoon Ahn, Sean Kinzer, Hadi Esmaeilzadeh. University of California, San Diego.
- Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU OOPSLA 2021. Luke Anderson, Andrew Adams, Karima Ma, Tzu-Mao Li, Tian Jin, Jonathan Ragan-Kelley. Massachusetts Institute of Technology.
- Ansor: Generating High-Performance Tensor Programs for Deep Learning OSDI 2020. code. Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica. UC Berkeley.
- FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System ASPLOS 2020. code. Size Zheng, Yun Liang, Shuo Wang, Renze Chen, Kaiwen Sheng. Peking University.
- ProTuner: Tuning Programs with Monte Carlo Tree Search arXiv 2020. Ameer Haj-Ali, Hasan Genc, Qijing Huang, William S. Moses, John Wawrzynek, Krste Asanovic, Ion Stoica. UC Berkeley.
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation ICLR 2020. code. Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh. University of California, San Diego.
- Learning to Optimize Halide with Tree Search and Random Programs SIGGRAPH 2019. Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, Jonathan Ragan-Kelley. Facebook AI Research.
- Learning to Optimize Tensor Programs NeurIPS 2018. code. Tianqi Chen, Lianmin Zheng, Eddie Q. Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy. University of Washington.
- Automatically Scheduling Halide Image Processing Pipelines SIGGRAPH 2016. Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, Kayvon Fatahalian. Carnegie Mellon University.
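Stripped of cost models and learned search policies, the inner loop of an auto-tuner reduces to: enumerate candidate schedules, measure each on real hardware, keep the best. A sketch where the schedule space is just a tile size (all numbers illustrative):

```python
import time
import numpy as np

def tiled_matmul(A, B, T):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    for i in range(0, M, T):
        for k in range(0, K, T):
            for j in range(0, N, T):
                C[i:i+T, j:j+T] += A[i:i+T, k:k+T] @ B[k:k+T, j:j+T]
    return C

A, B = np.random.rand(256, 256), np.random.rand(256, 256)
best = None
for T in (16, 32, 64, 128):            # candidate schedules
    t0 = time.perf_counter()
    tiled_matmul(A, B, T)              # measurement on the target machine
    cost = time.perf_counter() - t0
    if best is None or cost < best[1]:
        best = (T, cost)
print("best tile size:", best[0])
```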
Analytical Approaches
- Analytical characterization and design space exploration for optimization of CNNs ASPLOS 2021. Rui Li, Yufan Xu, Aravind Sukumaran-Rajam, Atanas Rountev, P. Sadayappan. University of Utah.
- Tuna: A Static Analysis Approach to Optimizing Deep Neural Networks arXiv 2021. Yao Wang, Xingyu Zhou, Yanming Wang, Rui Li, Yong Wu, Vin Sharma. Amazon Web Services.
- Analytical cache modeling and tilesize optimization for tensor contractions SC 2019. Rui Li, Aravind Sukumaran-Rajam, Richard Veras, Tze Meng Low, Fabrice Rastello, Atanas Rountev, P. Sadayappan. University of Utah.
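Analytical approaches replace measurement with a closed-form model. A toy version of the tile-size question: pick the largest square matmul tile whose working set fits in cache (the cache size and footprint formula are assumptions for illustration):

```python
CACHE_BYTES = 32 * 1024   # assumed L1 data cache
DTYPE_BYTES = 8           # float64

def footprint(T):
    # one T x T tile each of A, B, and C resident at once
    return 3 * T * T * DTYPE_BYTES

T = 1
while footprint(2 * T) <= CACHE_BYTES:
    T *= 2
print("largest power-of-two tile fitting in L1:", T)  # 32 under these assumptions
```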
- Dynamic Shape Operator
- DietCode: Automatic Optimization for Dynamic Tensor Programs MLSys 2022. code. Bojian Zheng, Ziheng Jiang, Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, Gennady Pekhimenko. AWS.
- The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding MLSys 2022. Pratik Fegade, Tianqi Chen, Phillip Gibbons, Todd Mowry. CMU.
- Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference MLSys 2021. Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang. AWS.
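A common strategy in this space is shape bucketing: compile one kernel per shape bucket ahead of time and dispatch on the observed shape at run time, padding up to the bucket boundary. A hypothetical sketch, not the exact mechanism of any paper above:

```python
BUCKETS = (64, 128, 256, 512)

def compile_kernel(max_len):
    # stand-in for an ahead-of-time compiled kernel specialized to max_len
    def kernel(xs):
        assert len(xs) <= max_len
        return [x * 2 for x in xs]
    return kernel

KERNELS = {b: compile_kernel(b) for b in BUCKETS}

def dispatch(xs):
    # round the dynamic length up to the nearest bucket
    bucket = next(b for b in BUCKETS if len(xs) <= b)
    return KERNELS[bucket](xs)

print(dispatch(list(range(100)))[:3])  # served by the 128-element kernel
```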
- Dynamic Computation Graph
- DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs ATC 2022. code. Weihao Cui, Han Zhao, Quan Chen, Hao Wei, Zirui Li, Deze Zeng, Chao Li, Minyi Guo. Shanghai Jiao Tong University.
- Cortex: A Compiler for Recursive Deep Learning Models MLSys 2021. Pratik Fegade, Tianqi Chen, Phillip Gibbons, Todd Mowry. CMU.
- Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference MLSys 2021. Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang. AWS.
- DISC: A Dynamic Shape Compiler for Machine Learning Workloads EuroMLSys 2021. code. K. Zhu, W.Y. Zhao, Z. Zheng, T.Y. Guo, P.Z. Zhao, J.J. Bai, J. Yang, X.Y. Liu, L.S. Diao, W. Lin. Alibaba.
- Cavs: An Efficient Runtime System for Dynamic Neural Networks ATC 2018. code. Shizhen Xu, Hao Zhang, Graham Neubig, Wei Dai, Jin Kyu Kim, Zhijie Deng, Qirong Ho, Guangwen Yang, Eric P. Xing. CMU and Tsinghua University.
- On-the-fly Operation Batching in Dynamic Computation Graphs NIPS 2017. code. Graham Neubig, Yoav Goldberg, Chris Dyer. CMU.
- Deep Learning with Dynamic Computation Graphs ICLR 2017. Moshe Looks, Marcello Herreshoff, DeLesley Hutchins, Peter Norvig. Google.
- DyNet: The Dynamic Neural Network Toolkit arXiv 2017. code. Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin. CMU.
- Compiler Design
- SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning arXiv 2022. Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze. University of Washington.
- Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model PLDI 2022. Peter Ahrens, Fredrik Kjolstad, and Saman Amarasinghe. MIT CSAIL.
- Unified Compilation for Lossless Compression and Sparse Computing CGO 2022. Daniel Donenfeld, Stephen Chou, and Saman Amarasinghe. MIT CSAIL.
- Dynamic Sparse Tensor Algebra Compilation arXiv 2021. Stephen Chou and Saman Amarasinghe. MIT CSAIL.
- Compilation of Sparse Array Programming Models OOPSLA 2021. Rawn Henry, Olivia Hsu, Rohan Yadav, Stephen Chou, Kunle Olukotun, Saman Amarasinghe, and Fredrik Kjolstad. MIT CSAIL.
- A sparse iteration space transformation framework for sparse tensor algebra OOPSLA 2020. Ryan Senanayake, Changwan Hong, Ziheng Wang, Amalee Wilson, Stephen Chou, Shoaib Kamil, Saman P. Amarasinghe, Fredrik Kjolstad. Reservoir Labs.
- Automatic Generation of Efficient Sparse Tensor Format Conversion Routines PLDI 2020. Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe. MIT CSAIL.
- Tensor Algebra Compilation with Workspaces CGO 2019. Fredrik Kjolstad, Peter Ahrens, Shoaib Kamil, Saman P. Amarasinghe. MIT.
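Sparse compilers such as taco emit loops that iterate only over stored nonzeros. For y(i) = A(i,j) * x(j) with A in CSR, the generated code looks roughly like this sketch (pos/crd follow TACO's naming for the level arrays):

```python
import numpy as np

def spmv_csr(pos, crd, vals, x, n_rows):
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for p in range(pos[i], pos[i + 1]):  # visit only stored entries of row i
            y[i] += vals[p] * x[crd[p]]
    return y

# CSR encoding of the 3x3 matrix [[1, 0, 2], [0, 0, 0], [0, 3, 0]]
pos, crd, vals = [0, 2, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
x = np.array([1.0, 1.0, 1.0])
print(spmv_csr(pos, crd, vals, x, 3))  # [3. 0. 3.]
```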
- Compiler Design
- Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference MICRO 2022. Ashwin Prasad, Sampath Rajendra, Kaushik Rajan, R. Govindarajan, Uday Bondhugula. Indian Institute of Science, Bangalore.
- GraphIt to CUDA Compiler in 2021 LOC: A Case for High-Performance DSL Implementation via Staging with BuilDSL CGO 2022. Ajay Brahmakshatriya, Saman P. Amarasinghe. CSAIL, MIT.
- Taming the Zoo: The Unified GraphIt Compiler Framework for Novel Architectures ISCA 2021. Ajay Brahmakshatriya, Emily Furst, Victor A. Ying, Claire Hsu, Changwan Hong, Max Ruttenberg, Yunming Zhang, Dai Cheol Jung, Dustin Richmond, Michael B. Taylor, Julian Shun, Mark Oskin, Daniel Sánchez, Saman P. Amarasinghe. MIT CSAIL.
- A Tensor Compiler for Unified Machine Learning Prediction Serving OSDI 2020. code. Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi. UC San Diego.
- Optimizing ordered graph algorithms with GraphIt CGO 2020. Yunming Zhang, Ajay Brahmakshatriya, Xinyi Chen, Laxman Dhulipala, Shoaib Kamil, Saman P. Amarasinghe, Julian Shun. MIT CSAIL.
- GraphIt: A High-Performance Graph DSL OOPSLA 2018. Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, Saman P. Amarasinghe. MIT CSAIL.
Compiler Design
- Neural Architecture Search as Program Transformation Exploration ASPLOS 2021. Jack Turner, Elliot J. Crowley, Michael F. P. O'Boyle. University of Edinburgh, United Kingdom.
Architecture Design
- NASA: Accelerating Neural Network Design with a NAS Processor ISCA 2021. Xiaohan Ma, Chang Si, Ying Wang, Cheng Liu, Lei Zhang. University of Chinese Academy of Sciences.
- Compiler Design
- PlaidML-HE: Acceleration of Deep Learning Kernels to Compute on Encrypted Data ICCD 2019. Huili Chen, Rosario Cammarota, Felipe Valencia, Francesco Regazzoni. Intel AI Privacy and Security Research.
Model Design
- A Learned Performance Model for Tensor Processing Units MLSys 2021. Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows. Google.
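The recipe behind learned performance models, reduced to a sketch: featurize candidate programs, fit a regressor on measured runtimes, then rank new candidates without running them. Features, data, and model below are synthetic stand-ins (assumes scikit-learn is installed):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
tiles = rng.choice([8, 16, 32, 64, 128], size=(200, 2))  # (Ti, Tj) configurations
feats = np.c_[tiles, tiles[:, 0] * tiles[:, 1]]          # simple hand-picked features
# synthetic "measured" runtimes standing in for hardware measurements
runtime = 1.0 / (tiles[:, 0] * tiles[:, 1]) + 0.01 * rng.random(200)

model = GradientBoostingRegressor().fit(feats, runtime)
candidates = np.array([[16, 16, 256], [64, 64, 4096]])
print(model.predict(candidates))  # rank schedules without measuring them
```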
Dataset
- TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers NeurIPS Datasets and Benchmarks 2021. code. Lianmin Zheng, Ruochen Liu, Junru Shao, Tianqi Chen, Joseph Gonzalez, Ion Stoica, Ameer Haj-Ali. UC Berkeley.