aisys2023

Reading list

3/7

Required reading

  • MLSys. MLSys: The Frontier of Machine Learning Systems.
  • Hidden. Hidden Technical Debt in Machine Learning Systems.
  • FacebookAIInfra. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective. Kim Hazelwood, Sarah Bird, David Brooks, Soumith Chintala, Utku Diril, Dmytro Dzhulgakov, Mohamed Fawzy, Bill Jia, Yangqing Jia, Aditya Kalro, James Law, Kevin Lee, Jason Lu, Pieter Noordhuis, Misha Smelyanskiy, Liang Xiong, Xiaodong Wang. HPCA 2018.

3/9

Required reading

3/14

Required reading

  • TensorFlow. TensorFlow: A System for Large-Scale Machine Learning. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zheng. OSDI 2016.

Extra reading

  • TensorFlowDCF. Dynamic Control Flow in Large-Scale Machine Learning. Yuan Yu, Martin Abadi, Paul Barham, Eugene Brevdo, Mike Burrows, Andy Davis, Jeff Dean, Sanjay Ghemawat, Tim Harley, Peter Hawkins, Michael Isard, Manjunath Kudlur, Rajat Monga, Derek Murray, Xiaoqiang Zheng. EuroSys 2018.
  • RDAG. Improving the Expressiveness of Deep Learning Frameworks with Recursion. Eunji Jeong*, Joo Seong Jeong*, Soojeong Kim, Gyeong-In Yu, Byung-Gon Chun. EuroSys 2018.

3/16

Required reading

  • PyTorch. PyTorch: An Imperative Style, High-Performance Deep Learning Library. NeurIPS 2019.

3/21

Required reading

  • JANUS. JANUS: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs. Eunji Jeong, Sungwoo Cho, Gyeong-In Yu, Joo Seong Jeong, Dong-Jin Shin, Byung-Gon Chun. NSDI 2019.

Extra reading

  • AutoGraph. AutoGraph: Imperative-Style Coding with Graph-based Performance. Dan Moldovan, James M Decker, Fei Wang, Andrew A Johnson, Brian K Lee, Zachary Nado, D Sculley, Tiark Rompf, Alexander B Wiltschko. MLSys 2019.
  • TFEager. TensorFlow Eager: A Multi-Stage, Python Embedded DSL for Machine Learning. Akshay Agrawal, Akshay Naresh Modi, Alexandre Passos, Allen Lavoie, Ashish Agarwal, Asim Shankar, Igor Ganichev, Josh Levenberg, Mingsheng Hong, Rajat Monga, Shanqing Cai. MLSys 2019.
  • Terra. Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs. Taebum Kim, Eunji Jeong, Geon-Woo Kim, Yunmo Koo, Sehoon Kim, Gyeongin Yu, Byung-Gon Chun. NeurIPS 2021.

3/23

  • PyTorch internals

3/28

Required reading

  • Halide. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Fredo Durand, Saman Amarasinghe. PLDI 2013.

3/30

  • Project proposal

4/4

Required reading

  • TVM. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy. OSDI 2018.

Extra reading

  • TC. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen. arXiv 2018.
  • TACO. The Tensor Algebra Compiler. Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, Saman Amarasinghe. OOPSLA 2017.

4/6

Required reading

  • Ansor. Ansor: Generating High-Performance Tensor Programs for Deep Learning. Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica. OSDI 2020.

4/11

Required reading

  • TASO. TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions. Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, Alex Aiken. SOSP 2019.

Extra reading

  • PET. PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections. OSDI 2021.
  • AStitch. AStitch: Enabling A New Multi-Dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures. Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, Shuaiwen Leon Song, Wei Lin. ASPLOS 2022.

4/13

Required reading

  • SparTA. SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute. OSDI 2022.

Extra reading

  • Roller. Roller: Fast and Efficient Tensor Compilation for Deep Learning. OSDI 2022.

4/18

Required reading

  • Nimble. Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning. Woosuk Kwon*, Gyeong-In Yu*, Eunji Jeong, Byung-Gon Chun. NeurIPS 2020 (Spotlight).
  • Rammer. Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks. Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, Lidong Zhou. OSDI 2020.

4/20

Required reading

  • Clipper. Clipper: A Low-Latency Online Prediction Serving System. Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica. NSDI 2017.

Extra reading

  • PRETZEL. PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems. Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco Domenico Santambrogio, Markus Weimer, Matteo Interlandi. OSDI 2018.

5/2

  • Mid-project presentation

5/4

Required reading

  • Clockwork. Serving DNNs like Clockwork: Performance Predictability from the Bottom Up. Arpan Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace. OSDI 2020.

5/9

Required reading

  • Orca. Orca: A Distributed Serving System for Transformer-Based Generative Models. Gyeong-In Yu, Joo Seong Jeong, Geon-Woo Kim, Soojeong Kim, Byung-Gon Chun. OSDI 2022. (patented)

5/11

Required reading

  • Parallax. Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks. Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, Byung-Gon Chun. EuroSys 2019.

Extra reading

  • Horovod. Horovod: fast and easy distributed deep learning in TensorFlow. Alexander Sergeev, Mike Del Balso. arXiv 2018.
  • PS. Scaling Distributed Machine Learning with the Parameter Server. Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. OSDI 2014.

5/16

Required reading

  • Unity. Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization. OSDI 2022.

Extra reading

  • GSPMD. GSPMD: General and Scalable Parallelization for ML Computation Graphs. ICLR 2021.

5/18

Required reading

  • Alpa. Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. OSDI 2022.

Extra reading

  • PipeDream. PipeDream: Generalized Pipeline Parallelism for DNN Training. Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Philip B. Gibbons, Matei Zaharia. SOSP 2019.
  • Mesh-TensorFlow. Mesh-TensorFlow: Deep Learning for Supercomputers. NeurIPS 2018.

5/23

  • CXL (Guest lecture)

5/25

Required reading

  • MegatronLM. Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM. SC 2021.
  • ZeRO. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. SC 2020.
  • ZeroInfinity. ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning. SC 2021.
  • tf.data. tf.data: A Machine Learning Data Processing Framework. VLDB 2021.

5/30

  • Hardware Abstraction Design for High-Performance Deep Learning Inference Accelerators

6/1

Required reading

  • Gandiva. Gandiva: Introspective Cluster Scheduling for Deep Learning. OSDI 2018.
  • Tiresias. Tiresias: A GPU Cluster Manager for Distributed Deep Learning. NSDI 2019.

Extra reading

  • Themis. Themis: Fair and Efficient GPU Cluster Scheduling. NSDI 2020.
  • Gavel. Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads. OSDI 2020.
  • Antman. AntMan: Dynamic Scaling on GPU Clusters for Deep Learning. OSDI 2020.
  • HiveD. HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees. OSDI 2020.

6/5

  • ATOM: Versatile yet Energy-efficient Inference SoC

6/8

Required reading

  • Pollux. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning. OSDI 2021.

Extra reading

  • AFS. Elastic Resource Sharing for Distributed Deep Learning. NSDI 2021.

6/13

  • Final project presentation with a demo