ReArch Group Paper Reading List


Seminars

Spring 2021

| Date | Paper Title | Presenter | Notes |
| --- | --- | --- | --- |
| 03.01 | Training for Multi-resolution Inference Using Reusable Quantization Terms | Cong Guo | |
| 03.08 | Toward Efficient Interaction between Python and Native Libraries | Yuxian Qiu | |
| 03.15 | SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning | Yue Guan | |
| 03.22 | X-Stream: Edge-centric Graph Processing using Streaming Partitions | Zhihui Zhang | |
| 03.29 | Loop Nest Optimization, the Polyhedral Model, and the MICRO 2020 Best Paper (Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data) | Zihan Liu | Slides |
| 04.12 | Defensive Approximation: Securing CNNs using Approximate Computing | Yakai Wang | Related Work |
| 05.17 | Commutative Data Reordering: A New Technique to Reduce Data Movement Energy on Sparse Inference Workloads | Yangjie Zhou | ISCA 2020 |
| 05.31 | Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture | Zhihui Zhang | VLDB 2021 |
| 06.07 | DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification | Yue Guan | NeurIPS 2021 |

Summer 2021

| Date | Paper Title | Presenter | Notes |
| --- | --- | --- | --- |
| 07.14 | AKG: Automatic Kernel Generation for Neural Processing Units Using Polyhedral Transformations (PLDI 2021) | Yuxian Qiu | Slides |
| 07.21 | Floating-Point Format and Quantization for Deep Learning Computation | Cong Guo | |
| 07.28 | P-OPT: Practical Optimal Cache Replacement for Graph Analytics | Yangjie Zhou | Slides |
| 08.04 | Rubik: A Hierarchical Architecture for Efficient Graph Neural Network Training | Zhihui Zhang | |
| 08.11 | A Useful Tool, CKA: Similarity of Neural Network Representations Revisited, and Its Application: Uncovering How Neural Network Representations Vary with Width and Depth | Zhengyi Li | Slides |
| 08.18 | Ansor: Generating High-Performance Tensor Programs for Deep Learning | Zihan Liu | Slides |

Fall 2021

| Date | Paper Title | Presenter | Notes |
| --- | --- | --- | --- |
| 10.11 | Adaptive Numeric Type for DNN Quantization | Cong Guo | |
| 10.18 | Compiling Graph Applications for GPUs with GraphIt | Yangjie Zhou | Slides |
| 11.01 | TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation | Zihan Liu | Slides |
| 11.08 | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Zhengyi Li | Slides (code: zdea) |
| 11.22 | Dynamic Tensor Rematerialization; Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization | Yue Guan | Slides (one deck per paper) |
| 11.29 | GraphPulse: An Event-Driven Hardware Accelerator for Asynchronous Graph Processing | Zhihui Zhang | Presentation |
| 12.06 | CheckFreq: Frequent, Fine-Grained DNN Checkpointing | Guandong Lu | Slides |
| 12.13 | PipeDream: Generalized Pipeline Parallelism for DNN Training | Runzhe Chen | Slides |
| 12.20 | Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters | Yakai Wang | Slides |

Spring 2022

| Date | Paper Title | Presenter | Notes |
| --- | --- | --- | --- |
| 03.10 | Speculative Execution Attacks: Meltdown, Spectre, and Pinned Loads | Zihan Liu | Slides |
| 03.24 | SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute | Yue Guan | |
| 03.31 | ROLLER: Fast and Efficient Tensor Compilation for Deep Learning | Yijia Diao | Link |
| 04.07 | Adaptable Register File Organization for Vector Processors | Zhihui Zhang | |
| 04.14 | Cortex: A Compiler for Recursive Deep Learning Models | Yangjie Zhou | Slides |
| 04.21 | Zero-Knowledge Succinct Non-Interactive Argument of Knowledge | Shuwen Lu | Slides |
| 05.05 | Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | Runzhe Chen | Slides |

Fall 2022

| Date | Paper Title | Presenter | Notes |
| --- | --- | --- | --- |
| 09.20 | ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization | Cong Guo | Slides |
| 09.27 | X-Cache: A Modular Architecture for Domain-Specific Caches | Zihan Liu | Slides |
| 10.18 | Automatically Discovering ML Optimizations | Yangjie Zhou | Slides |
| 11.08 | Privacy-Preserving Machine Learning: Inference | Zhengyi Li | Slides |
| 11.15 | Dynamic Tensor Compilers | Yijia Diao | Slides |

Spring 2023

| Date | Paper Title | Presenter | Notes |
| --- | --- | --- | --- |
| 03.30 | JUNO: Algorithm-Hardware Mapping Co-design for Efficient Approximate Nearest Neighbour Search in High Dimensional Space | Zihan Liu | |
| 04.06 | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale; SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models; Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning; GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers; SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot; P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks; Offsite-Tuning: Transfer Learning without Full Model; LoRA: Low-Rank Adaptation of Large Language Models | Jiaming Tang | Slides |
| 04.13 | SMG: Towards Efficient Execution and Adequate Encryption of Private DNN Inference via Secure Micro-Graph | Zhengyi Li | Slides |
| 05.04 | FlexGen and FlashAttention | Yue Guan | Slides |
| 05.11 | Multi-Tenant DNN Inference: Spatial GPU Sharing | Yijia Diao | Slides |
| 05.25 | Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion | Yangjie Zhou | TVMConf Video |

Fall 2023

| Date | Paper Title | Presenter | Notes |
| --- | --- | --- | --- |
| 09.21 | GPU Warp Scheduling and Control Code | Weiming Hu | Slides |
| 09.28 | Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity | Yue Guan | Slides |
| 10.12 | Shared SIMD Unit: Occamy; Two Out-of-Order-Commit CPUs: NOREBA and Orinoco | Zihan Liu | Slides |
| 10.19 | Multitasking on GPU: Preemption | Yijia Diao | Slides |
| 10.26 | SecretFlow-SPU: A Performant and User-Friendly Framework for Privacy-Preserving Machine Learning | Zhengyi Li | Slides |
| 11.09 | Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM; ZeRO: Memory Optimizations Toward Training Trillion Parameter Models; ZeRO-Offload: Democratizing Billion-Scale Model Training; ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning | Jiale Xu | Slides |
| 11.16 | Atom: Low-bit Quantization for Efficient and Accurate LLM Serving | Haoyan Zhang | Slides |
| 12.07 | WaveScalar; Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads | Gonglin Xu | Slides |
| 12.14 | Fast Inference from Transformers via Speculative Decoding; SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification; LLMCad: Fast and Scalable On-device Large Language Model Inference | Changming Yu | Slides |
| 12.28 | A Framework for Fine-Grained Synchronization of Dependent GPU Kernels; Fast Fine-Grained Global Synchronization on GPUs; AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs | Ziyu Huang | Slides |

Spring 2024

| Date | Paper Title | Presenter | Notes |
| --- | --- | --- | --- |
| 03.21 | Transparent GPU Sharing in Container Clouds for Deep Learning Workloads | Yijia Diao | Link |
| 03.28 | DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving | Shuwen Lu | Slides |
| 05.09 | 8-bit Transformer Inference and Fine-tuning for Edge Accelerators | Weiming Hu | Slides |

DNN Architecture

Link


Deep Learning Compiler

List Contributed by Zihan Liu


Past Architecture Papers

List Contributed by Jingwen Leng


MoE Related Papers

List Contributed by Shuwen Lu

Reading Lists from Other Groups