AI Chip Paper List

Table of Contents

About This Project

This project aims to help engineers, researchers, and students easily find and learn from good ideas and designs in AI-related fields, such as AI/ML/DL accelerators, chips, and systems, proposed at top-tier architecture conferences (ISCA, MICRO, ASPLOS, HPCA).

This project was initiated by the Advanced Computer Architecture Lab (ACA Lab) at Shanghai Jiao Tong University in collaboration with Biren Research. Articles from additional sources are being added. Please let us know if you have any comments or are willing to contribute.

The Listing of Tags

For guidance and searching purposes, tags and/or notes are assigned to all of these papers. We use the following tags to annotate them.

Tags

The Chronological Listing of Papers

We list all AI-related articles collected. Links to the paper, slides, and notes are provided under the title of each article where available. Updating is in progress.

ISCA

2020

Tags - Title Authors Affiliations
Inference; SIMD High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs
paper
Glenn Henry; Parviz Palangpour Centaur Technology
Inference; dataflow Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads
paper note
Dennis Abts; Jonathan Ross Groq Inc.
Spiking; dataflow; Sparsity SpinalFlow: An Architecture and Dataflow Tailored for Spiking Neural Networks
paper note
Surya Narayanan; Karl Taht University of Utah
Inference; benchmarking MLPerf Inference Benchmark
paper note
Vijay Janapa Reddi; Lingjie Xu; et.al.
GPU; Compression Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs
paper note
Esha Choukse; Michael Sullivan University of Texas at Austin; NVIDIA
Inference; runtime A Multi-Neural Network Acceleration Architecture
paper note
Eunjin Baek; Dongup Kwon; Jangwoo Kim Seoul National University
Inference; Dynamic fixed-point DRQ: Dynamic Region-Based Quantization for Deep Neural Network Acceleration
paper note
Zhuoran Song; Naifeng Jing; Xiaoyao Liang Shanghai Jiao Tong University
Training; LSTM; GPU Echo: Compiler-Based GPU Memory Footprint Reduction for LSTM RNN Training
paper note
Bojian Zheng; Nandita Vijaykumar University of Toronto
Inference DeepRecSys: A System for Optimizing End-to-End At-Scale Neural Recommendation
paper note
Udit Gupta; Samuel Hsia; Vikram Saraph Harvard University; Facebook Inc

2019

Tags - Title Authors Affiliations
Inference, Dataflow 3D-based Video Recognition Acceleration by Leveraging Temporal Locality
paper note
Huixiang Chen; Tao Li University of Florida
Inference; Quantum A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology
paper note
Ruizhe Cai; Ao Ren; Nobuyuki Yoshikawa; Yanzhi Wang Northeastern University
Training; Reinforcement Learning; Distributed training Accelerating Distributed Reinforcement Learning with In-Switch Computing
paper note
Youjie Li; Jian Huang UIUC
Training; Sparsity Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks
paper note
Jiaqi Zhang; Tao Li University of Florida
Inference; Sparsity; Bit-serial Laconic Deep Learning Inference Acceleration
paper note
Sayeh Sharify; Andreas Moshovos University of Toronto
Inference; Memory; bandwidth-saving; large-scale networks; compression MnnFast: A Fast and Scalable System Architecture for Memory-Augmented Neural Networks
paper note
Hanhwi Jang; Jangwoo Kim POSTECH; Seoul National University
Inference; ReRAM; Sparsity Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks
paper note
Tzu-Hsien Yang National Taiwan University; Academia Sinica; Macronix International
Inference; Redundant computing TIE: Energy-efficient Tensor Train-based Inference Engine for Deep Neural Network
paper note
Chunhua Deng; Bo Yuan Rutgers University
Training; CNN; floating point FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision
paper note
Mohsen Imani; Tajana Rosing UC San Diego
Training; Programming model Cambricon-F: Machine Learning Computers with Fractal von Neumann Architecture
paper note
Yongwei Zhao; Yunji Chen ICT; Cambricon

2018

Tags - Title Authors Affiliations
Training;CNN; RNN A Configurable Cloud-Scale DNN Processor for Real-Time AI
paper note
Jeremy Fowers; Doug Burger Microsoft
Inference; ReRAM PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms
paper note
Prakalp Srivastava; Mingu Kang University of Illinois at Urbana-Champaign; IBM
Inference; Dataflow Computation Reuse in DNNs by Exploiting Input Similarity
paper slides
Marc Riera; Antonio González Universitat Politècnica de Catalunya
Spiking Flexon: A Flexible Digital Neuron for Efficient Spiking Neural Network Simulations
paper slides
Dayeol Lee; Jangwoo Kim Seoul National University; University of California
Space-time computing Space-Time Algebra: A Model for Neocortical Computation
paper slides note
James E. Smith University of Wisconsin-Madison
Inference; Cross-module optimization RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM
paper note
Fengbin Tu; Shaojun Wei Tsinghua University
Inference;Datapath: bit-serial Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
paper note
Charles Eckert; Reetuparna Das University of Michigan; Intel Corporation
Inference;Cross-module optimization EVA2: Exploiting Temporal Redundancy in Live Computer Vision
paper note slides
Mark Buckler; Adrian Sampson Cornell University
Inference;CNN; Cross-module optimization; Power optimization Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision
paper slides note
Yuhao Zhu; Paul Whatmough University of Rochester; ARM Research
Inference;GAN; Sparsity; MIMD; SIMD GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks
paper note
Amir Yazdanbakhsh; Hadi Esmaeilzadeh Georgia Institute of Technology; UC San Diego; Qualcomm Technologies
Inference; CNN; Approximate SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks
paper note
Vahideh Akhlaghi; Hadi Esmaeilzadeh Georgia Institute of Technology; UC San Diego; Qualcomm
Inference;CNN; Sparsity; UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition
paper note
Kartik Hegde; Christopher W. Fletcher University of Illinois at Urbana-Champaign; NVIDIA
Inference; Non-uniform Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation
paper note
Eunhyeok Park; Sungjoo Yoo Seoul National University
Inference; Dataflow: Dynamic Prediction Based Execution on Deep Neural Networks
paper note
Mingcong Song; Tao Li University of Florida
Inference; Datapath: bit-serial Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks
paper
Hardik Sharma; Hadi Esmaeilzadeh Georgia Institute of Technology; University of California
Training; memory: bandwidth-saving Gist: Efficient Data Encoding for Deep Neural Network Training
paper
Animesh Jain; Gennady Pekhimenko Microsoft Research; University of Toronto; University of Michigan
Inference; Cross-module optimization The Dark Side of DNN Pruning
paper note
Reza Yazdani; Antonio González Universitat Politècnica de Catalunya

2017

Tags - Title Authors Affiliations
Inference In-Datacenter Performance Analysis of a Tensor Processing Unit
paper
Norman P. Jouppi Google
Inference; Dataflow Maximizing CNN Accelerator Efficiency Through Resource Partitioning
paper
Yongming Shen Stony Brook University
Training SCALEDEEP: A Scalable Compute Architecture for Learning and Evaluating Deep Networks
paper
Swagath Venkataramani; Anand Raghunathan Purdue University; Parallel Computing Lab; Intel Corporation
Inference; Algorithm-architecture-codesign Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
paper
Jiecao Yu; Scott Mahlke University of Michigan; ARM
Inference; Sparsity SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks
paper note
Angshuman Parashar; William J. Dally NVIDIA; MIT; UC-Berkeley; Stanford University
Training; Low-bit Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent
paper note
Christopher De Sa; Kunle Olukotun Stanford University

2016

Tags - Title Authors Affiliations
Inference;Sparsity Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing
paper note
Jorge Albericio; Tayler Hetherington University of Toronto; University of British Columbia
Inference; Analog ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars
paper note
Ali Shafiee; Vivek Srikumar University of Utah; Hewlett Packard Labs
Inference; PIM PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory
paper note
Ping Chi; Yuan Xie University of California
Inference; Sparsity EIE: Efficient Inference Engine on Compressed Deep Neural Network
paper note
Song Han; William J. Dally Stanford University; NVIDIA
Inference; Analog RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision
paper note
Robert LiKamWa; Lin Zhong Rice University
Inference; Architecture-Physical-Co-design Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
paper note
Brandon Reagen; David Brooks Harvard University
Inference; Dataflow Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
paper note
Yu-Hsin Chen; Vivienne Sze MIT; NVIDIA
Inference; 3D integration Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory
paper note
Duckhwan Kim; Saibal Mukhopadhyay Georgia Institute of Technology
Inference Cambricon: An Instruction Set Architecture for Neural Networks
paper note
Shaoli Liu; Tianshi Chen CAS; Cambricon Ltd.

2015

Tags - Title Authors Affiliations
Inference; Cross-module optimization ShiDianNao: Shifting Vision Processing Closer to the Sensor
paper note
Zidong Du ICT

ASPLOS

2020

Tags - Title Authors Affiliations
Inference; Security Shredder: Learning Noise Distributions to Protect Inference Privacy
paper note
Fatemehsadat Mireshghallah; Mohammadkazem Taram; et.al. UCSD
Algorithm-Architecture co-design; Security DNNGuard: An Elastic Heterogeneous DNN Accelerator Architecture against Adversarial Attacks
paper note
Xingbin Wang; Rui Hou; Boyan Zhao; et.al. CAS; USC
programming model; Algorithm-Architecture co-design Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators
paper note
Xuan Yang; Mark Horowitz; et.al. Stanford; THU
Algorithm-Architecture co-design; security DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints
paper note codes
Xing Hu; Yuan Xie; et.al. UCSB
Training; distributed computing Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training
paper note
Qinyi Luo; Jiaao He; Youwei Zhuo; Xuehai Qian USC
compression PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
paper
Wei Niu; Xiaolong Ma; Sheng Lin; et.al. College of William and Mary; Northeastern; USC
Power optimization; compute-memory trade-off Capuchin: Tensor-based GPU Memory Management for Deep Learning
paper note
Xuan Peng; Xuanhua Shi; Hulin Dai; et.al. HUST; MSRA; USC
Compute-memory trade-off NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units
paper
Bongjoon Hyun; Youngeun Kwon; Yujeong Choi; et.al. KAIST
Algorithm-Architecture co-design FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System
paper note codes
Size Zheng; Yun Liang; Shuo Wang; et.al. PKU

2019

Tags - Title Authors Affiliations
Inference, ReRAM PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference
paper note
Aayush Ankit; Dejan S Milojičić; et.al. Purdue; UIUC; HP
Reinforcement Learning FA3C: FPGA-Accelerated Deep Reinforcement Learning
paper note
Hyungmin Cho; Pyeongseok Oh; Jiyoung Park; et.al. Hongik University; SNU
Inference, ReRAM FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture
paper note
Yu Ji; Yuan Xie; et.al. THU; UCSB
Inference, Bit-serial Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks
paper note
Alberto Delmas Lascorz; Andreas Ioannis Moshovos; et.al. Toronto; NVIDIA
Inference, Dataflow TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators
paper note codes
Mingyu Gao; Xuan Yang; Jing Pu; et.al. Stanford
Inference, CNN, Systolic, Sparsity Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization
paper codes note
Hsiangtsung Kung; Bradley McDanel; Saiqian Zhang Harvard
Training, CNN, Distributed computing Split-CNN: Splitting Window-based Operations in Convolutional Neural Networks for Memory System Optimization
paper note
Tian Jin; Seokin Hong IBM; Kyungpook National University
Training, Distributed computing HOP: Heterogeneity-Aware Decentralized Training
paper note
Qinyi Luo; Jinkun Lin; Youwei Zhuo; Xuehai Qian USC; THU
Training, Compiler Astra: Exploiting Predictability to Optimize Deep Learning
paper note
Muthian Sivathanu; Tapan Chugh; Sanjay S Singapuram; Lidong Zhou Microsoft
Training, Quantization, Compression ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers
paper note
Ao Ren; Tianyun Zhang; Shaokai Ye; et.al. Northeastern; Syracuse; SUNY; Buffalo; USC
Security DeepSigns: An End-to-End Watermarking Framework for Protecting the Ownership of Deep Neural Networks
paper note
Bita Darvish Rouhani; Huili Chen; Farinaz Koushanfar UCSD

2018

Tags - Title Authors Affiliations
Compiler Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler
paper slides note
Yu Ji; Youhui Zhang; Wenguang Chen; Yuan Xie Tsinghua; UCSB
Inference, Dataflow, NoC MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects
paper note slides
Hyoukjun Kwon; Ananda Samajdar; Tushar Krishna Georgia Tech
Bayesian VIBNN: Hardware Acceleration of Bayesian Neural Networks
paper note
Ruizhe Cai; Ao Ren; Ning Liu; et.al. Syracuse University; USC

2017

Tags - Title Authors Affiliations
Dataflow, 3D Integration Tetris: Scalable and Efficient Neural Network Acceleration with 3D Memory
paper note
Mingyu Gao; Jing Pu; Xuan Yang Stanford University
CNN; Algorithm-Architecture co-design SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing
paper note
Ao Ren; Zhe Li; Caiwen Ding Syracuse University; USC; The City College of New York

2015

Tags - Title Authors Affiliations
Inference PuDianNao: A Polyvalent Machine Learning Accelerator
paper note
Daofu Liu; Tianshi Chen; Shaoli Liu CAS; USTC; Inria

2014

Tags - Title Authors Affiliations
Inference DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning
paper note
Tianshi Chen; Zidong Du; Ninghui Sun CAS; Inria

MICRO

2019

Tags - Title Authors Affiliations
compute-memory trade-off; Dataflow Wire-Aware Architecture and Dataflow for CNN Accelerators
paper note
Sumanth Gudaparthi; Surya Narayanan; Rajeev Balasubramonian; Edouard Giacomin; Hari Kambalasubramanyam; Pierre-Emmanuel Gaillardon Utah
security; compute-memory trade-off ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning
paper note
Shang-Tse Chen; Cory Cornelius; Jason Martin; Duen Horng Chau Georgia Tech; Intel
Inference; NoC; Cross-Module optimization Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture
paper note slides
Yakun Sophia Shao; Jason Clemons; Rangharajan Venkatesan; et.al. NVIDIA
compression; ISA; Cross-Module optimization ZCOMP: Reducing DNN Cross-Layer Memory Footprint Using Vector Extensions
paper note
Berkin Akin; Zeshan A. Chishti; Alaa R. Alameldeen Google; Intel
Algorithm-Architecture co-design Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating
paper note
Weizhe Hua; Yuan Zhou; Christopher De Sa; et.al. Cornell
Sparsity SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks
paper note
Ashish Gondimalla; Noah Chesnut; et.al. Purdue
Power-optimization; Approximate; EDEN: Enabling Approximate DRAM for DNN Inference using Error-Resilient Neural Networks
paper note
Skanda Koppula; Lois Orosa; A. Giray Yağlıkçı; et.al. ETHZ
inference; CNN eCNN: a Block-Based and Highly-Parallel CNN Accelerator for Edge Inference
paper note
Chao-Tsung Huang; Yu-Chun Ding; Huan-Ching Wang; et.al. NTHU
Architecture-Physical co-design TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
paper note
Youngeun Kwon; Yunjae Lee; Minsoo Rhu KAIST
Architecture-Physical co-design; dataflow Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach
paper note
Hyoukjun Kwon; Prasanth Chatarasi; Michael Pellauer; et.al. Georgia Tech; NVIDIA
sparsity; inference; MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation
paper note
Lillian Pentecost; Marco Donato; Brandon Reagen; et.al. Harvard; Facebook
RNN; Special operation; Neuron-Level Fuzzy Memoization in RNNs
paper note
Franyell Silfa; Gem Dot; Jose-Maria Arnau; et.al. UPC
inference; Algorithm-Architecture co-design; Manna: An Accelerator for Memory-Augmented Neural Networks
paper note
Jacob R. Stevens; Ashish Ranjan; Dipankar Das; et.al. Purdue; Intel
PIM eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing
paper note
Elaheh Sadredini; Reza Rahimi; Vaibhav Verma; et.al. Virginia
Sparsity ExTensor: An Accelerator for Sparse Tensor Algebra
paper note
Kartik Hegde; Hadi Asghari-Moghaddam; Michael Pellauer UIUC; NVIDIA
Sparsity; Algorithm-Architecture co-design Efficient SpMV Operation for Large and Highly Sparse Matrices Using Scalable Multi-Way Merge Parallelization
paper note
Fazle Sadi; Joe Sweeney; Tze Meng Low; et.al. CMU
sparsity; Algorithm-Architecture co-design; compression Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs
paper note
Maohua Zhu; Tao Zhang; Yuan Xie UCSB; Alibaba
special operation; inference ASV: Accelerated Stereo Vision System
paper note codes1 codes2
Yu Feng; Paul Whatmough; Yuhao Zhu Rochester
Algorithm-Architecture co-design; special operation Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach
paper note
Mingyu Yan; Xing Hu; Shuangchen Li; et.al. UCSB; ICT

2018

Tags - Title Authors Affiliations
Sparsity Cambricon-s: Addressing Irregularity in Sparse Neural Networks: A Cooperative Software/Hardware Approach
paper note
Xuda Zhou; Zidong Du; Qi Guo; Shaoli Liu; Chengsi Liu; Chao Wang; Xuehai Zhou; Ling Li; Tianshi Chen; Yunji Chen USTC; CAS
Inference; CNN; spatial correlation Diffy: a Deja vu-Free Differential Deep Neural Network Accelerator
paper note
Mostafa Mahmoud; Kevin Siu; Andreas Moshovos University of Toronto
Distributed computing Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning
paper note
Youngeun Kwon; Minsoo Rhu KAIST
RNN Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs
paper note
Xingyao Zhang; Chenhao Xie; Jing Wang; et.al. University of Houston; Capital Normal University
Training, distributed computing, compression A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks
paper note
Youjie Li; Jongse Park; Mohammad Alian; et.al. UIUC; THU; SJTU; Intel; UCSD
Inference, sparsity, compression PermDNN: Efficient Compressed Deep Neural Network Architecture with Permuted Diagonal Matrices
paper note
Chunhua Deng; Siyu Liao; Yi Xie; et.al. City University of New York; University of Minnesota; USC
Reinforcement Learning, algorithm-architecture co-design GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware
paper note
Ananda Samajdar; Parth Mannan; Kartikay Garg; Tushar Krishna Georgia Tech
Training, PIM Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach
paper note
Jiawen Liu; Hengyu Zhao; et.al. UCM; UCSD; UCSC
GAN, PIM LerGAN: A Zero-free, Low Data Movement and PIM-based GAN Architecture
paper note
Haiyu Mao; Mingcong Song; Tao Li; et.al. THU; University of Florida
Training, special operation, dataflow Multi-dimensional Parallel Training of Winograd Layer on Memory-centric Architecture
paper note
Byungchul Hong; Yeonju Ro; John Kim KAIST
PIM/CIM SCOPE: A Stochastic Computing Engine for DRAM-based In-situ Accelerator
paper note
Shuangchen Li; Alvin Oliver Glova; Xing Hu; et.al. UCSB; Samsung
Inference, algorithm-architecture co-design Morph: Flexible Acceleration for 3D CNN-based Video Understanding
paper note
Kartik Hegde; Rohit Agrawal; Yulun Yao; Christopher W Fletcher UIUC

2017

Tags - Title Authors Affiliations
Bit-serial Bit-Pragmatic Deep Neural Network Computing
paper note
Jorge Albericio; Alberto Delmás; Patrick Judd; et.al. NVIDIA; University of Toronto
CNN, Special computing CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices
paper note
Caiwen Ding; Siyu Liao; Yanzhi Wang; et.al. Syracuse University; City University of New York; USC; California State University; Northeastern University
PIM DRISA: A DRAM-based Reconfigurable In-Situ Accelerator
paper note
Shuangchen Li; Dimin Niu; et.al. UCSB; Samsung
Distributed computing Scale-Out Acceleration for Machine Learning
paper note
Jongse Park; Hardik Sharma; Divya Mahajan; et.al. Georgia Tech; UCSD
DNN, Sparsity, Bandwidth saving DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission
paper note
Parker Hill; Animesh Jain; Mason Hill; et.al. Univ. of Michigan; Univ. of Nevada

2016

Tags - Title Authors Affiliations
DNN, compiler, Dataflow From High-Level Deep Neural Models to FPGAs
paper note
Hardik Sharma; Jongse Park; Divya Mahajan; et.al. Georgia Institute of Technology; Intel
DNN, Runtime, training vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design
paper note
Minsoo Rhu; Natalia Gimelshein; Jason Clemons; et.al. NVIDIA
Bit-serial Stripes: Bit-Serial Deep Neural Network Computing
paper note
Patrick Judd; Jorge Albericio; Tayler Hetherington; et.al. University of Toronto; University of British Columbia
Sparsity Cambricon-X: An Accelerator for Sparse Neural Networks
paper note
Shijin Zhang; Zidong Du; Lei Zhang; et.al. Chinese Academy of Sciences
Neuromorphic, Spiking, programming model NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints
paper note
Yu Ji; YouHui Zhang; ShuangChen Li; et.al. Tsinghua University; UCSB
Cross Module optimization Fused-Layer CNN Accelerators
paper note
Manoj Alwani; Han Chen; Michael Ferdman; Peter Milder Stony Brook University
power optimization, cross module optimization A Patch Memory System For Image Processing and Computer Vision
paper note
Jason Clemons; Chih-Chi Cheng; Iuri Frosio; Daniel Johnson; Stephen W. Keckler NVIDIA; Qualcomm
power optimization An Ultra Low-Power Hardware Accelerator for Automatic Speech Recognition
paper note
Reza Yazdani; Albert Segura; Jose-Maria Arnau; Antonio Gonzalez Universitat Politecnica de Catalunya

2014

Tags - Title Authors Affiliations
Inference, CNN DaDianNao: A Machine-Learning Supercomputer
paper note
Yunji Chen; Tao Luo; Shaoli Liu; et.al. CAS; Inria; Inner Mongolia University

HPCA

2020

Tags - Title Authors Affiliations
ReRAM Deep Learning Acceleration with Neuron-to-Memory Transformation
Paper note
Mohsen Imani; Mohammad Samragh Razlighi; Yeseong Kim; et.al. UCSD
graph network HyGCN: A GCN Accelerator with Hybrid Architecture
Paper note
Mingyu Yan; Lei Deng; Xing Hu; et.al. ICT; UCSB
training; sparsity SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training
Paper note Slides
Eric Qin; Ananda Samajdar; Hyoukjun Kwon; et.al. Georgia Tech
Programming model; DNN PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible NPUs
Paper note
Yujeong Choi; Minsoo Rhu KAIST
sparsity; compute-memory trade-off ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator
Paper note
Bahar Asgari; Ramyad Hadidi; Tushar Krishna; et.al. Georgia Tech
sparsity;Algorithm-Architecture co-design SpArch: Efficient Architecture for Sparse Matrix Multiplication
Paper note Project
Zhekai Zhang; Hanrui Wang; Song Han; William J. Dally MIT; NVIDIA
Algorithm-Architecture co-design; Approximation A3: Accelerating Attention Mechanisms in Neural Networks with Approximation
Paper note
Tae Jun Ham; Sung Jun Jung; Seonghak Kim; et.al. SNU
training; Architecture-Physical co-design AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerator Arrays
Paper note
Linghao Song; Fan Chen; Youwei Zhuo; et.al. Duke; USC
Special operation, architecture-physical co-design PIXEL: Photonic Neural Network Accelerator
Paper note
Kyle Shiflett; Dylan Wright; Avinash Karanth; Ahmed Louri Ohio; George Washington
Capsule; PIM Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design
Paper note
Xingyao Zhang; Shuaiwen Leon Song; Chenhao Xie; et.al. Houston
Bandwidth saving Communication Lower Bound in Convolution Accelerators
Paper note
Xiaoming Chen; Yinhe Han; Yu Wang ICT; THU
Training, Distributed computing; algorithm-architecture co-design EFLOPS: Algorithm and System Co-design for a High Performance Distributed Training Platform
Paper note
Jianbo Dong; Zheng Cao; Tao Zhang; et.al. Alibaba
NoC; Experiences with ML-Driven Design: A NoC Case Study
Paper note
Jieming Yin; Subhash Sethumurugan; Yasuko Eckert; et.al. AMD
sparsity Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations
Paper note
Nitish Srivastava; Hanchen Jin; Shaden Smith; et.al. Cornell; Intel
algorithm-architecture co-design A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms
Paper note
Jian Weng; Sihao Liu; Zhengrong Wang; et.al. UCLA
Reinforcement Learning; NoC; algorithm-architecture co-design A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study
Paper note
Ting-Ru Lin; Drew Penney; Massoud Pedram; Lizhong Chen USC; OSU
power optimization Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices
Paper note
Jawad Haj-Yahya; Yanos Sazeides; Mohammed Alser; et.al. ETHZ; Cyprus; CMU

2019

Tags - Title Authors Affiliations
training; compute-memory trade-off HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
paper note
Linghao Song; Jiachen Mao; Yiran Chen; et.al. Duke; USC
RNN; algorithm-architecture co-design E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs
paper note
Zhe Li; Caiwen Ding; Siyue Wang Syracuse University; Northeastern University; Florida International University; USC; University at Buffalo
CNN, Bit-serial, Sparsity Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks
paper note
Xiaowei Wang; Jiecao Yu; Charles Augustine; et.al. Michigan; Intel
cross-Module optimization Shortcut Mining: Exploiting Cross-layer Shortcut Reuse in DCNN Accelerators
paper note
Arash Azizimazreah; Lizhong Chen OSU
PIM/CIM, low-bit, binary NAND-Net: Minimizing Computational Complexity of In-Memory Processing for Binary Neural Networks
paper note
Hyeonuk Kim; Jaehyeong Sim; Yeongjae Choi; Lee-Sup Kim KAIST
Accuracy-Latency trade-off Kelp: QoS for Accelerators in Machine Learning Platforms
paper note
Haishan Zhu; David Lo; Liqun Cheng Microsoft; Google; UT Austin
inference Machine Learning at Facebook: Understanding Inference at the Edge
paper note
Carole-Jean Wu; David Brooks; Kevin Chen; et.al. Facebook
Architecture-Physical co-design The Accelerator Wall: Limits of Chip Specialization
paper note codes
Adi Fuchs; David Wentzlaff Princeton

2018

Tags - Title Authors Affiliations
special operation; approximate Making Memristive Neural Network Accelerators Reliable
paper note
Ben Feinberg; Shibo Wang; Engin Ipek University of Rochester
Algorithm-Architecture co-design; GAN Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-based Deep Learning
paper
Mingcong Song; Jiaqi Zhang; Huixiang Chen; Tao Li University of Florida
compression; sparsity Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks
paper note
Minsoo Rhu; Mike O'Connor; Niladrish Chatterjee; et.al. POSTECH; NVIDIA; UT-Austin
Architecture-Physical co-design; inference In-situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems
paper note
Mingcong Song; Kan Zhong; Tao Li; et.al. University of Florida; Chongqing University; Capital Normal University
Special operation; ReRAM GraphR: Accelerating Graph Processing Using ReRAM
paper note
Linghao Song; Youwei Zhuo; Xuehai Qian Duke; USC
PIM; Special operation; dataflow GraphP: Reducing Communication of PIM-based Graph Processing with Efficient Data Partition
paper note
Mingxing Zhang; Youwei Zhuo; Chao Wang; et.al. THU; USC; Stanford
Power optimization; PIM PM3: Power Modeling and Power Management for Processing-in-Memory
paper note
Chao Zhang; Tong Meng; Guangyu Sun PKU

2017

Tags - Title Authors Affiliations
Inference, CNN, Dataflow FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks
paper note
Wenyan Lu; Guihai Yan; Jiajun Li; et.al. Chinese Academy of Sciences
Inference, ReRAM PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning
paper note
Linghao Song; Xuehai Qian; Hai Li; Yiran Chen University of Pittsburgh; University of Southern California
Training Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures
paper
Mingcong Song; Yang Hu; Huixiang Chen; Tao Li University of Florida

2016

Tags - Title Authors Affiliations
Programming model, training TABLA: A Unified Template-based Architecture for Accelerating Statistical Machine Learning
paper note
Divya Mahajan; Jongse Park; Emmanuel Amaro Georgia Institute of Technology
ReRAM; Boltzmann Memristive Boltzmann Machine: A Hardware Accelerator for Combinatorial Optimization and Deep Learning
paper note
Mahdi Nazm Bojnordi; Engin Ipek University of Rochester