
PipeCNN

About

PipeCNN is an OpenCL-based FPGA accelerator for large-scale Convolutional Neural Networks (CNNs). There is a growing trend in the FPGA community to utilize High-Level Synthesis (HLS) tools to design and implement customized circuits on FPGAs. Compared with RTL-based design methodologies, HLS tools provide a faster hardware development cycle by automatically synthesizing algorithms written in high-level languages (e.g., C/C++) into RTL/hardware. OpenCL™ is an open, emerging cross-platform parallel programming language that can be used in both GPU and FPGA development. The main goal of this project is to provide a generic, yet efficient, OpenCL-based design of a CNN accelerator for FPGAs. Our design is scalable in both performance and hardware resource usage, and can thus be deployed on a variety of FPGA platforms.

How to Use

First, download the pre-trained CNN models, input test vectors, and golden reference files from PipeCNN's own ModelZoo, and place the data in the correct folder. Compile the project using the provided Makefile. After the compilation finishes, simply type the following command to run PipeCNN:

./run.exe conv.aocx

For users who are using Xilinx's SDx environments, it is recommended to use the IDE instead of makefiles. For more detailed user instructions, please refer to the docs.
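For Intel/Altera users, the steps above can be sketched as a shell session. This is a sketch only: the guards around `make` and `./run.exe` are there so the snippet also runs outside a real PipeCNN checkout, and the exact data folder layout is not fixed here, so consult the docs for where your board's files belong.

```shell
# 1. Download pre-trained models, test vectors, and golden reference
#    files from PipeCNN's ModelZoo, and place them in the folder
#    expected by your PipeCNN checkout (see the docs).

# 2. Build the host binary and the FPGA kernel with the provided
#    Makefile (guarded so this sketch is harmless outside a checkout).
if [ -f Makefile ]; then
    make
fi

# 3. Run PipeCNN with the compiled OpenCL kernel image (.aocx is the
#    binary format produced by Intel's OpenCL SDK offline compiler).
KERNEL=conv.aocx
if [ -x ./run.exe ]; then
    ./run.exe "$KERNEL"
fi
```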

Boards and Performances

Currently, we use Intel's OpenCL SDK v16.1 toolset to compile the OpenCL code and implement the generated RTL on Altera's FPGAs. For Xilinx FPGAs, the SDAccel and SDSoC development environments v2017.1 are used. PipeCNN has been tested on the following FPGA boards/platforms, and the performance achieved is reported in Table-I. We welcome other vendors and researchers to contribute performance and cost figures for other FPGA platforms/boards.

Table-I. Performance Measured and Hardware Resources Consumed

| Platform       | Performance | Speed   | CNN Model | DSPs Consumed | Configuration      |
|----------------|-------------|---------|-----------|---------------|--------------------|
| Stratix-V A7   | --          | --      | AlexNet   | --            | --                 |
| Arria-10 1150  | --          | --      | AlexNet   | --            | --                 |
| Cyclone-V SEA5 | 9.24 GOPS   | 6.6 fps | AlexNet   | 68            | V=8, L=8, GP_X=7   |
| Virtex-7 690T  | --          | --      | AlexNet   | --            | --                 |

Note: the parameters V, L, and GP_X refer to VEC_SIZE, LANE_NUM, and CONV_GP_SIZE_X, respectively.
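As a quick sanity check on the Cyclone-V row of Table-I, the reported throughput and frame rate are mutually consistent, assuming AlexNet's commonly cited cost of roughly 1.4 GOP per forward pass (this per-frame figure is an assumption, not from the table itself):

```shell
# 9.24 GOPS sustained at 6.6 fps implies the work done per frame:
awk 'BEGIN { printf "%.2f GOP per frame\n", 9.24 / 6.6 }'
# prints "1.40 GOP per frame", matching AlexNet's ~1.4 GOP per inference
```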

Update Plans

  • Implementation of Faster-RCNN (end of August)
  • Optimization for DE5a-net (Arria-10) targeting 500 fps of AlexNet (end of August)
  • Support for sparse or Winograd-based convolution algorithms

Citation

Please kindly cite our work on PipeCNN if it helps your research:

Dong Wang, Jiangjing An and Ke Xu, “PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks”, https://arxiv.org/abs/1611.02450, 2016.

Related Works

There are other FPGA accelerators that also adopt an HLS-based design scheme. Some excellent works are listed below. Note that PipeCNN is the first, and so far the only, open-source one ( ̄︶ ̄)↗

  • U. Aydonat, S. O'Connell, D. Capalija, A. C. Ling, and G. R. Chiu. "An OpenCL™ Deep Learning Accelerator on Arria 10," in Proc. FPGA 2017.
  • N. Suda, V. Chandra, G. Dasika, A. Mohanty, Y. F. Ma, S. Vrudhula, J. S. Seo, and Y. Cao, "Throughput-Optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks," in Proc. FPGA 2016.
  • C. Zhang, P. Li, G. Sun, Y. Guan, B. J. Xiao, and J. Cong, "Optimizing FPGA-based accelerator design for deep convolutional neural networks," in Proc. FPGA 2015.