LeNet-5 on ZYBO Z7-10 FPGA

Details are organized in LeNet-5.pdf.

Directories

python

data

cplusplus

One implementation per model variant and data type.
origin : Original model with floating point data type.
floating-point : Light model with floating point data type.
fixed-point : Light model with fixed point data type.
hls-stream : Light model with hls::stream data type.
hls-parallel : Partially parallel model with hls::stream data type.
etc : Utilities that analyze results, save the input txt file as a binary, determine the number of integer bits for the fixed-point type, or generate the weight array.
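
As an illustration of how the number of fixed-point integer bits can be determined, here is a minimal host-side sketch. It is not the project's code: the function names integerBitsFor and quantize are hypothetical, and it uses plain doubles with round-to-nearest and saturation instead of the HLS fixed-point types, whose rounding and overflow modes may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Smallest number of integer bits (sign bit included) such that a signed
// fixed-point type can hold every sample without overflow.
// Hypothetical helper, not taken from the repository.
int integerBitsFor(const std::vector<double>& samples) {
    double max_abs = 0.0;
    for (double v : samples) max_abs = std::max(max_abs, std::fabs(v));
    int int_bits = 1;  // the sign bit alone covers [-1, 1)
    while (std::ldexp(1.0, int_bits - 1) <= max_abs) ++int_bits;
    return int_bits;
}

// Map x onto a signed fixed-point grid with total_bits bits, int_bits of
// which are integer bits, and return the value that grid point represents.
// Assumption: round to nearest and saturate on overflow.
double quantize(double x, int total_bits, int int_bits) {
    assert(int_bits >= 1 && total_bits > int_bits && total_bits <= 32);
    const int frac_bits = total_bits - int_bits;
    const double scale = std::ldexp(1.0, frac_bits);               // 2^frac_bits
    const double max_code = std::ldexp(1.0, total_bits - 1) - 1.0; // largest code
    const double min_code = -std::ldexp(1.0, total_bits - 1);      // smallest code
    double code = std::nearbyint(x * scale);
    code = std::min(std::max(code, min_code), max_code);
    return code / scale;
}
```

Running integerBitsFor over the recorded weights and layer outputs gives a lower bound on the integer part; the remaining bits of the word go to the fraction.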

hls

dtype : Compares floating point and fixed point in terms of latency, resource usage, and maximum frequency using a simple function.
predict : Partially parallel model that can be synthesized with HLS; the top function is named predict.

arm

SW driver code that reads the input, checks accuracy, and measures latency.
Reads the input binary file from the SD card.
Runs 10,000 test cases to measure accuracy.
Uses the AXI Timer IP on the PL (Programmable Logic) side to measure latency.
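
The accuracy and latency bookkeeping described above can be sketched in plain C++ as follows. This is an illustrative host-side version, not the driver itself: argmax, accuracy, and ticksToMicroseconds are hypothetical names, and the sketch assumes a 32-bit up-counting timer whose clock frequency is passed in by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the largest score, i.e. the predicted class.
std::size_t argmax(const std::vector<float>& scores) {
    assert(!scores.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); ++i) {
        if (scores[i] > scores[best]) best = i;
    }
    return best;
}

// Fraction of cases whose predicted class matches the label.
double accuracy(const std::vector<std::vector<float>>& outputs,
                const std::vector<int>& labels) {
    assert(outputs.size() == labels.size());
    if (outputs.empty()) return 0.0;
    std::size_t correct = 0;
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        if (argmax(outputs[i]) == static_cast<std::size_t>(labels[i])) ++correct;
    }
    return static_cast<double>(correct) / static_cast<double>(outputs.size());
}

// Convert two readings of a 32-bit up-counter into microseconds.
// Unsigned subtraction stays correct across a single counter wrap-around.
double ticksToMicroseconds(std::uint32_t start, std::uint32_t end, double clock_hz) {
    const std::uint32_t elapsed = end - start;
    return static_cast<double>(elapsed) * 1e6 / clock_hz;
}
```

On the board, the two counter readings would come from the AXI Timer driver immediately before and after one inference.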

etc

waveform : Waveform obtained through Integrated Logic Analyzer (ILA).
BlockDesign.pdf : Block design of programmable logic.
LeNet-5.pdf : Presentation PDF that summarizes this LeNet-5 project.

python

TensorFlow (2.10.0) must be installed in advance.
The scripts are *.ipynb notebooks, so if you use VS Code, installing the Jupyter extension is recommended.
Running all the *.ipynb files generates the /data folder.

cplusplus

The AXI DMA provides high-speed data movement between system memory and an AXI4-Stream based target IP.
The Integrated Logic Analyzer (ILA) IP core is a logic analyzer core that can be used to monitor the internal signals of a design.
The AXI Timer provides an AXI4-Lite interface to communicate with the PS (Processing System).
An interrupt is raised when the ap_done block-level interface signal goes High (the signal is active High).

Initialize the AXI DMA IP and the predict IP through dmaInit() and predictInit().
Flush the cache through cacheFlush() before transferring data via DMA.
The AXI DMA IP reads data from DDR and transfers it to the predict IP through dataTx().
Wait for the predict IP to finish processing, and read the result when the interrupt signal is raised.
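
The ordering of these steps can be sketched with host-side stand-ins. The names dmaInit(), predictInit(), cacheFlush(), and dataTx() come from the text above, but their signatures and bodies here are assumptions; predictIsr(), readResult(), runInference(), and the trace vector are hypothetical and exist only to make the call order visible. On the board, each function would wrap the corresponding driver calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mock state: records the call order and models the "done" interrupt flag.
static std::vector<std::string> g_trace;
static volatile bool g_predictDone = false;

void dmaInit()     { g_trace.push_back("dmaInit"); }
void predictInit() { g_trace.push_back("predictInit"); }
void cacheFlush()  { g_trace.push_back("cacheFlush"); }

// Stand-in for the interrupt handler that fires on ap_done.
void predictIsr()  { g_predictDone = true; }

// Mock transfer: the "hardware" finishes immediately and raises the interrupt.
void dataTx() {
    g_trace.push_back("dataTx");
    predictIsr();
}

// Placeholder result; the real driver would read the class from the IP.
int readResult() {
    g_trace.push_back("readResult");
    return 0;
}

// One inference: flush, transfer, wait for the interrupt, read the result.
int runInference() {
    g_predictDone = false;
    cacheFlush();               // make the input buffer visible to the DMA
    dataTx();                   // DDR -> AXI DMA -> predict IP
    while (!g_predictDone) { }  // the flag is set by the interrupt handler
    return readResult();
}
```

Calling dmaInit() and predictInit() once, then runInference() per input, reproduces the sequence described above.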

The predict IP on the PL is 40.34x faster than the Original model compiled with -O0 on the PS.
The predict IP on the PL is 7.38x faster than the Original model compiled with -O2 on the PS.
The predict IP on the PL is 16.01x faster than the Light model compiled with -O0 on the PS.
The predict IP on the PL is 1.57x faster than the Light model compiled with -O3 on the PS.