This is the official manual for your final project.
Please follow the guidelines below.
Keep an eye on updates, as the specification or scoring policy may change in the future.
- No updates yet
Due for Baseline: 8th June
Due for Optimizations: 11th June
Late submissions will not be accepted.
We suggest three ways to optimize your work: Quantization, Zero-Skipping, and DMA (Direct Memory Access).
Relevant materials will be uploaded on eTL soon.
| | Videos | Files |
|---|---|---|
| Baseline | | |
| Quantization | | |
| Zero-skipping | | |
| DMA | | |
You need a bitstream file generated from a block design that includes your IP.
Simply replace the custom IP in the lab10 block design with your MM (Matrix-Matrix) PE controller.
How the PE controller should be designed is explained here.
Once your bitstream file is ready, rename it to "zynq.bit" and copy it to the SD card.
Insert the SD card into the device and boot it.
How to boot your device via minicom is explained here.
※ This is optional, since the source files are identical to those in lab09 except for benchmark.sh.
You can therefore skip steps 3~6 and continue building on your lab09 work.
You need to download this repository to start your final project.
$ git clone https://github.com/resurgo97/hsd22_project
Note that you can run this command in the terminal on your device if it is connected to the network.
Check that all the dependencies required to run the code are installed.
$ sudo apt-get update -y
$ sudo apt-get install -y libprotobuf-dev protobuf-compiler python python-numpy
These should already be installed on your device if you successfully completed lab09.
Run the command below to download the dataset.
$ bash download.sh
fpga_api.cpp and fpga_api_cpu.cpp contain three functions that have not been implemented yet: LargeMV, LargeMM, and ConvLowering.
Complete these functions as you did in lab09.
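Because the PE controller has at most 8x8 PEs, LargeMM (and LargeMV) must split large operands into blocks that fit the hardware and accumulate the partial results. The C++ sketch below shows one common blocking scheme on the CPU, assuming row-major storage and square 8x8 tiles. `blocked_mm` and `tile_mm` are hypothetical names, not the skeleton's actual signatures, and `tile_mm` only stands in for one pass of your PE controller. One way to view LargeMV is as the special case N = 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tile edge matching the 8x8 PE-array limit (assumption: square tiles).
constexpr int TILE = 8;

// Hypothetical stand-in for one hardware pass: c += a * b,
// where a, b and c are TILE x TILE row-major blocks.
void tile_mm(const float* a, const float* b, float* c) {
  for (int i = 0; i < TILE; ++i)
    for (int k = 0; k < TILE; ++k)
      for (int j = 0; j < TILE; ++j)
        c[i * TILE + j] += a[i * TILE + k] * b[k * TILE + j];
}

// out (M x N) = A (M x K) * B (K x N), all row-major. Operands are cut into
// TILE x TILE blocks; out-of-range entries of edge blocks are zero-padded.
std::vector<float> blocked_mm(const std::vector<float>& A,
                              const std::vector<float>& B, int M, int K, int N) {
  assert(A.size() == static_cast<std::size_t>(M) * K);
  assert(B.size() == static_cast<std::size_t>(K) * N);
  std::vector<float> out(static_cast<std::size_t>(M) * N, 0.0f);
  float a[TILE * TILE], b[TILE * TILE], c[TILE * TILE];
  for (int i0 = 0; i0 < M; i0 += TILE) {
    for (int j0 = 0; j0 < N; j0 += TILE) {
      std::fill(c, c + TILE * TILE, 0.0f);  // fresh accumulator per output block
      for (int k0 = 0; k0 < K; k0 += TILE) {
        // Load one block of A and one block of B, padding with zeros.
        for (int i = 0; i < TILE; ++i)
          for (int k = 0; k < TILE; ++k) {
            const bool inside = (i0 + i < M) && (k0 + k < K);
            a[i * TILE + k] = inside ? A[(i0 + i) * K + (k0 + k)] : 0.0f;
          }
        for (int k = 0; k < TILE; ++k)
          for (int j = 0; j < TILE; ++j) {
            const bool inside = (k0 + k < K) && (j0 + j < N);
            b[k * TILE + j] = inside ? B[(k0 + k) * N + (j0 + j)] : 0.0f;
          }
        tile_mm(a, b, c);  // accumulate this block's partial products
      }
      // Copy back only the valid part of the output block.
      for (int i = 0; i < TILE && i0 + i < M; ++i)
        for (int j = 0; j < TILE && j0 + j < N; ++j)
          out[(i0 + i) * N + (j0 + j)] = c[i * TILE + j];
    }
  }
  return out;
}
```

Edge blocks are zero-padded so that every pass sees a full 8x8 tile; the padded zeros add nothing to the sums, at the cost of some wasted MACs along the borders.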
Modify fpga_api.cpp and fpga_api_cpu.cpp based on your previous work.
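As its name suggests, ConvLowering turns a convolution into a matrix multiplication so that convolution layers can reuse the MM path; a common way to do this is im2col. The sketch below assumes CHW layout, stride 1, and no padding. The function name `im2col` and its row ordering (`[c][ky][kx]`) are illustrative assumptions and may not match the data layout the skeleton in fpga_api_cpu.cpp expects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical im2col-style lowering (stride 1, no padding, CHW layout assumed).
// Returns a (C*KH*KW) x (OH*OW) row-major matrix whose column p holds the
// receptive field of output pixel p, so that
//   conv_out (OC x OH*OW) = weights (OC x C*KH*KW) * lowered.
std::vector<float> im2col(const std::vector<float>& input, int C, int H, int W,
                          int KH, int KW) {
  assert(static_cast<int>(input.size()) == C * H * W);
  assert(KH <= H && KW <= W);
  const int OH = H - KH + 1, OW = W - KW + 1;
  const int rows = C * KH * KW, cols = OH * OW;
  std::vector<float> lowered(static_cast<std::size_t>(rows) * cols);
  for (int c = 0; c < C; ++c)
    for (int ky = 0; ky < KH; ++ky)
      for (int kx = 0; kx < KW; ++kx) {
        // Row index follows a weight layout of [oc][c][ky][kx] (assumption).
        const int row = (c * KH + ky) * KW + kx;
        for (int oy = 0; oy < OH; ++oy)
          for (int ox = 0; ox < OW; ++ox)
            lowered[static_cast<std::size_t>(row) * cols + oy * OW + ox] =
                input[(c * H + oy + ky) * W + ox + kx];
      }
  return lowered;
}
```

Multiplying an OC x (C·KH·KW) weight matrix by the returned matrix gives the OC x (OH·OW) output feature map, which can then go through the same blocked MM path as LargeMM.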
Run the validation script as below.
$ sh benchmark.sh
Hopefully you will get 100% accuracy on the classification task!
- Accuracy on the CNN classification task should be 100%.
  (A small degradation caused by quantization or zero-skipping is acceptable.)
- The PE controller should consist of at most 8x8 (=64) PEs.
- The FSM should consist of 5 states: IDLE - LOAD - CALC - HARV - DONE
- During the HARV (harvest) state, the PE controller should write the computed data back to BRAM.
  You are not bound to this approach when optimizing the baseline; for example, you can also exploit pipelining.
- The latency of your floating-point MAC must be set to 16 cycles.
  This is explained in detail in the videos.
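Before writing RTL, the five-state FSM can be prototyped as a small behavioral model. The C++ sketch below is only an illustration: the handshake flags (`start`, `load_done`, `calc_done`, `harv_done`) and the return from DONE to IDLE are assumptions, and `cycles_until_done` is a hypothetical helper showing that, in this model, the phase latencies simply add up (one IDLE cycle + LOAD + CALC + HARV).

```cpp
#include <cassert>

// Behavioral model of the 5-state PE-controller FSM (IDLE-LOAD-CALC-HARV-DONE).
// The handshake signals below are hypothetical; your RTL may use different ones.
enum class State { IDLE, LOAD, CALC, HARV, DONE };

struct Inputs {
  bool start = false;      // host requests a computation
  bool load_done = false;  // operands fetched from BRAM into the PEs
  bool calc_done = false;  // all MACs finished (including the 16-cycle MAC latency)
  bool harv_done = false;  // results written back to BRAM
};

// Next-state logic: each state waits for its own completion flag.
State next_state(State s, const Inputs& in) {
  switch (s) {
    case State::IDLE: return in.start ? State::LOAD : State::IDLE;
    case State::LOAD: return in.load_done ? State::CALC : State::LOAD;
    case State::CALC: return in.calc_done ? State::HARV : State::CALC;
    case State::HARV: return in.harv_done ? State::DONE : State::HARV;
    case State::DONE: return State::IDLE;  // assumption: back to IDLE after one cycle
  }
  return State::IDLE;  // unreachable; keeps compilers quiet
}

// Simulates one job and returns the number of clock cycles until DONE is
// reached, given how many cycles each phase takes (hypothetical, each >= 1).
int cycles_until_done(int load_cycles, int calc_cycles, int harv_cycles) {
  assert(load_cycles >= 1 && calc_cycles >= 1 && harv_cycles >= 1);
  State s = State::IDLE;
  int cycle = 0;
  int in_state = 0;  // cycles already spent in the current state
  while (s != State::DONE) {
    Inputs in;
    in.start = true;
    in.load_done = (s == State::LOAD && in_state + 1 >= load_cycles);
    in.calc_done = (s == State::CALC && in_state + 1 >= calc_cycles);
    in.harv_done = (s == State::HARV && in_state + 1 >= harv_cycles);
    const State n = next_state(s, in);
    in_state = (n == s) ? in_state + 1 : 0;
    s = n;
    ++cycle;
  }
  return cycle;
}
```

For example, in this model `cycles_until_done(4, 16, 8)` returns 29 (1 + 4 + 16 + 8).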
- Baseline 70% + Optimizations 30% (10% each)
- For each, Implementation 70% + Performance 30%
- Implementation
  - If you fail to implement a part, or lose accuracy due to a logical error in your code, you may not receive full points.
  - A small accuracy loss caused by zero-skipping or quantization is accepted.
- Performance
  - Total HW computation time for the baseline, quantization, and zero-skipping
  - Total data transfer time for DMA
  - Code snippets for estimating the computation latency of your work will be provided.
- Report
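Read literally, the weights above combine as follows, where $I_x, P_x \in [0, 1]$ are hypothetical fractions of the implementation and performance points earned for part $x$ (b = baseline, q = quantization, z = zero-skipping, d = DMA). The weight of the report is not stated, so it is left out of this sketch.

$$
\text{Score} = 70\,(0.7\,I_b + 0.3\,P_b) + \sum_{x \in \{q,\,z,\,d\}} 10\,(0.7\,I_x + 0.3\,P_x)
$$

For instance, a fully scored baseline ($I_b = P_b = 1$) plus fully implemented optimizations that earn no performance points ($I_x = 1$, $P_x = 0$) gives $70 + 3 \cdot 7 = 91$.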
Please use the Q&A board on eTL if you have questions or want more information about the project.