MASTER : NCTU IEE 2016 Fall
Computer Architecture Final Project

Part-I: Use CUDA to accelerate the operations of a typical, commonly used convolutional layer.
(You can find the description slides here)

Three sub-directories

This directory contains the input data for the base program

This example shows how to use CUDA to accelerate an inner product

cd ./innerProduct
make
make run

The program in this directory shows the device information

cd ./device
make
make run

make

make run

convLayerCPU() does the computation in C++ and stores the output in outCPU
checker() checks whether the values stored in outCPU and outGPU are the same
clock_gettime() is used to measure your performance
Launch your CUDA kernels between the two clock_gettime() calls (you are allowed to launch multiple kernels in this project)
Put cudaDeviceSynchronize() before the last clock_gettime(); kernel launches are asynchronous, so without it the timer can stop before the kernels finish
You must pass the check to ensure your result is correct!
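
The timing pattern described above can be sketched in plain C++ as follows. This is only an illustration, not the code of the base program: the helper names elapsedSeconds() and timeRegion() are hypothetical, and CLOCK_MONOTONIC is an assumption (the base program may pass a different clock ID to clock_gettime()). In the project, the timed region would contain your kernel launches followed by cudaDeviceSynchronize().

```cpp
#include <time.h>

// Hypothetical helper (not part of the provided code):
// seconds elapsed between two clock_gettime() samples.
inline double elapsedSeconds(const timespec &start, const timespec &end) {
    return static_cast<double>(end.tv_sec - start.tv_sec)
         + static_cast<double>(end.tv_nsec - start.tv_nsec) * 1e-9;
}

// Hypothetical wrapper: runs `work` between two clock_gettime() calls
// and returns the elapsed time in seconds.
template <typename Work>
double timeRegion(Work work) {
    timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);  // assumed clock ID
    // In the project, `work` would launch the CUDA kernels and then call
    // cudaDeviceSynchronize() so that the second sample is taken only
    // after the GPU has finished.
    work();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsedSeconds(start, end);
}
```

The nanosecond field can be smaller at the end than at the start, which is why the two fields are subtracted separately and then combined as floating-point values.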

We will compare the execution time to get the speedup

  Speedup = convLayerCPU_execTime / convLayerGPU_execTime
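
As a worked example of the formula above, here is a minimal C++ sketch. The function name speedup() and the sample times in the comment are made up for illustration; they are not measurements from this project.

```cpp
// Hypothetical helper: speedup as defined above,
// i.e. CPU execution time divided by GPU execution time.
inline double speedup(double cpuSeconds, double gpuSeconds) {
    return cpuSeconds / gpuSeconds;
}

// Example with made-up times: if convLayerCPU() took 2.0 s and the GPU
// version took 0.25 s, then speedup(2.0, 0.25) evaluates to 8.0.
```

A value above 1 means the GPU version is faster than convLayerCPU().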

Completeness (30%)
- Your result must be correct (Pass the check) (10%)
- You achieve a speedup over convLayerCPU() (10%)
- You use NVIDIA Visual Profiler (NVVP) to help you (10%)
Report (40%)
- Describe your implementation algorithm and explain your results (10%)
- Show how you use NVVP to help you find and solve performance issues (10%)
- Discuss the optimizations you applied (10%)
- Feedback on this project (10%)
Performance Rank (30%)
- We will rank your CUDA kernels’ performance on a GTX 680
- The fastest one will get 30 points and the last one will get 1 point for this part
Late submissions are not accepted!

This is a team project, with 1 to 3 people per team
- Register here before the deadline
Compress your code and report into one zip file and upload it to E3
- Name your package as: LeaderID_FP1.zip
- Each team only needs to upload one package to E3
- Please name your report as: LeaderID_Report_FP1.pdf
- Make sure the TA can compile and run your code with "make" and "make run" on the provided server
Using any CUDA library is forbidden in this project

TA: Chien-Yu Lin
Email: myislin@gmail.com