

Computer Architecture Final Project

Part-I: Use CUDA to accelerate the operations of a typical convolutional layer.
(You can find the description slides here)

Three sub-directories


./data

This directory contains the input data for the base program.

  • /data/filt.txt - Stores the values of the filters
  • /data/inNeu.txt - Stores the values of the input neurons


./innerProduct

This example shows how to use CUDA to accelerate an inner product.


cd ./innerProduct
make run
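For orientation, here is a minimal sketch of what a CUDA inner product can look like. It is not the code shipped in ./innerProduct: the names dotKernel and dotGPU, the block size, and the cap on the grid size are all assumptions made for illustration. Each thread accumulates a strided slice of the products, each block reduces its partial sums in shared memory, and one thread per block adds the block result to the output with atomicAdd().

```cuda
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block (assumed; must be a power of two for the reduction below)

// Hypothetical kernel: grid-stride accumulation followed by a shared-memory tree reduction.
__global__ void dotKernel(const float *a, const float *b, float *out, int n)
{
    __shared__ float partial[BLOCK];

    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        sum += a[i] * b[i];
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Halve the number of active threads each step until partial[0] holds the block sum.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        atomicAdd(out, partial[0]);  // combine the per-block results
}

// Hypothetical host wrapper: copies both vectors to the device and returns the inner product.
float dotGPU(const float *hA, const float *hB, int n)
{
    float *dA, *dB, *dOut;
    float result = 0.0f;
    size_t bytes = sizeof(float) * n;

    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(float));  // the accumulator must start at zero

    int grid = (n + BLOCK - 1) / BLOCK;
    if (grid > 1024) grid = 1024;        // assumed cap; the grid-stride loop covers the rest
    dotKernel<<<grid, BLOCK>>>(dA, dB, dOut, n);

    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return result;
}
```

Because floating-point addition is not associative, the GPU result can differ slightly from a sequential CPU loop when the inputs are not exactly representable.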


./device

The program in this directory shows the device information.


cd ./device
make run
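As a rough idea of what such a program does, the sketch below queries every visible device through the CUDA runtime and prints a few properties that matter when choosing block sizes and shared-memory usage. The function name printDeviceInfo and the selection of fields are assumptions; the program in ./device may print different information.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints selected properties of each CUDA device.
// Returns the number of devices, or -1 if the runtime reports an error.
int printDeviceInfo()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s\n", d, prop.name);
        printf("  Compute capability : %d.%d\n", prop.major, prop.minor);
        printf("  Multiprocessors    : %d\n", prop.multiProcessorCount);
        printf("  Global memory      : %zu MB\n", prop.totalGlobalMem >> 20);
        printf("  Shared mem / block : %zu KB\n", prop.sharedMemPerBlock >> 10);
        printf("  Max threads / block: %d\n", prop.maxThreadsPerBlock);
        printf("  Warp size          : %d\n", prop.warpSize);
    }
    return count;
}
```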

Usage of the base program

Compile the code

make


Run the code

make run


  • Implement convLayerGPU() with CUDA
  • Store your result in outGPU
  • Use NVIDIA Visual Profiler to analyze and improve your code
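A naive starting point for the GPU implementation might look like the sketch below. Everything about the data layout is an assumption, since the actual layer dimensions and any extra stages of the layer are defined by the base program, not here: a [C][H][W] input, K filters of size C x R x R, stride 1, no padding. The names convKernel, convSketchGPU, convSketchCPU, and sameValues are hypothetical stand-ins for the project's convLayerGPU(), convLayerCPU(), and checker(). One thread computes one output value.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// ASSUMED layout, not taken from the project:
//   in   : [C][H][W]        input neurons
//   filt : [K][C][R][R]     K filters, each C x R x R
//   out  : [K][outH][outW]  with outH = H-R+1, outW = W-R+1 (stride 1, no padding)
__global__ void convKernel(const float *in, const float *filt, float *out,
                           int C, int H, int W, int R)
{
    int outH = H - R + 1, outW = W - R + 1;
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    int k = blockIdx.z;                             // filter index
    if (x >= outW || y >= outH) return;

    float sum = 0.0f;
    for (int c = 0; c < C; ++c)
        for (int r = 0; r < R; ++r)
            for (int s = 0; s < R; ++s)
                sum += in[(c * H + y + r) * W + x + s] *
                       filt[((k * C + c) * R + r) * R + s];
    out[(k * outH + y) * outW + x] = sum;
}

// Hypothetical host wrapper: copy inputs to the device, launch, copy the result into outGPU.
void convSketchGPU(const float *in, const float *filt, float *outGPU,
                   int C, int H, int W, int K, int R)
{
    int outH = H - R + 1, outW = W - R + 1;
    size_t inBytes = sizeof(float) * C * H * W;
    size_t filtBytes = sizeof(float) * K * C * R * R;
    size_t outBytes = sizeof(float) * K * outH * outW;
    float *dIn, *dFilt, *dOut;
    cudaMalloc(&dIn, inBytes);
    cudaMalloc(&dFilt, filtBytes);
    cudaMalloc(&dOut, outBytes);
    cudaMemcpy(dIn, in, inBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dFilt, filt, filtBytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed tile size
    dim3 grid((outW + block.x - 1) / block.x, (outH + block.y - 1) / block.y, K);
    convKernel<<<grid, block>>>(dIn, dFilt, dOut, C, H, W, R);

    cudaMemcpy(outGPU, dOut, outBytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(dIn);
    cudaFree(dFilt);
    cudaFree(dOut);
}

// CPU reference using the same assumed layout.
void convSketchCPU(const float *in, const float *filt, float *outCPU,
                   int C, int H, int W, int K, int R)
{
    int outH = H - R + 1, outW = W - R + 1;
    for (int k = 0; k < K; ++k)
        for (int y = 0; y < outH; ++y)
            for (int x = 0; x < outW; ++x) {
                float sum = 0.0f;
                for (int c = 0; c < C; ++c)
                    for (int r = 0; r < R; ++r)
                        for (int s = 0; s < R; ++s)
                            sum += in[(c * H + y + r) * W + x + s] *
                                   filt[((k * C + c) * R + r) * R + s];
                outCPU[(k * outH + y) * outW + x] = sum;
            }
}

// Element-wise comparison; the tolerance parameter is an assumption (use 0 for exact equality).
bool sameValues(const float *a, const float *b, int n, float tol)
{
    for (int i = 0; i < n; ++i)
        if (std::fabs(a[i] - b[i]) > tol) return false;
    return true;
}
```

This version reads every input and filter value straight from global memory, so it leaves room for the optimizations the profiler is likely to point at, such as staging input tiles in shared memory or keeping filter weights in constant memory.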


  • convLayerCPU() performs the computation in C++ and stores the output in outCPU

  • checker() will check whether the values stored in outCPU and outGPU are the same

  • clock_gettime() is used to measure your performance

  • Launch your CUDA kernels between the two clock_gettime() calls (you are allowed to launch multiple kernels in this project)

  • Put cudaDeviceSynchronize() before the last clock_gettime(); kernel launches are asynchronous, so without it the timer would stop before the GPU has finished

  • You must pass the checking to ensure your result is correct!

  • We will compare the execution time to get the speedup

      Speedup = convLayerCPU_execTime / convLayerGPU_execTime
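The timing rules above can be sketched as follows. This is an illustration under stated assumptions, not the base program's code: scaleKernel is a placeholder for your real kernels, the helper names elapsedSec, timeGPU, and speedup are hypothetical, and CLOCK_MONOTONIC is an assumed clock ID (the base program may use a different one).

```cuda
#include <time.h>
#include <cuda_runtime.h>

// Elapsed seconds between two clock_gettime() samples.
double elapsedSec(const timespec &t0, const timespec &t1)
{
    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

// Placeholder kernel standing in for the real convolution work (hypothetical).
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times two kernel launches on a device buffer of n floats and returns the elapsed seconds.
double timeGPU(float *dData, int n)
{
    int grid = (n + 255) / 256;
    timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    scaleKernel<<<grid, 256>>>(dData, n, 2.0f);
    scaleKernel<<<grid, 256>>>(dData, n, 0.5f);  // multiple launches are allowed
    cudaDeviceSynchronize();                     // launches return immediately; wait for the GPU
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return elapsedSec(t0, t1);
}

// Speedup = convLayerCPU_execTime / convLayerGPU_execTime
double speedup(double cpuSec, double gpuSec)
{
    return cpuSec / gpuSec;
}
```

Omitting cudaDeviceSynchronize() would make the measured GPU time cover little more than the launch overhead, which inflates the reported speedup.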

Grading Policy

  • Completeness (30%)
    • Your result must be correct (Pass the check) (10%)
    • You achieve a speedup over convLayerCPU() (10%)
    • You use NVIDIA Visual Profiler (NVVP) to help you (10%)
  • Report (40%)
    • Describe your implementation algorithm and explain your results (10%)
    • Show how you used NVVP to find and solve performance issues (10%)
    • Discuss the optimizations you applied (10%)
    • Feedback on this project (10%)
  • Performance Rank (30%)
    • We will rank your CUDA kernels’ performance on a GTX 680
    • The fastest one will get 30 points and the last one will get 1 point for this part
  • Late submissions are not accepted!

Other Rules

  • This is a team project, with 1 to 3 people per team
    • Register here before the deadline
  • Compress your code and report into one zip file and upload to E3
    • Name your package as: LeaderID_FP1.zip
    • Each team only needs to upload one package to E3
    • Please name your report as: LeaderID_Report_FP1.pdf
    • Make sure the TA can compile and run your code with "make" and "make run" on the provided server
  • Using any CUDA library is forbidden in this project

Useful Reference

TA: Chien-Yu Lin
Email: myislin@gmail.com