Assignment Bonus: GPU CUDA Test

PPCA 2019 machine learning system Bonus - CUDA

In this assignment, we will implement several GPU kernels for an ML system.

Key concepts and data structures that we would need to implement are

Overview of Module

tests/dlsys/autodiff.py: Implements the computation graph, autodiff, and the GPU/NumPy executor.
tests/dlsys/gpu_op.py: Exposes Python functions that call GPU kernels via ctypes.
tests/dlsys/ndarray.py: Exposes the Python GPU array API.
src/dlarray.h: Header for the GPU array.
src/c_runtime_api.h: C API header for the GPU array and GPU kernels.
src/gpu_op.cu: CUDA implementation of the kernels.
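
To make the ctypes layer less abstract, here is a minimal sketch of how an array handle could be passed by pointer from Python to a C-style function. Everything in it is an assumption for illustration: `DemoArray`, `HostArray`, and `scale_inplace` are hypothetical names, the field layout is not taken from src/dlarray.h, and the "kernel" is plain Python standing in for a compiled function so that the sketch runs without a GPU.

```python
import ctypes


class DemoArray(ctypes.Structure):
    """Hypothetical array handle; the real struct is defined in src/dlarray.h."""
    _fields_ = [
        ("data", ctypes.c_void_p),                  # pointer to the element buffer
        ("ndim", ctypes.c_int),                     # number of dimensions
        ("shape", ctypes.POINTER(ctypes.c_int64)),  # extent of each dimension
    ]


class HostArray:
    """Owns the buffers so they stay alive while the handle is in use."""

    def __init__(self, values, shape):
        size = 1
        for extent in shape:
            size *= extent
        if size != len(values):
            raise ValueError("shape does not match the number of values")
        self._buf = (ctypes.c_float * size)(*values)
        self._shape = (ctypes.c_int64 * len(shape))(*shape)
        self.handle = DemoArray(
            ctypes.cast(self._buf, ctypes.c_void_p),
            len(shape),
            self._shape,
        )

    def to_list(self):
        return list(self._buf)


def scale_inplace(handle_ptr, factor):
    """Stand-in for a compiled kernel: multiplies every element by factor."""
    arr = handle_ptr.contents
    size = 1
    for i in range(arr.ndim):
        size *= arr.shape[i]
    # Reinterpret the untyped data pointer as a float buffer.
    data = ctypes.cast(ctypes.c_void_p(arr.data), ctypes.POINTER(ctypes.c_float))
    for i in range(size):
        data[i] = data[i] * factor


host = HostArray([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], (2, 3))
scale_inplace(ctypes.pointer(host.handle), 2.0)
print(host.to_list())
```

With a real shared library, the Python side would call a function loaded through `ctypes.CDLL` instead of `scale_inplace`, but the handle would be built and passed in a similar way.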

Understand the code skeleton and tests. Fill in the implementation wherever it is marked """TODO: Your code here""".

There is only one file with TODOs for you.

Do not change the Makefile to use cuDNN for GPU kernels. cuBLAS is also forbidden for matrix multiplication.
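
Since matrix multiplication has to be written by hand, it can help to pin down the index arithmetic on the CPU first. The sketch below is one possible decomposition, not the required one: it assumes row-major flat storage and assigns one output element to each hypothetical GPU thread. The function names are illustrative, and the nested loops emulate the thread grid sequentially.

```python
def matmul_element(a, b, row, col, k, n):
    """Work of one hypothetical thread: a single element of C = A * B.

    a is m x k and b is k x n, both stored as flat row-major lists.
    """
    acc = 0.0
    for i in range(k):
        acc += a[row * k + i] * b[i * n + col]
    return acc


def matmul_reference(a, b, m, k, n):
    """Emulates the whole grid by visiting every (row, col) pair in turn."""
    c = [0.0] * (m * n)
    for row in range(m):
        for col in range(n):
            c[row * n + col] = matmul_element(a, b, row, col, k, n)
    return c


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]     # 2 x 3
b = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]  # 3 x 2
print(matmul_reference(a, b, 2, 3, 2))  # [58.0, 64.0, 139.0, 154.0]
```

In a CUDA kernel, `row` and `col` would typically be derived from the block and thread indices, with a bounds check for threads that fall outside the output matrix.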

You need to install the CUDA toolkit on your own machine and set the environment variables.
```
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
```
The workstation in the lab already has CUDA installed, so you can use it directly.
MacBooks (including the MacBook Pro) do not have NVIDIA GPUs, so Mac users need to work on the lab workstation.
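
A quick way to confirm that the environment variables took effect is a small check like the one below. This is only a sketch: `cuda_env_report` is a hypothetical helper, and the default library directory is assumed to be the same `/usr/local/cuda/lib64` used in the exports above.

```python
import os
import shutil


def cuda_env_report(cuda_lib_dir="/usr/local/cuda/lib64"):
    """Reports whether nvcc is on PATH and the CUDA lib dir is on LD_LIBRARY_PATH."""
    entries = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    normalized = [os.path.normpath(entry) for entry in entries if entry]
    return {
        # Full path to nvcc, or None if it cannot be found on PATH.
        "nvcc": shutil.which("nvcc"),
        "lib_dir_on_path": os.path.normpath(cuda_lib_dir) in normalized,
    }


print(cuda_env_report())
```

If `nvcc` is reported as `None`, the PATH export has most likely not been applied in the current shell.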

There are 12 tests in tests/test_gpu_op.py. We will grade your GPU kernel implementations based on those tests.
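
For orientation, a kernel test generally compares the GPU result against a trusted CPU reference within a tolerance. The sketch below shows that pattern in the `test_*` function style that nose discovers. It is not copied from tests/test_gpu_op.py: the ReLU operator, the helper names, and the tolerances are all assumptions chosen for illustration, and the reference stands in for the GPU output so the sketch runs without a GPU.

```python
import math


def relu_reference(values):
    """CPU reference for an element-wise ReLU (hypothetical example operator)."""
    return [max(v, 0.0) for v in values]


def assert_allclose(actual, expected, rel_tol=1e-5, abs_tol=1e-8):
    """Fails if any pair of elements differs by more than the tolerance."""
    assert len(actual) == len(expected), "length mismatch"
    for i, (x, y) in enumerate(zip(actual, expected)):
        assert math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol), (
            "element %d differs: %r vs %r" % (i, x, y))


def test_relu_matches_reference():
    inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
    # In a real test this would be the result copied back from the GPU.
    kernel_output = relu_reference(inputs)
    assert_allclose(kernel_output, [0.0, 0.0, 0.0, 0.5, 2.0])


test_relu_matches_reference()
```

A tolerance is used rather than exact equality because single-precision GPU arithmetic can differ slightly from a CPU reference, especially for reductions.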

Compile

make

Run all tests with

# sudo pip install nose
nosetests -v tests/test_gpu_op.py

If your implementation is correct, you would see

Profile GPU execution with

nvprof nosetests -v tests/test_gpu_op.py