Tiled-Naive-2D-Convolution-1-D-Histogram-CUDA-OpenCL

2D convolution and a 1D histogram calculation were implemented in both CUDA and OpenCL. 2D convolution was implemented two ways: a tiled version that exploits shared memory, and a naive version that uses global memory only. The tiled version was implemented in CUDA only.

For naive 2D convolution, the input is an [M x N] matrix and a [K x K] kernel, where K is odd. The output of the convolution is the same size as the input; zero padding accounts for the halo/ghost cells where the kernel "acts" on non-existent pixels/matrix elements. For the tiled 2D convolution, the kernel size was fixed at [5 x 5] to avoid dealing with dynamic memory allocation. For both methods, a serial reference implementation of 2D convolution used the scipy function scipy.signal.convolve2d.

Execution times for naive CUDA, tiled CUDA, and serial 2D convolution were recorded and plotted for comparison, and execution times for OpenCL 2D convolution were likewise plotted against the serial version. The input matrices for both the CUDA and OpenCL runs were initialized and then iteratively increased in size, growing both dimensions by the same factor each step.

Similarly, for the 1D histogram calculation, execution times were recorded for the CUDA, OpenCL, and serial implementations and then plotted: one plot compares CUDA to serial, and a second compares OpenCL to serial. Sketches of each piece appear below.
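As an illustration of the naive approach, here is a minimal sketch assuming PyCUDA as the Python host framework (the repo's actual host code and names, e.g. conv2d_naive, may differ). One thread computes one output element from global memory, with zero padding at the boundaries, and the result is checked against the scipy.signal.convolve2d baseline:

```python
import numpy as np
from scipy import signal
import pycuda.autoinit               # initializes a CUDA context
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# Hypothetical naive kernel: one thread per output element, global memory only.
mod = SourceModule("""
__global__ void conv2d_naive(const float *in, const float *kern, float *out,
                             int M, int N, int K)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M || col >= N) return;

    int r = K / 2;                    // kernel radius (K is odd)
    float acc = 0.0f;
    for (int i = 0; i < K; ++i) {
        for (int j = 0; j < K; ++j) {
            int y = row + i - r;
            int x = col + j - r;
            // Zero padding: halo/ghost cells outside the matrix contribute 0.
            if (y >= 0 && y < M && x >= 0 && x < N)
                // Flip the kernel so this matches true convolution
                // (as scipy's convolve2d does), not cross-correlation.
                acc += in[y * N + x] * kern[(K - 1 - i) * K + (K - 1 - j)];
        }
    }
    out[row * N + col] = acc;
}
""")
conv2d_naive = mod.get_function("conv2d_naive")

M, N, K = 512, 512, 5
a = np.random.rand(M, N).astype(np.float32)
k = np.random.rand(K, K).astype(np.float32)

d_in, d_k = gpuarray.to_gpu(a), gpuarray.to_gpu(k)
d_out = gpuarray.empty_like(d_in)

block = (16, 16, 1)
grid = ((N + block[0] - 1) // block[0], (M + block[1] - 1) // block[1], 1)
conv2d_naive(d_in, d_k, d_out, np.int32(M), np.int32(N), np.int32(K),
             block=block, grid=grid)

# Serial baseline: same-size output with zero-filled boundaries.
ref = signal.convolve2d(a, k, mode='same', boundary='fill').astype(np.float32)
assert np.allclose(d_out.get(), ref, atol=1e-3)
```

The tiled CUDA variant differs by first staging each input tile plus its halo cells into __shared__ memory; with K fixed at 5, the shared array can be sized at compile time instead of being allocated dynamically.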
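The timing sweep for the serial baseline might look like the following sketch; the starting size, growth factor, and number of steps here are illustrative assumptions, not the repo's actual parameters:

```python
import time
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

sizes, serial_t = [], []
M = N = 64
for _ in range(6):                      # grow both dimensions by the same factor
    a = np.random.rand(M, N).astype(np.float32)
    k = np.random.rand(5, 5).astype(np.float32)
    t0 = time.perf_counter()
    signal.convolve2d(a, k, mode='same', boundary='fill')
    serial_t.append(time.perf_counter() - t0)
    sizes.append(M)
    M *= 2
    N *= 2

plt.plot(sizes, serial_t, marker='o', label='serial (scipy)')
# GPU timings (CUDA naive/tiled or OpenCL) would be collected the same way
# around the kernel launches and appended as additional series.
plt.xlabel('matrix dimension (M = N)')
plt.ylabel('execution time (s)')
plt.legend()
plt.show()
```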
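For the 1D histogram, a comparable sketch (again assuming PyCUDA, with a hypothetical hist1d kernel and 256 byte-valued bins) uses atomicAdd into a global bin array and verifies against NumPy's serial count:

```python
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void hist1d(const unsigned char *data, unsigned int *bins, int n)
{
    // Grid-stride loop: each thread covers multiple input elements.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        atomicAdd(&bins[data[i]], 1u);   // contended but correct
}
""")
hist1d = mod.get_function("hist1d")

n = 1 << 20
data = np.random.randint(0, 256, n).astype(np.uint8)
d_data = gpuarray.to_gpu(data)
d_bins = gpuarray.zeros(256, dtype=np.uint32)

hist1d(d_data, d_bins, np.int32(n), block=(256, 1, 1), grid=(256, 1, 1))

# Serial baseline for the timing comparison.
ref = np.bincount(data, minlength=256).astype(np.uint32)
assert np.array_equal(d_bins.get(), ref)
```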
