amirjalili/CUDA_Tiled_2D_Convolution
Tiled implementation of a 2D matrix convolution that uses per-block shared memory together with constant memory to cut redundant global-memory reads, easing the memory-bandwidth bottleneck and improving performance.
Cuda
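
The description above can be illustrated with a minimal sketch. It is not the repository's code: every name in it (`conv2d_tiled`, `conv2d_gpu`, `conv2d_cpu`, `max_abs_diff`, `d_mask`, `MASK_W`, `TILE`, `BLOCK`) is hypothetical, and the 5x5 mask, 16x16 output tile, zero padding at the image border, and unflipped mask (correlation-style indexing) are assumptions. Each block loads one input tile plus its halo into shared memory, and the mask sits in constant memory.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define MASK_W 5                     // assumed odd filter width
#define RADIUS (MASK_W / 2)
#define TILE 16                      // output tile edge per block (assumed)
#define BLOCK (TILE + MASK_W - 1)    // input tile edge, halo included

// Filter coefficients in constant memory: read-only and cached on chip.
__constant__ float d_mask[MASK_W * MASK_W];

__global__ void conv2d_tiled(const float* in, float* out, int w, int h) {
    __shared__ float tile[BLOCK][BLOCK];
    int tx = threadIdx.x, ty = threadIdx.y;
    int col_o = blockIdx.x * TILE + tx;   // output pixel for this thread
    int row_o = blockIdx.y * TILE + ty;
    int col_i = col_o - RADIUS;           // input pixel this thread loads
    int row_i = row_o - RADIUS;
    // Every thread loads one element; out-of-image halo cells become zero.
    bool inside = row_i >= 0 && row_i < h && col_i >= 0 && col_i < w;
    tile[ty][tx] = inside ? in[row_i * w + col_i] : 0.0f;
    __syncthreads();
    // Only the inner TILE x TILE threads produce an output value.
    if (tx < TILE && ty < TILE && row_o < h && col_o < w) {
        float acc = 0.0f;
        for (int i = 0; i < MASK_W; ++i)
            for (int j = 0; j < MASK_W; ++j)
                acc += d_mask[i * MASK_W + j] * tile[ty + i][tx + j];
        out[row_o * w + col_o] = acc;
    }
}

// Host wrapper; CUDA error checking is omitted to keep the sketch short.
std::vector<float> conv2d_gpu(const std::vector<float>& in,
                              const std::vector<float>& mask, int w, int h) {
    size_t bytes = size_t(w) * h * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_mask, mask.data(), MASK_W * MASK_W * sizeof(float));
    dim3 block(BLOCK, BLOCK);
    dim3 grid((w + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    conv2d_tiled<<<grid, block>>>(d_in, d_out, w, h);
    std::vector<float> out(size_t(w) * h);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}

// Straightforward CPU reference with the same zero-padding convention.
std::vector<float> conv2d_cpu(const std::vector<float>& in,
                              const std::vector<float>& mask, int w, int h) {
    std::vector<float> out(size_t(w) * h, 0.0f);
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c) {
            float acc = 0.0f;
            for (int i = 0; i < MASK_W; ++i)
                for (int j = 0; j < MASK_W; ++j) {
                    int rr = r + i - RADIUS, cc = c + j - RADIUS;
                    if (rr >= 0 && rr < h && cc >= 0 && cc < w)
                        acc += mask[i * MASK_W + j] * in[rr * w + cc];
                }
            out[r * w + c] = acc;
        }
    return out;
}

float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    float m = 0.0f;
    for (size_t k = 0; k < a.size(); ++k) m = std::fmax(m, std::fabs(a[k] - b[k]));
    return m;
}

int main() {
    const int w = 70, h = 45;   // deliberately not multiples of TILE
    std::vector<float> in(size_t(w) * h), mask(MASK_W * MASK_W, 1.0f / 25.0f);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) in[y * w + x] = float((7 * x + 3 * y) % 11);
    float diff = max_abs_diff(conv2d_gpu(in, mask, w, h), conv2d_cpu(in, mask, w, h));
    std::printf("max |gpu - cpu| = %g\n", diff);
    return diff < 1e-4f ? 0 : 1;
}
```

With these assumed sizes each block runs 20x20 threads, of which 16x16 write outputs; the remaining threads only load halo cells. The trade-off is some idle threads during the compute phase in exchange for a single, simple load step per block. Build with `nvcc` and run on a CUDA-capable GPU.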