amirjalili/CUDA_Tiled_2D_Convolution

A tiled CUDA implementation of 2D matrix convolution. Each thread block stages its input tile in shared memory and reads the convolution mask from constant memory, which cuts redundant global memory reads, eases the memory bandwidth bottleneck, and improves performance.
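The description above can be illustrated with a minimal sketch. This is not the repository's code. The names `conv2d_tiled`, `convolve2d_tiled`, and `d_mask`, the 5x5 mask, the 16x16 output tile, and zero padding at the image border are all assumptions chosen for illustration. The mask is also applied without flipping, which is the usual convention in GPU tutorials.

```cuda
#include <cuda_runtime.h>
#include <vector>

// All sizes below are illustrative assumptions, not values from the repository.
#define MASK_WIDTH  5                               // odd mask edge length
#define MASK_RADIUS (MASK_WIDTH / 2)
#define TILE_WIDTH  16                              // output tile edge per block
#define BLOCK_WIDTH (TILE_WIDTH + MASK_WIDTH - 1)   // input tile edge, halo included

// The mask is small and read by every thread, so it lives in constant memory.
__constant__ float d_mask[MASK_WIDTH * MASK_WIDTH];

__global__ void conv2d_tiled(const float* in, float* out, int width, int height) {
    // One input tile (output tile plus halo) per thread block.
    __shared__ float tile[BLOCK_WIDTH][BLOCK_WIDTH];

    int tx = threadIdx.x, ty = threadIdx.y;
    // Output element this thread may compute.
    int col_o = blockIdx.x * TILE_WIDTH + tx;
    int row_o = blockIdx.y * TILE_WIDTH + ty;
    // Input element this thread loads, shifted up and left by the halo.
    int col_i = col_o - MASK_RADIUS;
    int row_i = row_o - MASK_RADIUS;

    // Every thread loads one element; out-of-range elements become zero padding.
    bool inside = row_i >= 0 && row_i < height && col_i >= 0 && col_i < width;
    tile[ty][tx] = inside ? in[row_i * width + col_i] : 0.0f;
    __syncthreads();

    // Only the first TILE_WIDTH x TILE_WIDTH threads produce an output.
    if (tx < TILE_WIDTH && ty < TILE_WIDTH && row_o < height && col_o < width) {
        float acc = 0.0f;
        for (int i = 0; i < MASK_WIDTH; ++i)
            for (int j = 0; j < MASK_WIDTH; ++j)
                acc += d_mask[i * MASK_WIDTH + j] * tile[ty + i][tx + j];
        out[row_o * width + col_o] = acc;
    }
}

// Hypothetical host wrapper: copies data to the device, launches the kernel,
// and returns the result. Error checking is omitted for brevity.
std::vector<float> convolve2d_tiled(const std::vector<float>& in,
                                    const std::vector<float>& mask,
                                    int width, int height) {
    size_t bytes = in.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_mask, mask.data(),
                       MASK_WIDTH * MASK_WIDTH * sizeof(float));

    dim3 block(BLOCK_WIDTH, BLOCK_WIDTH);
    dim3 grid((width + TILE_WIDTH - 1) / TILE_WIDTH,
              (height + TILE_WIDTH - 1) / TILE_WIDTH);
    conv2d_tiled<<<grid, block>>>(d_in, d_out, width, height);

    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

In this sketch each block launches `BLOCK_WIDTH * BLOCK_WIDTH` threads so that every thread loads exactly one input element, and the threads that only cover the halo sit idle during the compute phase. An alternative design launches `TILE_WIDTH * TILE_WIDTH` threads and has some of them load several elements. Either way, each input element is read from global memory once per block rather than once per mask position.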

Cuda