Pinned Repositories
CUDA_Data_Parallel_Reduction
Implementation of a work-efficient parallel reduction algorithm on the GPU, with results accumulated using atomic additions.
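The pattern named in this description can be illustrated with a minimal host-side C++ sketch. This is not the repository's CUDA kernel. Each outer iteration stands in for one thread block. The block copies its slice into a "shared memory" buffer and reduces it with a halving-stride tree, which takes `block_size - 1` additions per block. It then adds its partial sum to a global total with a single atomic operation, the role `atomicAdd` plays on a GPU. The name `reduce_sum` and the `block_size` parameter are hypothetical. Blocks run one after another here, so the `std::atomic` only marks where the GPU atomic would go.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU emulation of a block-wise tree reduction with atomic accumulation.
// Outer loop = one thread block; innermost loop iterations = individual threads.
long long reduce_sum(const std::vector<int>& in, std::size_t block_size) {
    // The stride halves every step, so the block size must be a power of two.
    assert(block_size > 0 && (block_size & (block_size - 1)) == 0);

    std::atomic<long long> total{0};  // stands in for the global result in device memory
    for (std::size_t start = 0; start < in.size(); start += block_size) {
        // "Shared memory" for this block, zero-padded past the end of the input.
        std::vector<long long> shared(block_size, 0);
        for (std::size_t tid = 0; tid < block_size && start + tid < in.size(); ++tid)
            shared[tid] = in[start + tid];

        // Sequential-addressing tree reduction: log2(block_size) steps,
        // block_size - 1 additions in total (work-efficient).
        // On a GPU, __syncthreads() would separate consecutive strides.
        for (std::size_t stride = block_size / 2; stride > 0; stride /= 2)
            for (std::size_t tid = 0; tid < stride; ++tid)
                shared[tid] += shared[tid + stride];

        // One atomic addition per block, mirroring atomicAdd on the GPU.
        total.fetch_add(shared[0]);
    }
    return total.load();
}
```

For example, `reduce_sum` over the values 1 to 10 with `block_size = 4` forms three block partial sums (10, 26, 19) and accumulates them to 55.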
CUDA_Parallel_Scan_prefix_sum
Implementation of a work-efficient parallel prefix-sum algorithm on the GPU.
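A scan is work-efficient when it uses O(n) additions rather than the O(n log n) of a naive Hillis-Steele scan. The up-sweep/down-sweep structure that achieves this can be sketched on the CPU. The sketch assumes a Blelloch-style exclusive scan of a single power-of-two block. The repository may instead use an inclusive (Brent-Kung) variant or handle inputs spanning several blocks. `exclusive_scan` is a hypothetical name, and each innermost loop iteration corresponds to one GPU thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU emulation of a work-efficient (Blelloch) exclusive scan for one block.
// On a GPU, a __syncthreads() would separate consecutive values of d in both phases.
std::vector<int> exclusive_scan(std::vector<int> a) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // single block, power-of-two size

    // Up-sweep (reduce) phase: build a tree of partial sums in place, n - 1 additions.
    for (std::size_t d = 1; d < n; d *= 2)
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d)
            a[i] += a[i - d];

    // Down-sweep phase: clear the root, then push prefix sums back down the tree.
    a[n - 1] = 0;
    for (std::size_t d = n / 2; d >= 1; d /= 2)
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {
            int left = a[i - d];
            a[i - d] = a[i];  // left child receives the parent's prefix
            a[i] += left;     // right child receives prefix + left subtree sum
        }
    return a;
}
```

For the input `{3, 1, 7, 0, 4, 1, 6, 3}` this returns `{0, 3, 4, 11, 11, 15, 16, 22}`. Each element holds the sum of everything before it.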
CUDA_Tiled_2D_Convolution
Tiled implementation of 2D matrix convolution that uses shared memory within GPU thread blocks, together with constant memory, to ease the memory-bandwidth bottleneck and achieve a higher speedup.
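The tiling pattern in this description can be emulated on the CPU under the following assumptions: a 3x3 mask, 4x4 output tiles, and zero-valued "ghost" pixels outside the image. Each block loads its input tile plus a halo of `RADIUS` pixels into a local array that stands in for shared memory. It then computes every output pixel from that array instead of re-reading global memory. On a GPU the mask would typically live in `__constant__` memory. `conv2d_tiled`, `TILE`, and `MASK_WIDTH` are illustrative names, not taken from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes; the repository's actual tile and mask sizes are not stated.
constexpr int MASK_WIDTH = 3;               // odd mask width
constexpr int RADIUS = MASK_WIDTH / 2;      // halo width on each side
constexpr int TILE = 4;                     // output tile edge handled by one "block"
constexpr int IN_TILE = TILE + 2 * RADIUS;  // input tile edge including the halo

// CPU emulation of tiled 2D convolution: `tile` plays the role of __shared__ memory,
// `mask` the role of __constant__ memory. Pixels outside the image count as 0.
std::vector<float> conv2d_tiled(const std::vector<float>& in, int height, int width,
                                const float (&mask)[MASK_WIDTH][MASK_WIDTH]) {
    std::vector<float> out(static_cast<std::size_t>(height) * width, 0.0f);
    for (int by = 0; by < height; by += TILE)
        for (int bx = 0; bx < width; bx += TILE) {
            // Cooperative load: the block copies its input tile plus halo once.
            float tile[IN_TILE][IN_TILE];
            for (int ty = 0; ty < IN_TILE; ++ty)
                for (int tx = 0; tx < IN_TILE; ++tx) {
                    int r = by + ty - RADIUS, c = bx + tx - RADIUS;
                    bool inside = r >= 0 && r < height && c >= 0 && c < width;
                    tile[ty][tx] = inside ? in[r * width + c] : 0.0f;
                }
            // (__syncthreads() here on a GPU.) Each "thread" computes one output
            // pixel, reading only from the tile.
            for (int ty = 0; ty < TILE && by + ty < height; ++ty)
                for (int tx = 0; tx < TILE && bx + tx < width; ++tx) {
                    float sum = 0.0f;
                    for (int i = 0; i < MASK_WIDTH; ++i)
                        for (int j = 0; j < MASK_WIDTH; ++j)
                            sum += mask[i][j] * tile[ty + i][tx + j];
                    out[(by + ty) * width + (bx + tx)] = sum;
                }
        }
    return out;
}
```

With an all-ones 5x5 image and an all-ones mask, interior pixels sum to 9, edge pixels to 6, and corner pixels to 4. The 5x5 size also exercises the partial tiles at the right and bottom borders.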
CUDA_Tiled_Matrix_Multiplication
Tiled matrix multiplication in CUDA that uses the lower-latency, higher-bandwidth shared memory within GPU thread blocks.
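Shared-memory tiling pays off through data reuse, and a CPU emulation can show where that reuse happens. Each block walks the shared dimension in phases. In each phase it loads one tile of `A` and one of `B` into local arrays standing in for `__shared__` memory, then reuses every loaded element `TILE_WIDTH` times. This is a hypothetical sketch, not the repository's kernel. The names `matmul_tiled` and `TILE_WIDTH` are illustrative, and the tile size is kept tiny so a small example still exercises partial edge tiles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int TILE_WIDTH = 2;  // hypothetical; GPU kernels commonly use 16 or 32

// CPU emulation of tiled C = A * B with A (m x k), B (k x n), C (m x n), row-major.
// Each (row0, col0) pair is one "thread block"; As/Bs play the role of __shared__ arrays.
std::vector<float> matmul_tiled(const std::vector<float>& A, const std::vector<float>& B,
                                int m, int k, int n) {
    std::vector<float> C(static_cast<std::size_t>(m) * n, 0.0f);
    for (int row0 = 0; row0 < m; row0 += TILE_WIDTH)
        for (int col0 = 0; col0 < n; col0 += TILE_WIDTH) {
            float acc[TILE_WIDTH][TILE_WIDTH] = {};  // per-thread running sums
            // March over the shared dimension one tile ("phase") at a time.
            for (int p = 0; p < k; p += TILE_WIDTH) {
                float As[TILE_WIDTH][TILE_WIDTH], Bs[TILE_WIDTH][TILE_WIDTH];
                // Load phase: each block reads every element it needs from "global"
                // memory once, padding partial tiles at the edges with zeros.
                for (int ty = 0; ty < TILE_WIDTH; ++ty)
                    for (int tx = 0; tx < TILE_WIDTH; ++tx) {
                        int ar = row0 + ty, ac = p + tx;
                        int br = p + ty, bc = col0 + tx;
                        As[ty][tx] = (ar < m && ac < k) ? A[ar * k + ac] : 0.0f;
                        Bs[ty][tx] = (br < k && bc < n) ? B[br * n + bc] : 0.0f;
                    }
                // (__syncthreads() here on a GPU.) Compute phase: reuse both tiles.
                for (int ty = 0; ty < TILE_WIDTH; ++ty)
                    for (int tx = 0; tx < TILE_WIDTH; ++tx)
                        for (int e = 0; e < TILE_WIDTH; ++e)
                            acc[ty][tx] += As[ty][e] * Bs[e][tx];
            }
            // Write back only the elements that fall inside C.
            for (int ty = 0; ty < TILE_WIDTH && row0 + ty < m; ++ty)
                for (int tx = 0; tx < TILE_WIDTH && col0 + tx < n; ++tx)
                    C[(row0 + ty) * n + (col0 + tx)] = acc[ty][tx];
        }
    return C;
}
```

Multiplying the 3x3 matrix `[1 2 3; 4 5 6; 7 8 9]` by the 3x2 matrix `[1 0; 0 1; 1 1]` gives `[4 5; 10 11; 16 17]`. Both dimensions of the result require a zero-padded partial tile.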
oberan
webflow-cost-calculator-CDN
amirjalili's Repositories
amirjalili/CUDA_Parallel_Scan_prefix_sum
amirjalili/CUDA_Tiled_2D_Convolution
amirjalili/CUDA_Data_Parallel_Reduction
amirjalili/CUDA_Tiled_Matrix_Multiplication
amirjalili/oberan
amirjalili/webflow-cost-calculator-CDN