amirjalili

Ph.D. student in Computer Science

University of Minnesota, Minneapolis

Pinned Repositories

CUDA_Data_Parallel_Reduction
Implementation of a work-efficient parallel reduction algorithm on the GPU, with accumulation using atomic additions.
Language: CUDA
CUDA_Parallel_Scan_prefix_sum
Implementation of a work-efficient parallel prefix-sum algorithm on the GPU.
Language: CUDA
CUDA_Tiled_2D_Convolution
Tiled implementation of 2D matrix convolution that uses per-block shared memory and global constant memory to reduce the memory bandwidth bottleneck and improve performance.
Language: CUDA
CUDA_Tiled_Matrix_Multiplication
Tiled matrix multiplication in CUDA that uses the lower-latency, higher-bandwidth shared memory within GPU thread blocks.
Language: CUDA
oberan
Language: HTML
webflow-cost-calculator-CDN
Language: JavaScript
