CUDA-In-place-Sum-Reduction-No-Divergence

This C++/CUDA C program performs an in-place sum reduction of a floating-point vector whose size is supplied by the user. The host initializes the vector with random values. The parallel version uses multiple thread blocks and shared memory, and my program launches the kernel multiple times to complete the reduction.
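Below is a minimal sketch of how a multi-launch, shared-memory, in-place reduction can be organized. It is not the repository's code. It assumes a block size of 256 (a power of two) and a strided in-place layout in which each pass leaves one partial sum per block in the first slot of that block's segment. The names `reducePass`, `gpuSum`, `kBlockSize` and `CUDA_CHECK` are hypothetical. The inner loop uses sequential addressing (`tid < s`), a common way to keep active threads contiguous and avoid warp divergence. Whether the repository uses this exact scheme is an assumption.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

constexpr int kBlockSize = 256;  // hypothetical; must be a power of two

// One reduction pass. Live elements are data[j * stride] for j in [0, count).
// Block b sums live elements j in [b*kBlockSize, (b+1)*kBlockSize) and writes
// the result to data[(b*kBlockSize) * stride], the first slot of its own
// segment, so no block writes a slot that another block reads.
__global__ void reducePass(float* data, size_t count, size_t stride) {
    __shared__ float sdata[kBlockSize];
    unsigned tid = threadIdx.x;
    size_t j = static_cast<size_t>(blockIdx.x) * kBlockSize + tid;

    // Out-of-range threads contribute 0, so any vector length works.
    sdata[tid] = (j < count) ? data[j * stride] : 0.0f;
    __syncthreads();

    // Sequential addressing: the active threads are always tid < s, so for
    // s >= 32 every warp is either fully active or fully idle.
    for (unsigned s = kBlockSize / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0)
        data[static_cast<size_t>(blockIdx.x) * kBlockSize * stride] = sdata[0];
}

// Copies the vector to the device, then launches reducePass repeatedly until
// one value remains. The device buffer is overwritten in place.
float gpuSum(const std::vector<float>& h_vec) {
    assert(!h_vec.empty());
    size_t n = h_vec.size();
    float* d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_data, h_vec.data(), n * sizeof(float),
                          cudaMemcpyHostToDevice));

    size_t count = n, stride = 1;
    while (count > 1) {
        unsigned blocks =
            static_cast<unsigned>((count + kBlockSize - 1) / kBlockSize);
        reducePass<<<blocks, kBlockSize>>>(d_data, count, stride);
        CUDA_CHECK(cudaGetLastError());
        count = blocks;        // one partial sum per block survives
        stride *= kBlockSize;  // survivors are kBlockSize slots farther apart
    }

    // cudaMemcpy on the default stream waits for the last kernel to finish.
    float result = 0.0f;
    CUDA_CHECK(cudaMemcpy(&result, d_data, sizeof(float),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));
    return result;
}
```

Writing each block's partial sum into its own segment, rather than into a compact `data[blockIdx.x]`, is what makes the in-place update race-free: a compact write could overwrite an element that a lower-numbered block has not loaded yet. Launches on the default stream run in order, so each pass sees the previous pass's results without extra synchronization.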

Primary language: CUDA
