GPGPU Example with Apple's Metal API
Table view listing the available kernels for comparing CPU and GPU performance.
Performs CPU and GPU computations. Shows execution times.
Kernels executed on the GPU with the Metal API.
Simple map that applies the cosine function to each element of the input array.
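A minimal sketch of what the map does, written as a C++ emulation on the CPU rather than as Metal Shading Language. The function name `mapCosine` is hypothetical, and the loop index `gid` stands in for each GPU thread's global position in the grid. Because every element is computed independently, the GPU can assign one thread per element.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU emulation of the map kernel: each "thread" (one loop iteration,
// identified by its global index gid) reads one element and writes its cosine.
// Iterations are independent, which is what lets the GPU run them in parallel.
void mapCosine(const std::vector<float>& input, std::vector<float>& output) {
    assert(output.size() == input.size());  // output buffer is preallocated
    for (std::size_t gid = 0; gid < input.size(); ++gid) {
        output[gid] = std::cos(input[gid]);
    }
}
```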
Naive parallel reduction (computes the sum of the cosines of all input array elements).
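A minimal sketch, assuming the kernel follows the naive scheme from the Harris slides (interleaved addressing with a divergent `tid % (2 * s)` test). It is a C++ CPU emulation, not Metal code: the hypothetical `reduceNaive` models one threadgroup's threadgroup memory with a `std::vector`, each pass of the `s` loop stands for one barrier-separated step, and it returns one partial sum per threadgroup for the host to add up.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU emulation of the naive reduction (interleaved addressing, divergent branch).
// Returns one partial sum of cos(x) per threadgroup of groupSize threads.
std::vector<float> reduceNaive(const std::vector<float>& input, std::size_t groupSize) {
    assert(groupSize > 0 && (groupSize & (groupSize - 1)) == 0);  // power of 2
    assert(input.size() % groupSize == 0);
    std::vector<float> partial(input.size() / groupSize);
    std::vector<float> shared(groupSize);  // stands in for threadgroup memory
    for (std::size_t group = 0; group < partial.size(); ++group) {
        // Load: thread tid stores the cosine of one input element.
        for (std::size_t tid = 0; tid < groupSize; ++tid)
            shared[tid] = std::cos(input[group * groupSize + tid]);
        // Each pass is one barrier-separated step. Running the threads in a loop
        // is safe because no thread reads a slot written during the same step.
        for (std::size_t s = 1; s < groupSize; s *= 2)
            for (std::size_t tid = 0; tid < groupSize; ++tid)
                if (tid % (2 * s) == 0)  // only every (2*s)-th thread works
                    shared[tid] += shared[tid + s];
        partial[group] = shared[0];
    }
    return partial;
}
```

On a GPU, the active threads in this version are scattered among idle ones in the same SIMD group, which is the inefficiency the next versions address.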
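Changes which threads perform the reduction: in each step, the active threads have consecutive indices instead of being spread out, which reduces branch divergence.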
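A sketch of this change, assuming it matches the "strided index" step of the Harris slides. As before, this is a C++ CPU emulation with a hypothetical function name (`reduceStridedIndex`). Only the inner step differs from the naive version: thread `tid` now handles slot `2 * s * tid`, so the threads doing work are always the lowest-numbered ones.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU emulation of the reduction with a strided index and a non-divergent
// branch. Returns one partial sum of cos(x) per threadgroup.
std::vector<float> reduceStridedIndex(const std::vector<float>& input, std::size_t groupSize) {
    assert(groupSize > 0 && (groupSize & (groupSize - 1)) == 0);  // power of 2
    assert(input.size() % groupSize == 0);
    std::vector<float> partial(input.size() / groupSize);
    std::vector<float> shared(groupSize);  // stands in for threadgroup memory
    for (std::size_t group = 0; group < partial.size(); ++group) {
        for (std::size_t tid = 0; tid < groupSize; ++tid)
            shared[tid] = std::cos(input[group * groupSize + tid]);
        for (std::size_t s = 1; s < groupSize; s *= 2)
            for (std::size_t tid = 0; tid < groupSize; ++tid) {
                // Active threads are tid = 0 .. groupSize/(2*s) - 1, a contiguous
                // range, but the slots they touch are still 2*s apart.
                const std::size_t index = 2 * s * tid;
                if (index < groupSize)
                    shared[index] += shared[index + s];
            }
        partial[group] = shared[0];
    }
    return partial;
}
```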
Sequential addressing: in each step, the active threads access a contiguous range of memory.
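A sketch of sequential addressing, assuming the kernel follows the corresponding step in the Harris slides. This C++ CPU emulation uses the hypothetical name `reduceSequential`. The stride `s` now starts at half the threadgroup size and halves each step; thread `tid < s` adds the element `s` slots above it, so consecutive threads read consecutive addresses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU emulation of the reduction with sequential addressing.
// Returns one partial sum of cos(x) per threadgroup.
std::vector<float> reduceSequential(const std::vector<float>& input, std::size_t groupSize) {
    assert(groupSize > 0 && (groupSize & (groupSize - 1)) == 0);  // power of 2
    assert(input.size() % groupSize == 0);
    std::vector<float> partial(input.size() / groupSize);
    std::vector<float> shared(groupSize);  // stands in for threadgroup memory
    for (std::size_t group = 0; group < partial.size(); ++group) {
        for (std::size_t tid = 0; tid < groupSize; ++tid)
            shared[tid] = std::cos(input[group * groupSize + tid]);
        // Threads 0..s-1 read the contiguous upper half [s, 2s) and write the
        // lower half [0, s), so reads and writes never overlap within a step.
        for (std::size_t s = groupSize / 2; s > 0; s /= 2)
            for (std::size_t tid = 0; tid < groupSize; ++tid)
                if (tid < s)
                    shared[tid] += shared[tid + s];
        partial[group] = shared[0];
    }
    return partial;
}
```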
The same as reduce3, but the first reduction step is performed while copying data to threadgroup (shared) memory, so only half as many threads are needed as in the previous reduce versions.
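A sketch of the "first add during load" idea as a C++ CPU emulation. The name `reduceFirstAddOnLoad` is hypothetical, and the indexing (each threadgroup covers `2 * groupSize` consecutive elements) is assumed from the Harris slides. Each thread loads two elements and stores their sum, then the sequential-addressing loop runs as before.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU emulation of the reduction that adds two elements while loading.
// Each threadgroup of groupSize threads covers 2*groupSize input elements.
std::vector<float> reduceFirstAddOnLoad(const std::vector<float>& input, std::size_t groupSize) {
    assert(groupSize > 0 && (groupSize & (groupSize - 1)) == 0);  // power of 2
    assert(input.size() % (2 * groupSize) == 0);
    std::vector<float> partial(input.size() / (2 * groupSize));
    std::vector<float> shared(groupSize);  // stands in for threadgroup memory
    for (std::size_t group = 0; group < partial.size(); ++group) {
        const std::size_t base = group * 2 * groupSize;
        // Load: the first reduction step happens during the copy.
        for (std::size_t tid = 0; tid < groupSize; ++tid)
            shared[tid] = std::cos(input[base + tid]) + std::cos(input[base + tid + groupSize]);
        // Remaining steps: sequential addressing, as in the previous version.
        for (std::size_t s = groupSize / 2; s > 0; s /= 2)
            for (std::size_t tid = 0; tid < groupSize; ++tid)
                if (tid < s)
                    shared[tid] += shared[tid + s];
        partial[group] = shared[0];
    }
    return partial;
}
```

For the same input size, the number of partial sums is half that of the previous versions at the same threadgroup size, which is where the halved thread count comes from.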
Source of the graphics illustrating the reduction optimization steps: Optimizing Parallel Reduction in CUDA by Mark Harris, https://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf
This example only works for input arrays whose size is a positive integer power of 2. As a simple exercise, you can try to make it more flexible.
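One possible approach to the exercise, sketched as a C++ CPU emulation and not necessarily the author's intended solution: round the number of threadgroups up, and let threads whose global index falls past the end of the array load `0.0f`, the identity of addition. Padding the *input* with zeros would not work here, because the kernel sums cosines and `cos(0) = 1`. The name `reduceAnySize` is hypothetical, and the threadgroup size itself must still be a power of 2 for the halving loop.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU emulation of a reduction that accepts any input length.
// Returns one partial sum of cos(x) per threadgroup.
std::vector<float> reduceAnySize(const std::vector<float>& input, std::size_t groupSize) {
    assert(groupSize > 0 && (groupSize & (groupSize - 1)) == 0);  // still required for halving
    const std::size_t n = input.size();
    const std::size_t groups = (n + groupSize - 1) / groupSize;  // round up
    std::vector<float> partial(groups);
    std::vector<float> shared(groupSize);  // stands in for threadgroup memory
    for (std::size_t group = 0; group < groups; ++group) {
        for (std::size_t tid = 0; tid < groupSize; ++tid) {
            const std::size_t gid = group * groupSize + tid;
            // Out-of-range threads contribute 0 instead of reading past the end.
            shared[tid] = gid < n ? std::cos(input[gid]) : 0.0f;
        }
        for (std::size_t s = groupSize / 2; s > 0; s /= 2)
            for (std::size_t tid = 0; tid < groupSize; ++tid)
                if (tid < s)
                    shared[tid] += shared[tid + s];
        partial[group] = shared[0];
    }
    return partial;
}
```

For example, 5 zeros with a threadgroup size of 4 give two partial sums, 4 and 1, whose total is 5 as expected.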