A CUDA implementation of the fundamental sum-reduce operation (adding all elements of an array into a single value). It aims to be as optimized as is reasonable.
Primary language: CUDA
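
For orientation, here is a minimal sketch of one common way to write a sum reduction in CUDA. It is not taken from this repository and may differ from the actual implementation. The names `sum_reduce_kernel`, `sum_reduce`, `check`, and `kBlockSize`, the block size of 256, and the cap of 1024 blocks are all assumptions made for illustration. The sketch combines a grid-stride loop, warp shuffles, and one `atomicAdd` per block.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// All names and constants here are hypothetical, chosen for illustration.
constexpr int kBlockSize = 256;  // threads per block; must be a multiple of 32
static_assert(kBlockSize % 32 == 0 && kBlockSize <= 1024, "invalid block size");

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Must be launched with exactly kBlockSize threads per block.
// *out must be zero before launch; each block adds its partial sum to it.
__global__ void sum_reduce_kernel(const float* in, float* out, size_t n) {
    __shared__ float warp_sums[kBlockSize / 32];

    // Grid-stride loop: each thread accumulates a private partial sum.
    float acc = 0.0f;
    size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;
    for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
         i < n; i += stride) {
        acc += in[i];
    }

    // Reduce within each warp using shuffles; lane 0 ends up with the warp sum.
    for (int offset = 16; offset > 0; offset >>= 1) {
        acc += __shfl_down_sync(0xffffffffu, acc, offset);
    }

    int lane = threadIdx.x % 32;
    int warp = threadIdx.x / 32;
    if (lane == 0) warp_sums[warp] = acc;
    __syncthreads();

    // The first warp reduces the per-warp sums, then adds the block total.
    if (warp == 0) {
        acc = (lane < kBlockSize / 32) ? warp_sums[lane] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1) {
            acc += __shfl_down_sync(0xffffffffu, acc, offset);
        }
        if (lane == 0) atomicAdd(out, acc);
    }
}

// Host wrapper: copies the input to the device, launches the kernel,
// and returns the sum.
float sum_reduce(const std::vector<float>& host) {
    if (host.empty()) return 0.0f;
    size_t n = host.size();
    float* d_in = nullptr;
    float* d_out = nullptr;
    check(cudaMalloc(&d_in, n * sizeof(float)), "cudaMalloc d_in");
    check(cudaMalloc(&d_out, sizeof(float)), "cudaMalloc d_out");
    check(cudaMemcpy(d_in, host.data(), n * sizeof(float),
                     cudaMemcpyHostToDevice), "copy input");
    check(cudaMemset(d_out, 0, sizeof(float)), "zero output");

    // Cap the grid size; the grid-stride loop covers any remaining elements.
    int blocks = static_cast<int>(
        std::min<size_t>((n + kBlockSize - 1) / kBlockSize, 1024));
    sum_reduce_kernel<<<blocks, kBlockSize>>>(d_in, d_out, n);
    check(cudaGetLastError(), "kernel launch");

    float result = 0.0f;
    check(cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost),
          "copy result");
    check(cudaFree(d_in), "cudaFree d_in");
    check(cudaFree(d_out), "cudaFree d_out");
    return result;
}
```

Calling `sum_reduce(std::vector<float>(1000000, 1.0f))` should return exactly `1000000.0f`, since every partial sum is an integer below 2^24 and is therefore exactly representable in single precision. For general inputs, the order of the floating-point additions (and so the rounding) can vary between runs because the order of the `atomicAdd` calls is not fixed. A more heavily tuned version might also use vectorized loads or a second kernel pass instead of atomics.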