This repo contains experiments and analysis for implementing various kernels for the carbonara library from the Gnocchi project.
The repo is organized into various branches.
This branch has versions v1 and v2.
This branch has version v3 and also the final implementation.
This was an intermediate improvement over v3, but the speed-up was not great.
This implements the kernel using streams, a more advanced approach.
This is another approach, where work is launched from different threads, but the overhead is too high.
For a more detailed guide, have a look at https://sjamgade.github.io/carbs