dirac-institute/kbmod

Templated classes for GPU and CPU implementation


The implementation proposed in #337 requires us to decide which of the two implementations we want to use, the GPU or the CPU one. This leads to branching statements everywhere in Python and `#ifdef` preprocessor directives in the implementation.

I don't like it, and I think it will make the code hard to move forward with. We would have to wrap every C++ class that uses these methods so that the appropriate decision is made internally in Python on the user's behalf. This also leads to two sets of identical unit tests, in which the obvious if-else statement is just hidden behind a `@unittest.skipIf` decorator. Such wrappers don't really provide transparency to the user, and they force anyone who just wants to use the C++ code directly to `#ifdef` everything they do. I'm also not sure we can simultaneously test both code paths using `#ifdef`s, a feature that was noted to be desirable.

Here's an example proposal. C++20 has a lot more of these convenient generic facilities that would help us, specifically `requires` clauses, but so that we stay in C++17 land, here's a proposal we could adopt:

#include <iostream>
#include <type_traits>  // std::enable_if

constexpr bool HAS_GPU = true;


template<typename T, bool on_gpu = HAS_GPU>
class Image {

public:
  void convolve(T a, T b) { convolve_impl(a, b); }

private:
  // Selected when on_gpu is true: enable_if<enabled, T>::type is T.
  template<bool enabled = on_gpu>
  void convolve_impl(typename std::enable_if<enabled, T>::type a, T b)
  {
    std::cout << "GPU enabled! Result: " << a*b << std::endl;
  }

  // Selected when on_gpu is false; substitution fails for the overload above.
  template<bool enabled = on_gpu>
  void convolve_impl(typename std::enable_if<!enabled, T>::type a, T b)
  {
    std::cout << "GPU disabled! Result: " << a*b << std::endl;
  }
};

int main()
{
  // explicit true/false here is just for showcase; real code would use HAS_GPU
  Image<double, true> a1;
  Image<int, false> a2;

  a1.convolve(5.1, 1.0);
  a2.convolve(5.1, 1.0);

  return 0;
}

The way it works is that we define two implementations internally, while the public interface stays the same. A constexpr constant (machine has or doesn't have a GPU) is evaluated at compile time, and we let the compiler select and compile only the implementation that matches, depending on the presence or absence of a GPU.

The way the matching works is that `enable_if` yields the type `T` when `enabled` is true (`enabled` is an alias for `on_gpu`, because shadowing the class template parameter is not allowed) and no type at all otherwise. A function signature cannot be formed from a non-existent type, so that overload drops out of consideration (this is SFINAE) and is never compiled when `enabled` isn't true, resulting in the same code as with `#ifdef`s and wrappers, but with no ifs or branching. Note that the two private `convolve_impl` overloads use `enabled` and `!enabled` for exactly that reason.

In the example I added multiple parameters to the function, but the `enable_if` nastiness is only required on one of them for the two function signatures to be distinguished. The only new cost is that every usage of `Image` has to carry `HAS_GPU` as a template argument.

Output of the example is:

GPU enabled! Result: 5.1
GPU disabled! Result: 5

showing that two different codepaths were taken.

Note that the example shows we can still have both codepaths compiled on the same machine at the same time. Using Pybind, and referencing this as a possible approach to the bindings, we would just have to have something like:

// The default path builds one "Image" class with the appropriate implementation,
// while the BUILD_BOTH path binds the GPU implementation to the "Image" class and
// the CPU one to the "CpuImage" class via Pybind. The build will never reach this
// point if users erroneously set BUILD_BOTH, because check_language(CUDA) will
// fail (somewhat) gracefully at the start of CMake.
#ifndef BUILD_BOTH 
model::image_bindings_factory<double>(m, "", HAS_GPU);
#else 
model::image_bindings_factory<double>(m, "", HAS_GPU);
model::image_bindings_factory<double>(m, "Cpu", !HAS_GPU);
#endif

Optionally, I guess, we could set HAS_GPU to false, build an Image with the CPU bindings, and rerun the tests. That is a single pip install command.

Concerns about future maintainability were expressed, so this was left as an open issue.