std::complex support
PhilipVinc opened this issue · 9 comments
Hello,
Judging from the source code of VexCL there is no support for complex numbers. I've got some Stocastic Differential Equation (Monte Carlo) solvers implemented in C++ that I would like to see how it performs on OpenCL.
How hard would it be to add support for complex and eventually complex to VexCL? I could work on it in my spare time and would eventually be available to contribute a PR, but I would need some guidance while delving inside VexCL, if you would be available to give some implementation hints.
I'll write about the OpenCL backend. Same should be applicable to the CUDA and OpenMP (JIT) backends. In fact, with both of the latter you could probably just #include <complex>
in the kernel header and work with std::complex<T>
directly.
OpenCL does not have native support for complex numbers, but you could use cl_double2
to model it. cl_double2
is binary compatible with std::complex<double>
and so transfers between the host and the device memories are straightforward. The main difference is that multiplication and division operations are element-wise for OpenCL vector types like cl_double2
. So you would have to introduce custom functions to do those. Here is a simple example:
#include <iostream>
#include <complex>
#include <vector>
#include <vexcl/vexcl.hpp>
//---------------------------------------------------------------------------
// Complex multiplication:
//---------------------------------------------------------------------------
VEX_FUNCTION(cl_double2, cmul, (cl_double2, a)(cl_double2, b),
double2 r = {a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x};
return r;
);
//---------------------------------------------------------------------------
// Complex division:
//---------------------------------------------------------------------------
VEX_FUNCTION(cl_double2, cdiv, (cl_double2, a)(cl_double2, b),
double d = b.x * b.x + b.y * b.y;
double2 r = {(a.x * b.x + a.y * b.y) / d, (a.y * b.x - a.x * b.y) / d};
return r;
);
int main() {
vex::Context ctx(vex::Filter::Env);
std::cout << ctx << std::endl;
const int n = 16;
// Host side input vectors:
std::vector<std::complex<double>> x(n), y(n);
for(int i = 0; i < n; ++i) {
x[i] = {i, n-i};
y[i] = {n-i, i};
}
// Transfer host-side complex numbers into device-side cl_double2 vectors:
vex::vector<cl_double2> X(ctx, n, reinterpret_cast<cl_double2*>(x.data()));
vex::vector<cl_double2> Y(ctx, n, reinterpret_cast<cl_double2*>(y.data()));
// Do device-side multiplication and division:
vex::vector<cl_double2> T(ctx, n);
T = cmul(X, Y);
std::cout << "X * Y = " << T << std::endl;
T = cdiv(X, Y);
std::cout << "X / Y = " << T << std::endl;
}
You could also introduce other helper functions, like get_real(c)
, get_imag(c)
, or make_complex(x,y)
. Would this work for your needs?
Thanks a lot.
This indeed gives me an idea, but I need support for complex sparsematrix-vector multiplication as well as complex FFT.
From the documentation and the source code in vexcl/fft/kernels.hpp
I see that you support complex FFT, but how would I go about supporting sparsematrix-vector multiplication?
There is an example of complex spmv implementation in amgcl/backend/vexcl_complex.hpp. This relies on AMD OpenCL 1.2, which supports C++ in kernels, and so is not portable. There is also a portable, pure C implementation of block-valued spmv at amgcl/backend/vexcl_static_matrix.hpp. This specializes vex::sparse::ell<> class template for small static matrices as value type. You could probably use either of the examples as a starting point for complex spmv.
The following should be enough for complex SpMV in vexcl:
#include <iostream>
#include <vector>
#include <complex>
#include <vexcl/vexcl.hpp>
// The following enables use of std::complex<T> in vexcl expressions.
// It simply translates std::complex<T> on the host side into opencl vector
// types (float2/double2) on the device side.
//
// However, semantics of operations like multiplication, division, or
// math functions will be wrong. This is the main reason I hesitate doing
// this in the core library.
namespace vex {
template <typename T>
struct is_cl_native< std::complex<T> > : std::true_type {};
template <typename T>
struct type_name_impl< std::complex<T> >
{
static std::string get() {
std::ostringstream s;
s << type_name<T>() << "2";
return s.str();
}
};
template <typename T>
struct cl_scalar_of< std::complex<T> > {
typedef T type;
};
} // namespace vex
// Now we specialize a template from <vexcl/sparse/spmv_ops.hpp> that allows
// vexcl to generate semantically correct smpv kernels for complex values:
namespace vex {
namespace sparse {
template <typename T>
struct spmv_ops_impl<std::complex<T>, std::complex<T>>
{
static void decl_accum_var(backend::source_generator &src, const std::string &name)
{
src.new_line() << type_name<T>() << "2 " << name << " = {0,0};";
}
static void append(backend::source_generator &src,
const std::string &sum, const std::string &val)
{
src.new_line() << sum << " += " << val << ";";
}
static void append_product(backend::source_generator &src,
const std::string &sum, const std::string &mat_val, const std::string &vec_val)
{
src.new_line() << sum << ".x += "
<< mat_val << ".x * " << vec_val << ".x - "
<< mat_val << ".y * " << vec_val << ".y;";
src.new_line() << sum << ".y += "
<< mat_val << ".x * " << vec_val << ".y + "
<< mat_val << ".y * " << vec_val << ".x;";
}
};
} // namespace sparse
} // namespace vex
int main() {
vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));
std::cout << ctx << std::endl;
// 4x4 diagonal matrix in CSR format:
std::vector<int> ptr = {0,1,2,3,4};
std::vector<int> col = {0,1,2,3};
std::vector<std::complex<double>> val = {
{1.0, 1.0}, {2.0, 2.0}, {3.0, 3.0}, {4.0, 4.0}};
// complex vector:
std::vector<std::complex<double>> x = {
{1.0, 1.0}, {1.0, 1.0}, {1.0, 1.0}, {1.0, 1.0}};
// Device-side matrix and vectors:
vex::sparse::matrix<std::complex<double>> A(ctx, 4, 4, ptr, col, val);
vex::vector<std::complex<double>> X(ctx, x);
vex::vector<std::complex<double>> Y(ctx, 4);
Y = A * X;
std::cout << Y << std::endl;
}
Boost compute supports complex, by the way.
Boost compute supports complex, by the way.
Well, the support is to the same extent vexcl allows to use double2
as complex values. std::complex<double>
is converted to double2
on device, but multiplication and division operations are wrong. For example, the following snippet gives 0:1
while the correct answer is -1:0
:
#include <iostream>
#include <complex>
#include <boost/compute.hpp>
namespace compute = boost::compute;
int main() {
compute::command_queue bcq = compute::system::default_queue();
using compute::lambda::_1;
const int n = 16;
std::complex<double> i{0,1};
compute::vector<std::complex<double>> x(n, bcq.get_context());
compute::fill(x.begin(), x.end(), i);
compute::transform(x.begin(), x.end(), x.begin(), _1 * _1);
std::complex<double> v = x[0];
std::cout << real(v) << ":" << imag(v) << std::endl;
}
Okay, that is not intuitive or ideal, I agree. How about just adding the example you had to the examples? VexCL is a bit short on examples.
How about just adding the example you had to the examples?
I'll try to find time to do this soon. Or, if you are up to it, a PR would be welcome :).