No Instance of Constructor for cusparse_ilu0 under Single Precision (float32)

Question

AXIHIXA opened this issue a year ago · 5 comments

Hi ddemidov,
Thanks for maintaining this wonderful library; I hope you are doing well. Your solver.cu example for the CUDA backend works perfectly in double precision, but I get compilation errors when I switch to single precision. (I'm using CUDA toolkit version 11.8.) A minimal example looks like this:

#include <amgcl/amg.hpp>
#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/backend/cuda.hpp>
#include <amgcl/coarsening/runtime.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/preconditioner/runtime.hpp>
#include <amgcl/relaxation/cusparse_ilu0.hpp>
#include <amgcl/relaxation/runtime.hpp>
#include <amgcl/solver/cg.hpp>
#include <amgcl/solver/runtime.hpp>

#include <cuda_runtime.h>

#include <tuple>   // std::tie
#include <vector>  // std::vector

namespace amgcu 
{

// SINGLE-PRECISION. 
// NO problem if using double here. 
using Float = float;
using Backend = amgcl::backend::cuda<Float>;  

using Solver = amgcl::make_solver<
        amgcl::amg<
                Backend,
                amgcl::coarsening::smoothed_aggregation,
                amgcl::relaxation::ilu0
        >,
        amgcl::solver::cg<Backend>
>;

//// This one (exactly the same as the solver.cu example) does not work either 
//using Solver = amgcl::make_solver<
//        amgcl::runtime::preconditioner<Backend>,
//        amgcl::runtime::solver::wrapper<Backend>
//>;

void solve()
{
    std::vector<int> col, ptr;
    std::vector<Float> val, rhs, x;
    int n2 = sampleProblem(...);
    Backend::params bprm {};
    cusparseCreate(&bprm.cusparse_handle);
    Solver solve(std::tie(n2, ptr, col, val), {}, bprm);  // Compilation error at this line
}

}  // namespace amgcu

And attached is the error message:

/usr/include/c++/11/ext/new_allocator.h(162): error: no instance of constructor "amgcl::relaxation::ilu0<amgcl::backend::cuda<real, amgcl::solver::cuda_skyline_lu<real>>>::ilu0 [with real=amgcu::Float]" matches the argument list
            argument types are: (amgcl::backend::crs<amgcu::Float, ptrdiff_t, ptrdiff_t>, amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>::params, const amgcl::backend::cuda<amgcu::Float, amgcl::solver::cuda_skyline_lu<amgcu::Float>>::params)
          detected during:
            instantiation of "void __gnu_cxx::new_allocator<_Tp>::construct(_Up *, _Args &&...) [with _Tp=amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>, _Up=amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>, _Args=<amgcl::backend::crs<amgcu::Float, ptrdiff_t, ptrdiff_t> &, amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>::params &, const amgcl::backend::cuda<amgcu::Float, amgcl::solver::cuda_skyline_lu<amgcu::Float>>::params &>]" 
/usr/include/c++/11/bits/alloc_traits.h(516): here
            instantiation of "void std::allocator_traits<std::allocator<_Tp>>::construct(std::allocator_traits<std::allocator<_Tp>>::allocator_type &, _Up *, _Args &&...) [with _Tp=amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>, _Up=amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>, _Args=<amgcl::backend::crs<amgcu::Float, ptrdiff_t, ptrdiff_t> &, amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>::params &, const amgcl::backend::cuda<amgcu::Float, amgcl::solver::cuda_skyline_lu<amgcu::Float>>::params &>]" 
/usr/include/c++/11/bits/shared_ptr_base.h(520): here
            instantiation of "std::_Sp_counted_ptr_inplace<_Tp, _Alloc, _Lp>::_Sp_counted_ptr_inplace(_Alloc, _Args &&...) [with _Tp=amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>, _Alloc=std::allocator<amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>>, _Lp=__gnu_cxx::_S_atomic, _Args=<amgcl::backend::crs<amgcu::Float, ptrdiff_t, ptrdiff_t> &, amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>::params &, const amgcl::backend::cuda<amgcu::Float, amgcl::solver::cuda_skyline_lu<amgcu::Float>>::params &>]" 
/usr/include/c++/11/bits/shared_ptr_base.h(651): here
            instantiation of "std::__shared_count<_Lp>::__shared_count(_Tp *&, std::_Sp_alloc_shared_tag<_Alloc>, _Args &&...) [with _Lp=__gnu_cxx::_S_atomic, _Tp=amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>, _Alloc=std::allocator<amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>>, _Args=<amgcl::backend::crs<amgcu::Float, ptrdiff_t, ptrdiff_t> &, amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>::params &, const amgcl::backend::cuda<amgcu::Float, amgcl::solver::cuda_skyline_lu<amgcu::Float>>::params &>]" 
/usr/include/c++/11/bits/shared_ptr_base.h(1343): here
            instantiation of "std::__shared_ptr<_Tp, _Lp>::__shared_ptr(std::_Sp_alloc_shared_tag<_Alloc>, _Args &&...) [with _Tp=amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>, _Lp=__gnu_cxx::_S_atomic, _Alloc=std::allocator<amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>>, _Args=<amgcl::backend::crs<amgcu::Float, ptrdiff_t, ptrdiff_t> &, amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>::params &, const amgcl::backend::cuda<amgcu::Float, amgcl::solver::cuda_skyline_lu<amgcu::Float>>::params &>]" 
/usr/include/c++/11/bits/shared_ptr.h(410): here
            [ 2 instantiation contexts not shown ]
            instantiation of "std::shared_ptr<_Tp> std::make_shared<_Tp,_Args...>(_Args &&...) [with _Tp=amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>, _Args=<amgcl::backend::crs<amgcu::Float, ptrdiff_t, ptrdiff_t> &, amgcl::relaxation::ilu0<amgcu::<unnamed>::Backend>::params &, const amgcl::backend::cuda<amgcu::Float, amgcl::solver::cuda_skyline_lu<amgcu::Float>>::params &>]" 
/home/user/lib/amgcl/amgcl/amg.hpp(363): here
            instantiation of "amgcl::amg<Backend, Coarsening, Relax>::level::level(std::shared_ptr<amgcl::amg<Backend, Coarsening, Relax>::build_matrix>, amgcl::amg<Backend, Coarsening, Relax>::params &, const amgcl::amg<Backend, Coarsening, Relax>::backend_params &) [with Backend=amgcu::<unnamed>::Backend, Coarsening=amgcl::coarsening::smoothed_aggregation, Relax=amgcl::relaxation::ilu0]" 
/home/user/lib/amgcl/amgcl/amg.hpp(482): here
            instantiation of "void amgcl::amg<Backend, Coarsening, Relax>::do_init(std::shared_ptr<amgcl::amg<Backend, Coarsening, Relax>::build_matrix>, const amgcl::amg<Backend, Coarsening, Relax>::backend_params &) [with Backend=amgcu::<unnamed>::Backend, Coarsening=amgcl::coarsening::smoothed_aggregation, Relax=amgcl::relaxation::ilu0]" 
/home/user/lib/amgcl/amgcl/amg.hpp(204): here
            instantiation of "amgcl::amg<Backend, Coarsening, Relax>::amg(const Matrix &, const amgcl::amg<Backend, Coarsening, Relax>::params &, const amgcl::amg<Backend, Coarsening, Relax>::backend_params &) [with Backend=amgcu::<unnamed>::Backend, Coarsening=amgcl::coarsening::smoothed_aggregation, Relax=amgcl::relaxation::ilu0, Matrix=std::tuple<int &, std::vector<int, std::allocator<int>> &, std::vector<int, std::allocator<int>> &, std::vector<amgcu::Float, std::allocator<amgcu::Float>> &>]" 
/home/user/lib/amgcl/amgcl/make_solver.hpp(100): here
            instantiation of "amgcl::make_solver<Precond, IterativeSolver>::make_solver(const Matrix &, const amgcl::make_solver<Precond, IterativeSolver>::params &, const amgcl::make_solver<Precond, IterativeSolver>::backend_params &) [with Precond=amgcl::amg<amgcu::<unnamed>::Backend, amgcl::coarsening::smoothed_aggregation, amgcl::relaxation::ilu0>, IterativeSolver=amgcl::solver::cg<amgcu::<unnamed>::Backend, amgcl::solver::detail::default_inner_product>, Matrix=std::tuple<int &, std::vector<int, std::allocator<int>> &, std::vector<int, std::allocator<int>> &, std::vector<amgcu::Float, std::allocator<amgcu::Float>> &>]" 
/home/user/workspace/BenchmarkAMGCL/src/solver.cu(92): here

P.S. What configuration would you recommend for benchmarking AMGCL on the GPU in float32 precision? I suppose we must stick to the cusparse_ilu0 relaxation to fully utilize CUDA (i.e., switching to other relaxations works (tested), but with sub-optimal performance).

Cheers.

Answer 1 · 2023-09-02T16:27:21.000Z

I can reproduce this with a simpler example:

#include <amgcl/backend/cuda.hpp>
#include <amgcl/relaxation/ilu0.hpp>
#include <amgcl/relaxation/cusparse_ilu0.hpp>

int main() {
	#if 1
	typedef double real; // works
	#else
	typedef float real; // does not work
	#endif

	typedef amgcl::backend::cuda<real> Backend;
	typedef amgcl::relaxation::ilu0<Backend> Relax;
	Backend::params bprm;
	Relax::params prm;
	amgcl::backend::crs<real> A;
	amgcl::relaxation::ilu0<Backend> relax(A, prm, bprm);
}

So far I am not sure what the reason is; the compiler messages are unusually terse.

Answer 2 · 2023-09-02T16:34:06.000Z

Can you try with 8083b23? This is the commit right before 5836b77, which I suspect could be the reason.

Are you sure that ilu0 gives you the best possible performance for your problem? Often simpler relaxations like spai0 perform better simply because they are easier to parallelize. Also, you could try the approximate ilu0 version by omitting the <amgcl/relaxation/cusparse_ilu0.hpp> include.
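To illustrate why a relaxation like spai0 parallelizes well, here is a minimal, self-contained C++ sketch of the SPAI-0 idea: a diagonal approximate inverse with entries m_i = a_ii / sum_j a_ij^2, followed by one sweep x + M (b - A x). This is not AMGCL's implementation; the CrsMatrix struct and both function names are hypothetical and exist only for this illustration. The point is that each row is handled independently of all others, so the loops map directly onto GPU threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CRS (compressed row storage) matrix. Hypothetical stand-in for
// illustration only; this is not amgcl::backend::crs.
struct CrsMatrix {
    std::size_t n;                 // number of rows
    std::vector<std::size_t> ptr;  // row offsets, size n + 1
    std::vector<std::size_t> col;  // column index of each nonzero
    std::vector<float> val;        // value of each nonzero
};

// SPAI-0 diagonal: m[i] = a_ii / sum_j a_ij^2.
// Row i only reads row i of A, so rows can be processed in parallel.
std::vector<float> spai0_diagonal(const CrsMatrix& A) {
    assert(A.ptr.size() == A.n + 1);
    std::vector<float> m(A.n, 0.0f);
    for (std::size_t i = 0; i < A.n; ++i) {
        float diag = 0.0f, sum_sq = 0.0f;
        for (std::size_t k = A.ptr[i]; k < A.ptr[i + 1]; ++k) {
            sum_sq += A.val[k] * A.val[k];
            if (A.col[k] == i) diag += A.val[k];
        }
        if (sum_sq > 0.0f) m[i] = diag / sum_sq;  // empty rows keep m[i] = 0
    }
    return m;
}

// One relaxation sweep: x_new = x + M * (b - A * x).
// Every entry of x_new depends only on the old x, never on other new entries.
std::vector<float> spai0_sweep(const CrsMatrix& A,
                               const std::vector<float>& m,
                               const std::vector<float>& b,
                               const std::vector<float>& x) {
    assert(m.size() == A.n && b.size() == A.n && x.size() == A.n);
    std::vector<float> x_new(A.n);
    for (std::size_t i = 0; i < A.n; ++i) {
        float r = b[i];
        for (std::size_t k = A.ptr[i]; k < A.ptr[i + 1]; ++k)
            r -= A.val[k] * x[A.col[k]];
        x_new[i] = x[i] + m[i] * r;
    }
    return x_new;
}
```

For the 2x2 matrix with rows (2, -1) and (-1, 2), every diagonal entry of M comes out as 2 / (4 + 1) = 0.4, and no row needs results from any other row.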

Answer 3 · 2023-09-02T16:39:17.000Z

Thanks for the swift response. I'll try 8083b23 later when I'm home.

I would like to benchmark AMGCL on GPU, and I suppose that cusparse_ilu0 is the only relaxation tuned for CUDA. I suppose other relaxations like spai0 will be CPU-only (which might nevertheless perform better, I'm not sure)? P.S. What is your recommendation?

Answer 4 · 2023-09-02T16:43:06.000Z

All relaxations that are supported by the CUDA backend do use the GPU. cusparse_ilu0 is brought in from cuSPARSE simply because ILU is hard to implement efficiently, and it so happens that NVIDIA provides their own implementation.
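One reason ILU is hard to run efficiently on a GPU is that applying it requires sparse triangular solves, where each unknown depends on previously computed ones. The sketch below is a plain sequential forward substitution with a unit lower triangular factor. It is only an illustration of that dependency, not what cuSPARSE or AMGCL actually do, and the function name and storage layout (strictly lower part in CRS form, unit diagonal implied) are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve L * y = b, where L is unit lower triangular and only its strictly
// lower part is stored in CRS form (hypothetical layout for illustration).
std::vector<float> lower_solve(std::size_t n,
                               const std::vector<std::size_t>& ptr,
                               const std::vector<std::size_t>& col,
                               const std::vector<float>& val,
                               const std::vector<float>& b) {
    assert(ptr.size() == n + 1 && b.size() == n);
    std::vector<float> y(n);
    for (std::size_t i = 0; i < n; ++i) {
        float s = b[i];
        for (std::size_t k = ptr[i]; k < ptr[i + 1]; ++k) {
            assert(col[k] < i);        // entries must lie strictly below the diagonal
            s -= val[k] * y[col[k]];   // uses y values computed in earlier rows
        }
        y[i] = s;                      // unit diagonal, so no division is needed
    }
    return y;
}
```

Because row i reads y from earlier rows, the outer loop cannot simply be split across threads the way a diagonal relaxation can; parallel implementations have to discover and exploit independent groups of rows first.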

Answer 5 · 2023-09-03T09:02:23.000Z

Should be fixed in 5d143b1, thanks for reporting!