Integrating GPU acceleration support in OpenQuantumTools

As a first test case, we implemented the GPU solver as a separate function:

OpenQuantumTools.jl/src/QSolver/closed_system_solvers.jl

Lines 30 to 44 in f0e34ec

    
           function solve_schrodinger_gpu(A::Annealing, tf::Real; tspan = (0, tf), kwargs...) 
        
               u0 = cu(build_u0(A.u0, :v)) 
        
               p = ODEParams(A.H, float(tf), A.annealing_parameter) 
        
               update_func = function (C, u, p, t) 
        
                   update_cache!(C, p.L, p, p(t)) 
        
               end 
        
               cache = cu(get_cache(A.H)) 
        
               diff_op = DiffEqArrayOperator(cache, update_func = update_func) 
        
               jac_cache = cu(similar(cache)) 
        
               jac_op = DiffEqArrayOperator(jac_cache, update_func = update_func) 
        
               ff = ODEFunction(diff_op, jac_prototype = jac_op) 
        
               prob = ODEProblem{true}(ff, u0, Float32.(tspan), p) 
        
               solve(prob; alg_hints = [:nonstiff], kwargs...) 
        
           end

Ideally, the GPU path would be integrated more tightly with the existing solvers. How to do that is effectively settled by resolving the issue raised in OpenQuantumBase.jl: USCqserver/OpenQuantumBase.jl#40 (comment)

I raised it here simply because we will need to make changes here as well once the issue in Base is resolved.

The following commit on the gpu-accel branch is my proposed solution: 24ed5d4.
As described there, it relies on multiple dispatch and assumes that a CuAnnealing object is created in OpenQuantumBase, rather than adding an extra true/false flag for GPU usage to each solver.
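To make the trade-off concrete, here is a minimal sketch of the two options using toy stand-ins. Every name in it (`AbstractToyAnnealing`, `ToyAnnealing`, `ToyCuAnnealing`, `toy_solve_schrodinger`, `toy_solve_with_flag`, and the `gpu` keyword) is hypothetical and is not taken from OpenQuantumBase or OpenQuantumTools. The solvers only return a tag instead of integrating anything, so the sketch runs without CUDA.

```julia
# Toy stand-ins only; these are NOT the real OpenQuantumBase types.
abstract type AbstractToyAnnealing end

struct ToyAnnealing <: AbstractToyAnnealing
    u0::Vector{ComplexF64}
end

struct ToyCuAnnealing <: AbstractToyAnnealing
    u0::Vector{ComplexF64}
end

# Dispatch approach: one solver name, and Julia selects the method
# from the type of the annealing object.
toy_solve_schrodinger(A::ToyAnnealing, tf::Real) = (:cpu, float(tf))
toy_solve_schrodinger(A::ToyCuAnnealing, tf::Real) = (:gpu, float(tf))

# Flag approach: every solver carries an extra keyword and branches on it.
toy_solve_with_flag(A::ToyAnnealing, tf::Real; gpu::Bool = false) =
    gpu ? (:gpu, float(tf)) : (:cpu, float(tf))

u0 = ComplexF64[1, 0]
toy_solve_schrodinger(ToyAnnealing(u0), 10)    # takes the CPU method
toy_solve_schrodinger(ToyCuAnnealing(u0), 10)  # takes the GPU method
```

With dispatch, the choice of backend is made once, when the object is constructed, and each additional solver only needs a second method rather than a new keyword and branch.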

If you find this a satisfactory solution, @neversakura, I will close the issue, and future updates to gpu-accel for other solvers will follow the same paradigm.

If you strongly prefer the flag, I'd like to hear your thoughts.

Thanks. I like your solution. A possible simpler approach is to keep the current Annealing object and use its type parameter hType as the dispatch argument. I don't see any differences between those two methods for what we are trying to do now.
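For comparison, dispatching on the type parameter could look roughly like the sketch below. The types are toy stand-ins (`ToyHamiltonian`, `ToyDenseHamiltonian`, `ToyCuHamiltonian`, `ParamAnnealing`, and `toy_backend` are all hypothetical names), and the layout of the real `Annealing` struct is assumed rather than copied from OpenQuantumBase.

```julia
# Toy stand-ins only; the real Annealing struct has more fields and parameters.
abstract type ToyHamiltonian end
struct ToyDenseHamiltonian <: ToyHamiltonian end
struct ToyCuHamiltonian <: ToyHamiltonian end

# A single annealing type, parameterized by its Hamiltonian type.
struct ParamAnnealing{hType<:ToyHamiltonian}
    H::hType
end

# Generic fallback for any Hamiltonian type.
toy_backend(::ParamAnnealing{<:ToyHamiltonian}) = :cpu
# More specific method, chosen when the Hamiltonian is the GPU type.
toy_backend(::ParamAnnealing{ToyCuHamiltonian}) = :gpu

toy_backend(ParamAnnealing(ToyDenseHamiltonian()))  # falls back to the CPU method
toy_backend(ParamAnnealing(ToyCuHamiltonian()))     # selects the GPU method
```

In this variant no second annealing type is needed, at the cost of tying the GPU code path to the Hamiltonian type rather than to a dedicated object.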

However, introducing a new CuAnnealing could make room for future GPU-specific optimizations, as it is completely decoupled from the Annealing object. So, @naezzell, I have no objection to your proposal; feel free to close this issue.

By the way, I think we can still use the same constructor for both CuAnnealing and Annealing, so the user only needs to define the CuHamiltonian type.