Struct of Arrays and BLAS2->3 optimization
baggepinnen opened this issue · 1 comments
baggepinnen commented
Many operations can be sped up by considering batch evaluation of f/g/dg
, i.e.,
A*x[i] + B*u[i] -> A*X + B*U
See LTI implementation here
Maybe an option that calls dynamics with the full particle vector would make this possible.
Vector{SVector}
has the same layout asX/U
above, justreinterpret
.- Figure out how to handle the combined resampling
x[i] = f(xp[j[i]], u, t, noise)
. Maybe by customcopyto!
.
Such reinterpreted arrays can be modified inplace
julia> a = [SVector(1.0, 2.0) for _ in 1:3]
3-element Array{SArray{Tuple{2},Float64,1,2},1}:
[1.0, 2.0]
[1.0, 2.0]
[1.0, 2.0]
julia> b = reinterpret(Float64, a)
6-element reinterpret(Float64, ::Array{SArray{Tuple{2},Float64,1,2},1}):
1.0
2.0
1.0
2.0
1.0
2.0
julia> b .+= 3
6-element reinterpret(Float64, ::Array{SArray{Tuple{2},Float64,1,2},1}):
4.0
5.0
4.0
5.0
4.0
5.0
baggepinnen commented
KernelAbstractions.jl can automatically take a function
https://github.com/JuliaComputing/JuliaSimCompiler.jl/issues/128#issuecomment-1952076080
One downside is that KernelAbstractions is still somewhat experimental, and error messages can be hard to understand and debug.