baggepinnen/LowLevelParticleFilters.jl

Struct of Arrays and BLAS2->3 optimization

baggepinnen opened this issue · 1 comments

Many operations can be sped up by considering batch evaluation of f/g/dg, i.e.,

A*x[i] + B*u[i] -> A*X + B*U

See LTI implementation here

Maybe an option that calls dynamics with the full particle vector would make this possible.

  • Vector{SVector} has the same layout as X/U above, just reinterpret.
  • Figure out how to handle the combined resampling x[i] = f(xp[j[i]], u, t, noise). Maybe by custom copyto!.

Such reinterpreted arrays can be modified inplace

julia> a = [SVector(1.0, 2.0) for _ in 1:3]
3-element Array{SArray{Tuple{2},Float64,1,2},1}:
 [1.0, 2.0]
 [1.0, 2.0]
 [1.0, 2.0]

julia> b = reinterpret(Float64, a)
6-element reinterpret(Float64, ::Array{SArray{Tuple{2},Float64,1,2},1}):
 1.0
 2.0
 1.0
 2.0
 1.0
 2.0

julia> b .+= 3
6-element reinterpret(Float64, ::Array{SArray{Tuple{2},Float64,1,2},1}):
 4.0
 5.0
 4.0
 5.0
 4.0
 5.0

KernelAbstractions.jl can automatically take a function $f$ written to operate on a single particle and rewrite it to operate on all particles at the same time, on CPU or GPU. This can yield a pretty nice speedup. Some details and examples
https://github.com/JuliaComputing/JuliaSimCompiler.jl/issues/128#issuecomment-1952076080

One downside is that KernelAbstractions is still somewhat experimental, and error messages can be hard to understand and debug.