[New Feature] Support for low-storage RK4 method
doraemonho opened this issue · 4 comments
Hi,
I modify a bit of FourierFlows for adding a new time-stepper to support low-storage RK4 scheme. (The code is in below) This is a classical 5 stages 2 registers process to trade off 25~30% performance but saving 50% of storge space for RK4 method. [1]. I wonder could we make FourierFlows support this feature in the next update.
# ------
# LSRK(5)4
# ------
"""
struct LSRK54TimeStepper{T} <: AbstractTimeStepper{T}
A 5 stage 2 storage 4th-order Runge-Kutta timestepper for time-stepping. `∂u/∂t = F(u, t)` via:
Define S² = 0 at the first stage.
for i = 1:5
S² = Aᵢ*S² + dt*Fⁱ(t₀+ Cᵢdt, uⁿ);
uⁿ⁺¹ = uⁿ + Bᵢ*S²
end
where
# dt -> time interal
# Aᵢ -> A coefficients from LSRK tableau table in stage i
# Bᵢ -> B coefficients from LSRK tableau table in stage i
# Cᵢ -> C coefficients from LSRK tableau table in stage i
# i -> stage i
for detials, please refer to M.H. Carpenter, C.A. Kennedy, Fourth-order 2N-storage Runge–Kutta schemes, Technical Report NASA TM-109112, NASA Langley Research Center, VA, June 1994
"""
struct LSRK54TimeStepper{T,Vec} <: AbstractTimeStepper{T}
S² :: T
Fⁱ :: T
A :: Vec
B :: Vec
C :: Vec
end
"""
LSRK54TimeStepper(equation::Equation, dev::Device=CPU())
Construct a 4th-order 5 stages low storage Runge-Kutta timestepper for `equation` on device `dev`.
"""
function LSRK54TimeStepper(equation::Equation, dev::Device=CPU())
@devzeros typeof(dev) equation.T equation.dims S² Fⁱ
T = equation.T;
A = T[0.0, -0.417890474499852, -1.19215169464268, -1.69778469247153, -1.51418344425716];
B = T[0.149659021999229, 0.379210312999627, 0.822955029386982, 0.699450455949122, 0.153057247968152];
C = T[ 0.0, 0.149659021999229, 0.3704009573642045, 0.6222557631344415, 0.95828213067469];
return LSRK54TimeStepper(S², Fⁱ, A, B, C);
end
function LSRK54!(sol, clock, ts, equation, vars, params, grid, t, dt)
# Get the A,B,C ceof. for LSRK54 method
T = equation.T;
A = ts.A::Array{T,1};
B = ts.B::Array{T,1};
C = ts.C::Array{T,1};
# Init. the S² term
@. ts.S² *= 0;
for i = 1:5
equation.calcN!(ts.Fⁱ, sol, t + C[i]*dt , clock, vars, params, grid)
addlinearterm!(ts.Fⁱ, equation.L, sol);
@. ts.S² = A[i]*ts.S² + dt*ts.Fⁱ;
@. sol = sol + B[i]*ts.S²;
end
return nothing;
end
function stepforward!(sol, clock, ts::LSRK54TimeStepper, equation, vars, params, grid)
LSRK54!(sol, clock, ts, equation, vars, params, grid, clock.t, clock.dt)
#Finishing the time step
clock.t += clock.dt
clock.step += 1;
return nothing;
end
Feel free to open a PR.
Do you have/find cases where memory is the bottleneck compared to efficiency?
Feel free to open a PR.
Do you have/find cases where memory is the bottleneck compared to efficiency?
Definitely 3D simulation on GPU. With a high-end gaming/data center GPU, the speed is fine but we already approaching to the resolution limit of using a single GPU. Reducing 50% memory can further make the Cubes 1.25 times larger in length.
Cool! Go for it. Open a PR and I can help if needed.
Cool! Go for it. Open a PR and I can help if needed.
Thanks!I submitted the PR. I will need your help to review it and make the merge if the code is fine.