JuliaDynamics/DynamicalSystems.jl

Performance consideration of `EnsembleProblem`

SebastianM-C opened this issue · 6 comments

From slack:

George Datseris:juliapool: Nov 6th at 7:00 PM
So we have a method that integrates many many initial conditions and maps the trajectories into a low-dimensional vector of ~10 numbers. I'm thinking what is the fastest possible way to parallelize this. My current idea: initialize an integrator for each thread. Use the integrator to make a trajectory, map this to the low-dim vector (this is the only thing we store). Then move to the next available initial condition. Each thread does this. Is it good? Is there something much better? (edited)

Sebastian Micluța-Câmpeanu 1 day ago
I think that a more general approach would be using the EnsembleProblem interface, which would also give GPU parallelization, but I don't think it's compatible with using the integrator interface. I remember this was one of my arguments for using ODEProblem s in DynamicalSystems 😄

George Datseris:juliapool: 1 day ago
? huh? you can always make ODEProblem(ds::DynamicalSystem, tspan::Tuple) so this "argument" doesn't even relate here.

George Datseris:juliapool: 1 day ago
I'll have a look at EnsembleProblem.

Sebastian Micluța-Câmpeanu 1 hour ago
I think my knowledge might be outdated, I was under the impression that functionality like lyapunov spectrum and Poincare sections use the integrator interface directly

George Datseris:juliapool: 1 hour ago
Yes that is correct. You can still parallelize in threads just the same way though? Just @threads; lyapunovs(ds); end .

George Datseris:juliapool: 44 minutes ago
Using the integrator directly is much faster than using callbacks by the way, I have tested that in the beginning of DynamicalSystems.jl, more than 3 years ago

Sebastian Micluța-Câmpeanu 29 minutes ago
I see... I remember that we talked about this some years ago, but it's quite fuzzy. I recall that I worked on making a Lyapunov implementation with callbacks, but I don't remember how it compared with the rest. It might be worthwhile to try and do it again sometime again since everything in julia evolved / got faster. I'll let you know if I get around doing that 😄

Sebastian Micluța-Câmpeanu 27 minutes ago
But going back to the main subject, using EnsembleProblems would be a nice way of getting a lot of parallelization stuff for free (like distributed + GPU)

George Datseris:juliapool: 27 minutes ago
yes for sure, many decisions in DynamicalSystems.jl can be challenged now, it was so long ago when the project started... The worst part, everything is so fast that there is no reason (for me) to try and make it faster.

George Datseris:juliapool: 26 minutes ago
Do note that if you want an EnsembleProblem you would need to create a "secondary" lyapunov function anyway: the current functions are meant to be used with a single initial condition at a single parameter combinations. There is nothing to parallelize at this point. (edited)

Sebastian Micluța-Câmpeanu 23 minutes ago
Yeah, I understand that, it's a lot of work and the benefits might not be worth it.
In the same vein, I'm curious if the symbolic stuff from MTK might be of interest

Sebastian Micluța-Câmpeanu 22 minutes ago
that secondary function would be the problem function of the EnsembleProblem, right?

George Datseris:juliapool: 20 minutes ago

In the same vein, I'm curious if the symbolic stuff from MTK might be of interest

Yes, definitely, I have been looking at that to obtain better Jacobians that don't use ForwardDiff. I've done it successfully. At the moment I don't see a way to provide a generic intertface that requires the user doing less. What I did was write my ODEs in MTK and get the jacobian and then just make a DynamicalSystem with the "Julianized" f and J. But I don't see a way to automate this further. (edited)

George Datseris:juliapool: 18 minutes ago

that secondary function would be the problem function of the EnsembleProblem, right?

Well, maybe one can make something more generic. Provide a way to make an ODEProblem whose solution corresponds to the Lyapunov exponents. Then it would be easy to create an EnsembleProblem that has many "lyapunov problems" for different parameters or different initial conditions. The obvious performance problem: you would need not only a ContinuousCallback to rescale the deviation vectors, but also a SavingCallback to actually compute the Lyapunov exponents. If you're willing to put in the work and see how much better this performs that the good-old standard @threads loop over lyapunovspectrum (which can be efficiently done by using the low level interface of lyapunovspectrum that uses an integrator), then please do go ahead and report the findings!

Help wanted here: if anyone can test whether EnsembleProblem would be worth it for something like lyapunovspectrum, which would need both a SavingCallback and a ContinuousCallback.

(p.s.: I'm very skeptical how efficient all of these things are on GPUs)

Actually, now I realize that this exact comparison will be made in this PR: JuliaDynamics/ChaosTools.jl#219

However, we need to be a bit careful. In the comparison of that PR one doesn't need neither ContinuousCallback nor SavingCallback so it is skewed to favor EnsembleProblem. My reason to originally use integrator in stuff like poincaresos and lyapunov was to (1) avoid the callbacks and (2) have as clear source code as possible.

I think that for callbacks, DiscreteCallback could also be considered, since it should be conceptually similar with what the integrator stuff does.

I don't think so. For Poincaresos the condition is a continuous value crosses the hyperplane, and for lyapunovs is a continuous value falling below a threshold. Seems like ContinuousCallback is appropriate for both.

I don't think so. For Poincaresos the condition is a continuous value crosses the hyperplane, and for lyapunovs is a continuous value falling below a threshold. Seems like ContinuousCallback is appropriate for both.

ContinuousCallback will pull back t to the crossing point, while DiscreteCalback could in theory handle multiple crossings in a single time step. You'd have to write a bit more code, but in theory when done it would be more efficient.