Energy growth in all OCCA modes

Question


noelchalmers opened this issue 6 years ago · 5 comments

Hi guys,

I'm trying out Laghos with the new OCCA 1.0 on both a Titan V with CUDA and a Radeon Frontier with HIP. After sorting out some issues (FYI, OCCA_PI is not defined in kernels/gpu/quadratureData.okl and kernels/cpu/quadratureData.okl), I managed to get it running. However, in every test case the energy |e| begins to increase dramatically after around 100 time steps. I've observed this in Serial, OpenMP, CUDA, and HIP modes on both the Titan V and the Radeon Frontier. The runs also differ slightly from one another in which time steps get repeated, the value of dt, and the energy.

Any help tracking down the cause would be great!
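
To compare runs across modes, it can help to pull the energy out of each log and find the step where it starts to grow. The following is only a minimal sketch: it assumes each step line contains the step number followed somewhere by `|e| = <number>`, and the names `ENERGY_RE`, `parse_energy`, and `first_growth_step` are hypothetical helpers, not part of Laghos.

```python
import re

# Assumed log format: "step <n> ... |e| = <float>". Adjust the pattern if
# your build prints the step lines differently.
ENERGY_RE = re.compile(r"step\s+(\d+).*?\|e\|\s*=\s*([-+0-9.eE]+)")


def parse_energy(lines):
    """Return (step, energy) pairs for every line that matches ENERGY_RE."""
    pairs = []
    for line in lines:
        match = ENERGY_RE.search(line)
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


def first_growth_step(pairs, rel_tol=0.01):
    """Return the first step whose energy exceeds the initial energy by more
    than rel_tol (relative to the initial value), or None if none does."""
    if not pairs:
        return None
    e0 = pairs[0][1]
    for step, energy in pairs:
        if energy - e0 > rel_tol * abs(e0):
            return step
    return None
```

Running this on the saved output of each mode (Serial, OpenMP, CUDA, HIP) would give one growth-onset step per run, which makes the differences between runs easier to report.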

Answer 1 · 2018-07-24T22:29:59.000Z

Hi Noel,
Does the same command line run well with the master branch?

Answer 2 · 2018-07-25T00:05:45.000Z

I haven't ported Laghos+OCCA over to OCCA 1.0 yet.
One of the example problems did blow up after some time, and that hasn't been resolved yet.

Answer 3 · 2018-07-25T01:35:01.000Z

@vladotomov The commands I've used are the usual test cases

./laghos -p 1 -m data/square01_quad.mesh -rs 3 -tf 0.8 -no-vis
./laghos -p 1 -m data/cube01_hex.mesh -rs 2 -tf 0.6 -no-vis

I believe they run on the master branch, but I can double-check. @dmed256, is this the case that blows up?

The fact that the OCCA 1.0 version compiles the kernels and runs is promising. Did any MFEM/Laghos kernel code change significantly between OCCA 0.2 and 1.0?

Answer 4 · 2018-11-01T14:02:17.000Z

I think that we've had this issue since the first version of the OCCA kernels.
Depending on the problem, there is either a slight divergence or a large increase like the one you describe.
You can lower the CFL number to try to control it, but we'll need to correct this.
Thanks Noel for the feedback.
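
For context on why the CFL number helps, here is a rough sketch of how a CFL factor typically scales the time step in an explicit scheme. It is a generic textbook estimate, not the estimator Laghos actually uses; `cfl_time_step`, `cell_sizes`, and `wave_speeds` are hypothetical names.

```python
def cfl_time_step(cfl, cell_sizes, wave_speeds):
    """Generic explicit time-step estimate: dt = cfl * min(h / c) over cells.

    cell_sizes and wave_speeds are per-cell values; cells with zero wave
    speed impose no restriction and are skipped.
    """
    if cfl <= 0.0:
        raise ValueError("cfl must be positive")
    limits = (h / c for h, c in zip(cell_sizes, wave_speeds) if c > 0.0)
    # With no restricting cell, the estimate is unbounded.
    return cfl * min(limits, default=float("inf"))
```

Halving `cfl` halves the resulting dt, which is why a smaller CFL number can delay or damp the growth at the cost of more time steps. It does not remove the underlying problem.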

Answer 5 · 2019-10-14T22:48:29.000Z

I'm closing this issue, as all subdirectories are affected by it.
The MFEM-4.0 version of Laghos, which will be merged soon, avoids this kind of behaviour by using MFEM's BLAS functions to compute the eigenvalues and compression directions directly on the devices. This should allow the subdirectories to sync with master and not 'diverge'.
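
To illustrate the kind of small eigenproblem mentioned above, here is a minimal sketch for a 2x2 symmetric tensor. It assumes, purely for illustration, that the compression direction is the eigenvector belonging to the most negative eigenvalue; this is not the MFEM-4.0 implementation, and `sym2x2_eig` and `compression_direction` are hypothetical names.

```python
import math


def sym2x2_eig(a, b, d):
    """Eigen-decomposition of the symmetric matrix [[a, b], [b, d]].

    Returns ((lam_min, lam_max), (v_min, v_max)) with unit eigenvectors.
    """
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    lam_min, lam_max = mean - radius, mean + radius
    if radius == 0.0:
        # Multiple of the identity: any orthonormal basis works.
        return (lam_min, lam_max), ((1.0, 0.0), (0.0, 1.0))
    # Two algebraically equivalent eigenvectors for lam_max; take the one
    # with the larger norm to avoid normalizing a near-zero vector.
    cand1 = (b, lam_max - a)
    cand2 = (lam_max - d, b)
    v = cand1 if math.hypot(*cand1) >= math.hypot(*cand2) else cand2
    norm = math.hypot(*v)
    v_max = (v[0] / norm, v[1] / norm)
    # For a symmetric matrix the other eigenvector is orthogonal to v_max.
    v_min = (-v_max[1], v_max[0])
    return (lam_min, lam_max), (v_min, v_max)


def compression_direction(a, b, d):
    """Return the unit eigenvector of the most negative eigenvalue,
    or None if no eigenvalue is negative (assumed: no compression)."""
    (lam_min, _), (v_min, _) = sym2x2_eig(a, b, d)
    return v_min if lam_min < 0.0 else None
```

The sign of the returned direction is arbitrary, so results from different back ends should be compared up to sign.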