fppimenta/rheoTool

Parallel cases produce nan values

nk583 opened this issue · 6 comments

nk583 commented

Hello,

I have compiled and used rheoTool v3 and v5 on my workstation with OpenFOAM v6, and both serial and parallel runs work well.

I have asked the High-Performance Computing team at my institution to install rheoTool v5 for use with OpenFOAM v7 on our HPC system. The compilation appears to complete without any errors. However, we have run into an issue when running cases in parallel. A serial case runs fine: the log file shows the residual values and time steps while it is being solved, and the fields (U, p, tau, theta, etc.) are fully solved. In parallel, however, the simulation does not solve: the residuals are nan and all the fields are nan as well. This happens with the rheoFoam tutorial case (Channel/Oldroyd-BLog). Log files for the serial and parallel cases are attached.

Do you know of any issues regarding rheoTool v5 and OpenFOAM v7 with running in parallel?

Thanks.

logserial.log
logparallel.log

As you can see in the parallel log, the pressure equation is proving difficult to solve, and that may be the problem, although I don't know what is causing the difficulty. I have never had such an error, and I just tried the same tutorial on my laptop with 64 processors (it only has 4, but the case is still fully decomposed into 64 subdomains; I used scotch to decompose) and it gave no error. Some suggestions:

  • Try another matrix solver for pressure;
  • Try the same case with a Newtonian fluid;
  • Try the same case in parallel, but say with 2 or 4 processors; also try a different decomposition method (scotch is a good starting point);
  • Try another tutorial in parallel (if you keep using 64 processors, I suggest some tutorial with more than 1M cells);
  • Try the same case on another computer, using the same software versions (this must work, as I am not aware of any specific issue in combining rheoTool v5 with OpenFOAM v7).
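
To repeat the same case with several processor counts, the decomposition dictionary can be generated rather than edited by hand. The following is only a minimal sketch: `render_decompose_par_dict` is a hypothetical helper name, and it assumes the scotch method, which needs no extra coefficients (other methods such as simple need additional coefficient entries that this sketch does not write).

```python
def render_decompose_par_dict(n_procs: int) -> str:
    """Return the text of a minimal system/decomposeParDict using scotch.

    Hypothetical helper for illustration; not part of OpenFOAM or rheoTool.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be a positive integer")
    header = (
        "FoamFile\n"
        "{\n"
        "    version     2.0;\n"
        "    format      ascii;\n"
        "    class       dictionary;\n"
        "    object      decomposeParDict;\n"
        "}\n\n"
    )
    # scotch partitions the mesh automatically, so only the count is needed
    body = f"numberOfSubdomains {n_procs};\n\nmethod          scotch;\n"
    return header + body


if __name__ == "__main__":
    # Print one dictionary per processor count mentioned above
    for n in (2, 4, 64):
        print(f"// ---- {n} subdomains ----")
        print(render_decompose_par_dict(n))
```

Each printed dictionary would be saved as `system/decomposeParDict` in a copy of the case before decomposing and running it with the matching number of processes.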

nk583 commented

Thanks for getting back to me. I should have mentioned previously that I was using the scotch decomposition method. I have tried your suggestions.

(1) First, I tried changing the pressure solver from PCG to PBiCGStab; this also appears to fail.
(2) I tried a Newtonian fluid. For some reason this stops solving (without producing nan values), but I see that the maximum number of iterations for the pressure equation is reached.
(3) I tried 2, 4 and 16 CPUs, and these all seemed to work in parallel; with 32 and 64 CPUs the run produces nan values.
(4) I tried 64 CPUs with the simple decomposition method; this failed.
(5) I increased the mesh to approximately 1M cells and it still failed.
(6) I tried my own case (a backward-facing step, approx. 7M cells) and this fails on 64 CPUs.
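
To compare where each of these runs first goes wrong, a solver log can be scanned for the first nan residual. This is a minimal sketch, assuming the logs use the usual OpenFOAM line shapes (`Time = ...` and `Solving for <field>, Initial residual = ..., Final residual = ..., No Iterations ...`); the function name and regular expressions are illustrative, and the patterns may need adjusting if the solver prints a different format.

```python
import re
import sys
from typing import Iterable, Optional, Tuple

# Assumed line shapes (standard OpenFOAM solver output):
#   Time = 0.001
#   DICPCG:  Solving for p, Initial residual = 1, Final residual = 1e-09, No Iterations 10
TIME_RE = re.compile(r"^Time = (\S+)")
RESIDUAL_RE = re.compile(
    r"Solving for (\w+), Initial residual = (\S+), "
    r"Final residual = (\S+), No Iterations (\d+)"
)


def first_nan_residual(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (time, field) for the first residual line containing nan, else None."""
    current_time = "unknown"
    for line in lines:
        time_match = TIME_RE.match(line.strip())
        if time_match:
            # Remember the most recent time step header
            current_time = time_match.group(1)
            continue
        res_match = RESIDUAL_RE.search(line)
        if res_match:
            field, initial, final, _ = res_match.groups()
            # Substring check also catches variants such as "-nan"
            if "nan" in initial.lower() or "nan" in final.lower():
                return current_time, field
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py logparallel.log
    with open(sys.argv[1]) as log_file:
        print(first_nan_residual(log_file))
```

Running it on the log of each processor count shows whether the failure starts at the first time step and in which field, which helps separate a solver setup problem from a communication problem.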

It will take a bit more time to see if this is an issue with OF7 and RheoTool v5 as I do not have OF7 installed on my workstation, only OF6.

Let me suggest one more thing: please try all 6 cases that I am attaching. Run the Allrun script directly, which will use 64 processors. It is the same tutorial, but for a Newtonian fluid, and the only difference among the cases is the solver tolerance for pressure. There are cases for both rheoFoam and pimpleFoam (I suspect the issue is not specific to rheoTool, but I am not sure). On my laptop they all run fine.

Ndeb.zip

nk583 commented

I have run all the simulations you attached. It appears that none of them fully solve, for either rheoFoam or pimpleFoam, at any of the tolerances. pimpleFoam starts solving and then hangs on a time step, whereas most of the rheoFoam cases seem to crash. I have attached all logs.

Do you think OF7 needs rebuilding? It is strange, because I have used OF7 on the HPC without any issues. However, when the HPC team built rheoTool they did a new build of OF7, so there might be an issue there. I think I will get them to build OF6 and then build rheoTool for OF6.

Ndeb_logs.zip

It seems something is in fact wrong with your OpenFOAM install, as pimpleFoam is also having problems. The test case is not ideal because the mesh is small for the number of processors you are testing, yet it should still run, since it runs without trouble in serial, i.e. it is not a time-step, Co, fvSchemes, etc. problem.

It is difficult to debug remotely without being able to reproduce the error and without access to the machine that is giving the error. But as I said, I am not aware of any problem in the combination of rheoTool v5 and OpenFOAM v7, either in serial or parallel. Therefore, I would say that this combination should work with fresh installs. Perhaps the IT team made a mistake in something related to MPI?

Re-open if needed.